Ticket #1485519 (closed Bugs: fixed)

Opened 3 months ago

Last modified 7 weeks ago

Missing entities with plaintext messages using html2text

Reported by: larppaxyz Owned by:
Priority: 5 Milestone: 0.2-stable
Component: PHP backend Version: 0.2-beta
Severity: major Keywords: HTML skandinavian entities
Cc:

Description

Plaintext messages created from HTML messages are missing important entities. Entities like &auml; (ä) and &ouml; (ö) are used in Scandinavian countries.

By default, html2text simply drops those characters.

A quick and dirty fix follows:

Comment out the two lines marked '// Unknown/unhandled entities' in program/lib/html2text.php, so that unrecognized entities are left in the text instead of being stripped.

Change History

Changed 3 months ago by robin

How is this for a start:

Index: program/lib/html2text.php
===================================================================
--- program/lib/html2text.php   (revision 2005)
+++ program/lib/html2text.php   (working copy)
@@ -183,6 +183,8 @@
         '/&(bull|#149|#8226);/i',                // Bullet
         '/&(pound|#163);/i',                     // Pound sign
         '/&(euro|#8364);/i',                     // Euro sign
+        '/&([aceiou](grave|acute|circ|tilde|uml|ring));/ie',
+                                                 // Other thingies
         '/&[^&;]+;/i',                           // Unknown/unhandled entities
         '/[ ]{2,}/'                              // Runs of spaces, post-handling
     );
@@ -234,6 +236,7 @@
         '*',
         '£',
         'EUR',                                  // Euro sign.  ?
+        'html_entity_decode("&\\1;")',          // Other thingies
         '',                                     // Unknown/unhandled entities
         ' '                                     // Runs of spaces, post-handling
     );

I don't know what to call this set of characters so I used 'other thingies'...
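
For readers on newer PHP versions: the patch above relies on the /e pattern modifier, which was deprecated in PHP 5.5 and removed in PHP 7. A minimal sketch of the same rule using preg_replace_callback follows. The function name decode_accented_entities is hypothetical and not part of html2text.php, and the UTF-8 output charset is an assumption.

```php
<?php
// Hypothetical helper (not part of html2text.php): decodes the same
// accented-vowel entities as the patch, without the /e modifier.
function decode_accented_entities($text)
{
    return preg_replace_callback(
        '/&[aceiou](?:grave|acute|circ|tilde|uml|ring);/i',
        function ($m) {
            // Assumption: output should be UTF-8. Names that are not real
            // entities (e.g. &ccirc;) are returned unchanged.
            return html_entity_decode($m[0], ENT_QUOTES, 'UTF-8');
        },
        $text
    );
}

echo decode_accented_entities('P&auml;iv&auml;&auml;, &Ouml;sterreich'), "\n";
// prints "Päivää, Österreich"
```

Entities outside this small set still fall through to the 'Unknown/unhandled entities' rule, so this only narrows the problem rather than solving it.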

Changed 3 months ago by alec

  • component changed from Client Scripts to PHP backend
  • milestone changed from later to 0.2-stable

Changed 3 months ago by alec

It would be better to replace all known entities using get_html_translation_table(HTML_ENTITIES); see the comments in the PHP manual.
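
A minimal sketch of that suggestion, for illustration only. It is not the code committed in r2070. The function name decode_known_entities is hypothetical, and the sketch assumes UTF-8 output and a PHP version whose get_html_translation_table() accepts the encoding argument.

```php
<?php
// Hypothetical helper: replaces every named entity that PHP knows about.
function decode_known_entities($text)
{
    // The table maps characters to entities, e.g. 'ä' => '&auml;'.
    $table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8');

    // Flip it to entity => character. strtr() with an array replaces in a
    // single pass, so '&amp;auml;' becomes '&auml;' and is not decoded twice.
    return strtr($text, array_flip($table));
}

echo decode_known_entities('sm&ouml;rg&aring;sbord &amp; caf&eacute;'), "\n";
// prints "smörgåsbord & café"
```

Numeric references such as &#8364; are not in the translation table, so the existing numeric patterns in html2text.php would still be needed alongside this.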

Changed 7 weeks ago by alec

  • status changed from new to closed
  • resolution set to fixed

Fixed in r2070.
