Opened 5 years ago

Closed 5 years ago

#1485519 closed Bugs (fixed)

Missing entities with plaintext messages using html2text

Reported by: larppaxyz
Owned by:
Priority: 5
Milestone: 0.2-stable
Component: PHP backend
Version: 0.2-beta
Severity: major
Keywords: HTML skandinavian entities
Cc:

Description

Plaintext messages generated from HTML messages are missing important characters. Entities such as &auml; (ä) and &ouml; (ö) are widely used in Scandinavian countries.

By default, html2text simply drops those characters.

A quick and dirty fix follows:

Comment out the two lines marked 'Unknown/unhandled entities' in program/lib/html2text.php.
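
To make the symptom concrete, here is a minimal sketch that applies only the catch-all pattern to a sample string. The helper name strip_unknown_entities is hypothetical and the real html2text.php runs many more rules. It assumes PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical helper that mimics only the catch-all rule from html2text.php.
function strip_unknown_entities(string $text): string
{
    // Same pattern as the 'Unknown/unhandled entities' search rule;
    // its paired replacement in html2text.php is the empty string.
    return preg_replace('/&[^&;]+;/i', '', $text);
}

$html = 'P&auml;&auml;t&ouml;s';

echo strip_unknown_entities($html), "\n"; // prints "Pts": the accented letters are lost
echo $html, "\n";                         // rule disabled: the raw entities survive undecoded
```

With the rule commented out the letters are no longer lost, but the plaintext then contains literal entity names such as &auml; rather than the characters themselves, so this really is only a stopgap.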

Change History (4)

comment:1 Changed 5 years ago by robin

How is this for a start:

Index: program/lib/html2text.php
===================================================================
--- program/lib/html2text.php   (revision 2005)
+++ program/lib/html2text.php   (working copy)
@@ -183,6 +183,8 @@
         '/&(bull|#149|#8226);/i',                // Bullet
         '/&(pound|#163);/i',                     // Pound sign
         '/&(euro|#8364);/i',                     // Euro sign
+        '/&([aceiou](grave|acute|circ|tilde|uml|ring));/ie',
+                                                 // Other thingies
         '/&[^&;]+;/i',                           // Unknown/unhandled entities
         '/[ ]{2,}/'                              // Runs of spaces, post-handling
     );
@@ -234,6 +236,7 @@
         '*',
         '£',
         'EUR',                                  // Euro sign.  ?
+        'html_entity_decode("&\\1;")',          // Other thingies
         '',                                     // Unknown/unhandled entities
         ' '                                     // Runs of spaces, post-handling
     );

I don't know what to call this set of characters so I used 'other thingies'...
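
For anyone trying this on a newer PHP: the /e modifier used in the patch was deprecated in PHP 5.5 and removed in PHP 7.0. A rough equivalent of the same idea with preg_replace_callback might look like the sketch below. The function name decode_accented_entities and the choice of UTF-8 as the output charset are assumptions, not part of the patch.

```php
<?php
// Hypothetical helper; a sketch of the patch's idea without the /e modifier.
function decode_accented_entities(string $text): string
{
    return preg_replace_callback(
        // Same pattern as in the patch, with a non-capturing inner group.
        '/&([aceiou](?:grave|acute|circ|tilde|uml|ring));/i',
        function (array $m): string {
            // Rebuild the entity and let PHP decode it (UTF-8 output assumed).
            return html_entity_decode('&' . $m[1] . ';', ENT_QUOTES, 'UTF-8');
        },
        $text
    );
}

echo decode_accented_entities('P&auml;&auml;t&ouml;s'), "\n"; // prints "Päätös"
```

Combinations that match the pattern but are not real entities (for example &cgrave;) come back unchanged from html_entity_decode, so in html2text.php they would still fall through to the catch-all rule.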

comment:2 Changed 5 years ago by alec

  • Component changed from Client Scripts to PHP backend
  • Milestone changed from later to 0.2-stable

comment:3 Changed 5 years ago by alec

It would be better to replace all known entities using get_html_translation_table(HTML_ENTITIES); see the comments in the PHP manual.
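
A minimal sketch of that approach, assuming UTF-8 output. The helper name decode_known_entities is hypothetical, and this is not necessarily what the eventual fix does.

```php
<?php
// Hypothetical helper; illustrates the translation-table approach only.
function decode_known_entities(string $text): string
{
    // The table maps characters to entities, e.g. 'ä' => '&auml;'.
    $table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8');

    // Flip it to entity => character and replace everything in one pass.
    // strtr() does not rescan replaced text, so '&amp;auml;' becomes '&auml;'.
    return strtr($text, array_flip($table));
}

echo decode_known_entities('Sm&ouml;rg&aring;sbord &amp; caf&eacute;'), "\n";
// prints "Smörgåsbord & café"
```

This covers every named entity PHP knows for HTML 4.01 rather than a hand-picked list, at the cost of tying the output to whatever charset is passed in.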

comment:4 Changed 5 years ago by alec

  • Resolution set to fixed
  • Status changed from new to closed

Fixed in [300fc65a].
