Opened 5 years ago
Closed 5 years ago
#1485519 closed Bugs (fixed)
Missing entities with plaintext messages using html2text
| Reported by: | larppaxyz | Owned by: | |
|---|---|---|---|
| Priority: | 5 | Milestone: | 0.2-stable |
| Component: | PHP backend | Version: | 0.2-beta |
| Severity: | major | Keywords: | HTML skandinavian entities |
| Cc: | | | |
Description
Plaintext messages created from HTML messages are missing important entities. Entities like &auml; (=ä) and &ouml; (=ö) are used in Scandinavian countries.
By default, html2text simply drops those characters.
A quick and dirty fix follows:
Comment out the two lines containing 'Unknown/unhandled entities' in program/lib/html2text.php.
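To illustrate why the characters vanish, here is a minimal sketch that applies only the catch-all pattern marked 'Unknown/unhandled entities' to a string. The function name strip_unknown_entities is hypothetical and is not part of html2text.php; only the regular expression is taken from that file.

```php
<?php
// Hypothetical helper: applies just the catch-all rule from html2text.php.
function strip_unknown_entities(string $text): string
{
    // Any "&...;" sequence that no earlier rule handled is replaced with ''.
    return preg_replace('/&[^&;]+;/i', '', $text);
}

// Each &auml; is removed together with the letter it stands for.
echo strip_unknown_entities('P&auml;iv&auml;&auml;'), "\n"; // prints "Piv"
```

Commenting the rule out leaves the raw entity text in the output instead, which is why it is only a stopgap.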
Change History (4)
comment:1 Changed 5 years ago by robin
comment:2 Changed 5 years ago by alec
- Component changed from Client Scripts to PHP backend
- Milestone changed from later to 0.2-stable
comment:3 Changed 5 years ago by alec
It would be better to replace all known entities using get_html_translation_table(HTML_ENTITIES); see the comments in the PHP manual.
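One way this suggestion could look is sketched below. It is not the code from the actual fix; the function name decode_known_entities is hypothetical, and decoding to UTF-8 is an assumption. The table returned by get_html_translation_table() maps characters to entities, so it is flipped before use.

```php
<?php
// Hypothetical helper, not taken from html2text.php.
function decode_known_entities(string $text): string
{
    // The table maps character => entity (e.g. 'ä' => '&auml;'); flip it.
    // Assumption: the output should be UTF-8.
    $map = array_flip(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8'));

    // Decode &amp; last, so that "&amp;auml;" ends up as the literal
    // text "&auml;" instead of being decoded or stripped.
    unset($map['&amp;']);

    // strtr() with an array never rescans text it has already replaced.
    $text = strtr($text, $map);

    // Whatever still looks like an entity is unknown: drop it, as before.
    $text = preg_replace('/&(?!amp;)[^&;]+;/', '', $text);

    return str_replace('&amp;', '&', $text);
}

echo decode_known_entities('P&auml;iv&auml;&auml; &foo;bar'), "\n";
```

Only entities missing from PHP's table are dropped with this approach, so the Scandinavian characters survive without listing each one by hand.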
comment:4 Changed 5 years ago by alec
- Resolution set to fixed
- Status changed from new to closed
Fixed in [300fc65a].

How is this for a start:
Index: program/lib/html2text.php
===================================================================
--- program/lib/html2text.php (revision 2005)
+++ program/lib/html2text.php (working copy)
@@ -183,6 +183,8 @@
 '/&(bull|#149|#8226);/i', // Bullet
 '/&(pound|#163);/i', // Pound sign
 '/&(euro|#8364);/i', // Euro sign
+ '/&([aceiou](grave|acute|circ|tilde|uml|ring));/ie',
+ // Other thingies
 '/&[^&;]+;/i', // Unknown/unhandled entities
 '/[ ]{2,}/' // Runs of spaces, post-handling
 );
@@ -234,6 +236,7 @@
 '*',
 '£',
 'EUR', // Euro sign. ?
+ 'html_entity_decode("&\\1;")', // Other thingies
 '', // Unknown/unhandled entities
 ' ' // Runs of spaces, post-handling
 );

I don't know what to call this set of characters so I used 'other thingies'...
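The patch relies on the /e modifier of preg_replace(), which was deprecated in PHP 5.5 and removed in PHP 7.0. A minimal sketch of the same idea using preg_replace_callback() follows. The function name decode_accented_entities is hypothetical, and UTF-8 output is an assumption; the character class is kept exactly as in the patch.

```php
<?php
// Hypothetical helper: the patch's rule rewritten without the /e modifier.
function decode_accented_entities(string $text): string
{
    return preg_replace_callback(
        '/&([aceiou](?:grave|acute|circ|tilde|uml|ring));/i',
        function (array $m): string {
            // $m[1] keeps the case it had in the input, so &Ouml; gives Ö.
            // Invalid combinations (e.g. &cring;) are returned unchanged.
            return html_entity_decode('&' . $m[1] . ';', ENT_QUOTES, 'UTF-8');
        },
        $text
    );
}

echo decode_accented_entities('&auml;&Ouml;&aring;'), "\n";
```

Because the class [aceiou] omits n and y, entities such as &ntilde; and &yacute; are not matched here and would still reach the 'Unknown/unhandled entities' rule.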