Opened 10 months ago
Closed 10 months ago
#1488574 closed Bugs (wontfix)
Some HTML emails showing up as Chinese characters - UTF-16 issue
| Reported by: | notserpmh | Owned by: | |
|---|---|---|---|
| Priority: | 2 | Milestone: | later |
| Component: | MIME parsing | Version: | 0.7.2 |
| Severity: | normal | Keywords: | |
| Cc: | | | |
Description
We have been using Roundcube for quite some time (since around 0.2). We are currently on 0.7.2 stable. We are using Dovecot IMAP as our backend on an Ubuntu 12.04 server, with Apache.
Recently, HTML emails from Zillow.com (a real estate site) have been showing up as Chinese characters (if you open one in Chrome, it even pops up an offer to translate). Other HTML emails work fine. If I force the user's settings to not display HTML emails, the messages display fine as plain text. If you open the same email in the same account in Thunderbird (using IMAP), it displays fine (in HTML). I haven't yet found in the source what is going on.
I tried 0.8 RC on our backup machine, but it has the same bug.
I have attached the message source and a screenshot of what appears to the user (in all browsers).
Change History (3)
comment:1 Changed 10 months ago by notserpmh
- Summary changed from Some HTML emails showing up as Chinese characters to Some HTML emails showing up as Chinese characters - UTF-16 issue
comment:2 Changed 10 months ago by notserpmh
Another update. I found the issue. Basically, the HTML in the email declares itself as UTF-16 encoded, but it has no byte-order mark (BOM) and never specifies whether it is big endian or little endian. Other programs (Thunderbird, for example) work around this, but technically the email is wrong.
I'm not sure whether this is something that could still be fixed or worked around, or whether it is a "wontfix" because the email doesn't meet the spec. Anyway, I'll leave it open for now and let the devs decide.
If you add FE FF right before the <!DOCTYPE> line in the HTML portion of the email to specify that it is big endian, then the email renders correctly.
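To illustrate why byte order matters when UTF-16 data carries no BOM, here is a minimal Python sketch. It is not Roundcube code, and it does not claim to reproduce this exact message; `decode_utf16` and its `default` parameter are hypothetical names used only for this example. It shows that ASCII markup stored as big-endian UTF-16 turns into mostly CJK code points when read with the wrong byte order, and that prepending FE FF removes the ambiguity.

```python
import codecs


def decode_utf16(data: bytes, default: str = "utf-16-be") -> str:
    """Decode UTF-16 bytes, honouring a BOM if one is present.

    `default` is the byte order assumed when there is no BOM; without a
    BOM this is only a guess (hypothetical helper, for illustration).
    """
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return data[2:].decode("utf-16-le")
    return data.decode(default)


# Big-endian UTF-16 markup without a BOM.
html = "<!DOCTYPE html>".encode("utf-16-be")

# Guessing the wrong byte order swaps each byte pair, e.g. '<' (00 3C)
# is read as U+3C00, which lies in a CJK block.
wrong = decode_utf16(html, default="utf-16-le")

# With FE FF prepended, the guess is never consulted.
right = decode_utf16(codecs.BOM_UTF16_BE + html, default="utf-16-le")

print(repr(wrong))
print(repr(right))
```

Whether a given converter honours the BOM this way depends on the library in use, so treat the sketch as a description of the general mechanism only.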
Here are some links to BOM if anyone needs them:
http://www.w3.org/International/questions/qa-html-encoding-declarations#utf16
http://www.w3.org/International/questions/qa-byte-order-mark
Thanks for taking the time to look
comment:3 Changed 10 months ago by alec
- Component changed from Core functionality to MIME parsing
- Resolution set to wontfix
- Status changed from new to closed
Yes, the message is malformed. The Content-Type header doesn't specify a usable charset (it's actually set to us-ascii). In that case, for HTML message parts, Roundcube takes the charset from the first meta tag that occurs in the part. The code is in rcube_imap::get_message_part(). The BOM fix works with mbstring but not with iconv.
The only solution would be to use a charset detector like the one in Thunderbird, but for now it's a "wontfix".
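As a rough idea of what a very small charset detector could look like for this one case, the following Python sketch guesses the UTF-16 byte order from where the zero bytes fall. This is an assumption-laden heuristic, not what Thunderbird or Roundcube actually does; `guess_utf16_order` and its `sample` parameter are hypothetical. It relies on the text being mostly ASCII markup, where every character has one zero byte.

```python
from typing import Optional


def guess_utf16_order(data: bytes, sample: int = 512) -> Optional[str]:
    """Guess the UTF-16 byte order of mostly-ASCII text without a BOM.

    Big-endian ASCII puts the zero byte first in each pair (even
    offsets); little-endian puts it second (odd offsets). Returns None
    when the counts tie, e.g. for single-byte text with no zero bytes.
    """
    chunk = data[:sample]
    even_nulls = chunk[0::2].count(b"\x00")
    odd_nulls = chunk[1::2].count(b"\x00")
    if even_nulls == odd_nulls:
        return None
    return "utf-16-be" if even_nulls > odd_nulls else "utf-16-le"


print(guess_utf16_order("<html>".encode("utf-16-be")))
print(guess_utf16_order("<html>".encode("utf-16-le")))
print(guess_utf16_order(b"<html>"))
```

A `None` result is useful in itself: if a part is declared as UTF-16 but contains no zero bytes at all, the declaration is probably wrong and the bytes are more likely a single-byte encoding.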


While playing around with the message source, I just found that the HTML portion has two character encoding meta tags: UTF-16 and iso-8859-1. If I remove the UTF-16 tag from the email, it displays fine.
I have asked Zillow if they can remove that tag, but I don't know if they can or will just for us. Any ideas for a workaround?
(I put the message source on pastebin, but the comment with the link seems to have been lost, so here it is again: http://pastebin.com/VkbfjVbV)
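One possible workaround for the conflicting meta tags described above, sketched in Python rather than in Roundcube's PHP: collect every declared charset and skip a UTF-16 declaration when the raw bytes do not look like UTF-16 (no BOM and no zero bytes). The names `declared_charsets` and `pick_charset`, the regular expression, and the `iso-8859-1` fallback are all assumptions made for this example, not existing Roundcube behaviour.

```python
import re

# Matches charset=... inside a <meta> tag, working on the raw bytes.
META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def declared_charsets(html: bytes) -> list:
    """Return all charsets declared in meta tags, in document order."""
    return [m.decode("ascii").lower() for m in META_CHARSET.findall(html)]


def pick_charset(html: bytes, fallback: str = "iso-8859-1") -> str:
    """Pick the first declared charset that is plausible for the bytes."""
    has_bom = html.startswith((b"\xfe\xff", b"\xff\xfe"))
    for name in declared_charsets(html):
        if name.startswith("utf-16") and not has_bom and b"\x00" not in html:
            continue  # declared UTF-16, but the bytes look single-byte
        return name
    return fallback


sample = (
    b'<html><head>'
    b'<meta http-equiv="Content-Type" content="text/html; charset=UTF-16">'
    b'<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
    b'</head><body>Hello</body></html>'
)
print(declared_charsets(sample))
print(pick_charset(sample))
```

The check is deliberately conservative: it only overrides a UTF-16 declaration, since that is the one case where a wrong label makes single-byte text unreadable.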