Opened 5 years ago
Closed 5 years ago
#1485178 closed Bugs (fixed)
ISO-8859-* HTML messages with <title> before <meta> converted twice to UTF8
| Reported by: | Saiph | Owned by: | |
|---|---|---|---|
| Priority: | 5 | Milestone: | 0.2-beta |
| Component: | Client Scripts | Version: | 0.2-alpha |
| Severity: | normal | Keywords: | encoding convert twice utf html |
| Cc: | | | |
Description
I've reduced this problem to the order of the <meta http-equiv="Content-Type"> and <title> tags.
This message prints OK:
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">
<title>Pchnąć w tę łódź jeża lub ośm skrzyń fig</title>
</head>
But if you put the <title> tag before the <meta> tag, the message body gets converted to UTF-8 twice, i.e. all non-ASCII characters are converted to UTF-8, and the result of this conversion is converted to UTF-8 once again, with the ISO charset assumed as the input encoding. So in the end each single-byte ISO non-ASCII character is encoded as 4 bytes, which are displayed as 2 garbage UTF-8 characters.
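The byte arithmetic above can be reproduced outside the client. The following is only a sketch in Python, not the client's actual code: `convert_to_utf8` is a hypothetical helper standing in for whatever charset conversion the client performs, and the letter "ą" is taken from the sample title.

```python
def convert_to_utf8(data: bytes, source_charset: str) -> bytes:
    """Hypothetical helper: decode from source_charset and re-encode as UTF-8."""
    return data.decode(source_charset).encode("utf-8")


# "ą" (U+0105) is a single byte, 0xB1, in ISO-8859-2.
original = "\u0105".encode("iso-8859-2")

# Correct behaviour: one conversion gives the 2-byte UTF-8 form 0xC4 0x85.
once = convert_to_utf8(original, "iso-8859-2")

# The bug: the UTF-8 bytes are treated as ISO-8859-2 again, so each of the
# two bytes is read as a separate character and re-encoded on its own.
twice = convert_to_utf8(once, "iso-8859-2")

print(len(original), len(once), len(twice))  # 1 2 4
print(len(twice.decode("utf-8")))            # 2 garbage characters

# ASCII bytes are identical in both charsets, so a second pass is harmless.
ascii_text = b"plain title"
ascii_twice = convert_to_utf8(convert_to_utf8(ascii_text, "iso-8859-2"), "iso-8859-2")
print(ascii_twice == ascii_text)             # True
```

The last check shows only that ASCII text survives a second conversion unchanged. It does not explain why the position of an ASCII-only <title> avoids triggering the second conversion in the first place.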
Interestingly, if the <title> tag is made of ASCII-only characters, then its position doesn't matter.
BTW, the contents of the <title> tag aren't shown anywhere; couldn't they be moved to the message body somehow?
PS. This bug appeared around r1484.
Attachments (1)
Change History (3)
Changed 5 years ago by Saiph
comment:1 Changed 5 years ago by alec
- Milestone changed from later to 0.2-beta
comment:2 Changed 5 years ago by alec
- Resolution set to fixed
- Status changed from new to closed
Fixed in [c1b81f57]. Known problem described in http://bugs.php.net/bug.php?id=32547

A message triggering the double conversion to UTF-8 bug.