ISO-8859-* HTML messages with <title> before <meta> converted twice to UTF-8
|Reported by:||Saiph||Owned by:|
|Severity:||normal||Keywords:||encoding convert twice utf html|
I've reduced this problem to the order of the <meta http-equiv="Content-Type"> and <title> tags.
This message prints OK:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">
<title>Pchnąć w tę łódź jeża lub ośm skrzyń fig</title>
But if you put the <title> tag before the <meta> tag, the message body gets converted to UTF-8 twice, i.e. all non-ASCII characters are converted to UTF-8, and the result of this conversion is converted to UTF-8 once again, with the ISO charset assumed as the input encoding. So in the end each single-byte ISO non-ASCII character is encoded as 4 bytes, which are displayed as 2 garbage UTF-8 characters.
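The byte arithmetic of the double conversion can be reproduced outside the application. The following is only a minimal Python sketch of the effect described above, not the application's actual code: the helper name to_utf8 is hypothetical, and the sketch assumes both passes treat their input as ISO-8859-2.

```python
def to_utf8(raw: bytes, assumed_charset: str = "iso-8859-2") -> bytes:
    """Hypothetical helper: decode bytes with the assumed charset, re-encode as UTF-8."""
    return raw.decode(assumed_charset).encode("utf-8")


original = "ą"                              # one non-ASCII character from the sample title
iso_bytes = original.encode("iso-8859-2")   # a single byte in ISO-8859-2

once = to_utf8(iso_bytes)   # correct conversion: 2 bytes of valid UTF-8
twice = to_utf8(once)       # the bug: UTF-8 bytes treated as ISO-8859-2 again

print(len(iso_bytes), len(once), len(twice))   # byte counts at each stage
print(repr(once.decode("utf-8")))              # the original character
print(repr(twice.decode("utf-8")))             # two garbage characters
```

Each of the two UTF-8 bytes produced by the first pass is itself a non-ASCII byte, so the second pass expands each of them to two bytes, which gives the 4 bytes and 2 garbage characters seen in the message body.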
Interestingly, if the <title> tag is made of ASCII-only characters, then its position doesn't matter.
BTW, the contents of the <title> tag aren't shown anywhere; couldn't they be moved to the message body somehow?
PS. This bug appeared around r1484.