Opened 5 years ago

Closed 5 years ago

#1485178 closed Bugs (fixed)

ISO-8859-* HTML messages with <title> before <meta> converted twice to UTF8

Reported by: Saiph
Owned by:
Priority: 5
Milestone: 0.2-beta
Component: Client Scripts
Version: 0.2-alpha
Severity: normal
Keywords: encoding convert twice utf html
Cc:

Description

I've reduced this problem to the order of the <meta http-equiv="Content-Type"> and <title> tags.

This message prints OK:
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">
<title>Pchnąć w tę łódź jeża lub ośm skrzyń fig</title>
</head>

But if you put the <title> tag before the <meta> tag, the message body gets converted to UTF-8 twice, i.e. all non-ASCII characters are converted to UTF-8, and the result of this conversion is converted to UTF-8 once again, with the ISO charset assumed as the input encoding. So in the end each single-byte ISO non-ASCII character is encoded as 4 bytes, which are displayed as 2 garbage UTF-8 characters.
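To illustrate the byte counts described above, here is a minimal Python sketch of the double conversion. It is not the client's actual code; the helper name convert_to_utf8 is hypothetical, and it only assumes that the second pass decodes the already-converted bytes as ISO-8859-2 again.

```python
def convert_to_utf8(raw: bytes, assumed_charset: str = "iso-8859-2") -> bytes:
    """Hypothetical helper: decode raw bytes using the assumed charset,
    then re-encode the resulting text as UTF-8."""
    return raw.decode(assumed_charset).encode("utf-8")


# "ą" is a single byte (0xB1) in ISO-8859-2.
original = "ą".encode("iso-8859-2")

# First pass: the correct conversion, giving the 2-byte UTF-8 form.
once = convert_to_utf8(original)

# Second pass: the UTF-8 bytes are wrongly treated as ISO-8859-2 again,
# so each of the 2 bytes becomes its own character and is re-encoded.
twice = convert_to_utf8(once)

print(len(original), len(once), len(twice))  # 1 2 4
print(once)   # b'\xc4\x85'
print(twice)  # b'\xc3\x84\xc2\x85'

# ASCII-only text is unaffected by the second pass.
print(convert_to_utf8(convert_to_utf8(b"fig")))  # b'fig'
```

Decoding the 4-byte result as UTF-8 yields 2 characters instead of 1, which matches the garbage seen in the displayed message.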

Interestingly, if the <title> tag is made of ASCII-only characters, then its position doesn't matter.

BTW, the contents of the <title> tag aren't shown anywhere. Couldn't they be moved to the message body somehow?

PS. This bug appeared around r1484.

Attachments (1)

html_encoding_test.msg (552 bytes) - added by Saiph 5 years ago.
A message triggering the double conversion to UTF-8 bug.

Change History (3)

Changed 5 years ago by Saiph

A message triggering the double conversion to UTF-8 bug.

comment:1 Changed 5 years ago by alec

  • Milestone changed from later to 0.2-beta

comment:2 Changed 5 years ago by alec

  • Resolution set to fixed
  • Status changed from new to closed

Fixed in [c1b81f57]. This is a known problem described in http://bugs.php.net/bug.php?id=32547
