Ticket #1485178 (closed Bugs: fixed)

Opened 3 months ago

Last modified 3 months ago

ISO-8859-* HTML messages with <title> before <meta> converted twice to UTF8

Reported by: Saiph
Owned by:
Priority: 5
Milestone: 0.2-beta
Component: Client Scripts
Version: 0.2-alpha
Severity: normal
Keywords: encoding convert twice utf html
Cc:

Description

I've reduced this problem to the sequence of <meta http-equiv="Content-Type"> and <title> tags.

This message prints OK: <head> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2"> <title>Pchnąć w tę łódź jeża lub ośm skrzyń fig</title> </head>

But if you put the <title> tag before the <meta> tag, the message body gets converted to UTF-8 twice, i.e. all non-ASCII characters are converted to UTF-8, and the result of this conversion is converted to UTF-8 once again, with the ISO charset assumed as the input encoding. So in the end each single-byte ISO non-ASCII character is encoded as 4 bytes, which are displayed as 2 garbage UTF-8 characters.
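The byte arithmetic described above can be reproduced outside the client. The following is a minimal Python sketch of the symptom only, not the client's actual conversion code; the helper name to_utf8 is hypothetical, and it assumes the second pass treats the already converted bytes as ISO-8859-2 again.

```python
def to_utf8(raw: bytes, assumed_charset: str) -> bytes:
    """Convert raw bytes to UTF-8, trusting the given input charset (hypothetical helper)."""
    return raw.decode(assumed_charset).encode("utf-8")


# One non-ASCII character from the test message, as it arrives in ISO-8859-2.
original = "ą".encode("iso-8859-2")   # a single byte

# Correct behaviour: one conversion, the character is still readable.
once = to_utf8(original, "iso-8859-2")

# The reported behaviour: the UTF-8 result is converted again,
# still assuming ISO-8859-2 as the input encoding.
twice = to_utf8(once, "iso-8859-2")

print(len(original), len(once), len(twice))   # byte counts after 0, 1 and 2 conversions
print(repr(once.decode("utf-8")))             # the intended character
print(repr(twice.decode("utf-8")))            # the garbage shown to the user
```

Each byte of the first UTF-8 result is read as a separate ISO-8859-2 character in the second pass, which is why one source byte grows to four bytes and two displayed characters.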

Interestingly, if the <title> tag is made of ASCII-only characters, then its position doesn't matter.

BTW, the contents of the <title> tag aren't shown anywhere. Couldn't they be moved to the message body somehow?

PS. This bug appeared around r1484.

Attachments

html_encoding_test.msg (0.5 kB) - added by Saiph 3 months ago.
A message triggering the double conversion to UTF-8 bug.

Change History

Changed 3 months ago by Saiph

A message triggering the double conversion to UTF-8 bug.

Changed 3 months ago by alec

  • milestone changed from later to 0.2-beta

Changed 3 months ago by alec

  • status changed from new to closed
  • resolution set to fixed

Fixed in r1606. Known problem described in http://bugs.php.net/bug.php?id=32547
