Opened 7 years ago
Closed 6 years ago
#1484119 closed Bugs (fixed)
Charset bug in contacts
| Reported by: | b@… | Owned by: | thomasb |
|---|---|---|---|
| Priority: | 5 | Milestone: | 0.1-rc1 |
| Component: | Client Scripts | Version: | git-master |
| Severity: | normal | Keywords: | |
| Cc: | | | |
Description
When I add the name of a person who sent me a message, it is not displayed correctly.
On the screen where I can see the contents of the message, I press + to add the person to my contacts, but in my address book the name looks like the first line of http://forum.roundcube.ru/index.php?act=Attach&type=post&id=16
Change History (6)
comment:1 Changed 7 years ago by b@…
- Summary changed from Chaset bug in contacts to Charset bug in contacts
comment:2 Changed 7 years ago by tcat
- Version changed from 0.1-beta2 to svn-trunk
comment:3 Changed 6 years ago by thomasb
- Milestone set to 0.1-rc1
- Owner set to thomasb
- Status changed from new to assigned
Related: #1484217 (with Screenshots)
comment:4 Changed 6 years ago by mattenklicker
I added the following to program/steps/mail/addcontact.inc after line 29 ($contact = $contact_arr[1];):
if ($contact['name'])
{
$contact['name']=utf8_decode($contact['name']);
}
This works for me.
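The ticket does not record what actually garbled the names. One common cause of this kind of charset bug, assumed here only for illustration, is double encoding: bytes that are already UTF-8 are treated as ISO-8859-1 and encoded to UTF-8 a second time. The sketch below uses two hypothetical helpers, latin1_to_utf8() and utf8_to_latin1(), written out by hand to mimic what utf8_encode() and utf8_decode() do (both built-ins are deprecated as of PHP 8.2). It also shows the limit of the utf8_decode() workaround above: it only maps characters up to U+00FF, so a Cyrillic name, or Hungarian letters such as ő and ű, come out as question marks.

```php
<?php
// Hypothetical helpers (not Roundcube code) illustrating double encoding.

// Treat every byte as an ISO-8859-1 character and encode it as UTF-8,
// which is what utf8_encode() does.
function latin1_to_utf8(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= chr($b);                          // ASCII is unchanged
        } else {
            $out .= chr(0xC0 | ($b >> 6))             // U+0080..U+00FF need two bytes
                  . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

// Reverse direction, like utf8_decode(): code points above U+00FF have no
// ISO-8859-1 byte and become '?', so information is lost.
function utf8_to_latin1(string $text): string
{
    $out = '';
    $len = strlen($text);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($text[$i]);
        if ($b < 0x80) {
            $out .= chr($b);
        } elseif (($b & 0xE0) === 0xC0 && $i + 1 < $len) {
            // Two-byte sequence: rebuild the code point from 5 + 6 payload bits.
            $cp = (($b & 0x1F) << 6) | (ord($text[++$i]) & 0x3F);
            $out .= $cp <= 0xFF ? chr($cp) : '?';
        } elseif ($b >= 0xC0) {
            // Lead byte of a longer sequence: skip its continuation bytes.
            while ($i + 1 < $len && (ord($text[$i + 1]) & 0xC0) === 0x80) {
                $i++;
            }
            $out .= '?';
        }
        // Stray continuation bytes (0x80..0xBF) are dropped.
    }
    return $out;
}

$name = "M\u{00FC}ller";                 // "Müller" in UTF-8 (ü = C3 BC)
$doubled = latin1_to_utf8($name);        // bytes C3 83 C2 BC: shows as "MÃ¼ller"
echo utf8_to_latin1($doubled) === $name ? "repaired\n" : "still broken\n"; // prints "repaired"

$russian = "\u{0418}\u{0432}\u{0430}\u{043D}";  // "Иван", outside ISO-8859-1
echo utf8_to_latin1($russian), "\n";            // prints "????"
```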
comment:5 Changed 6 years ago by thomasb
Related bug: #1484329
comment:6 Changed 6 years ago by thomasb
- Resolution set to fixed
- Status changed from assigned to closed
Fixed in trunk ([f1154163])

Yes! I have the same problem with Hungarian characters. I think the problem is in rcube_imap.inc: PHP's built-in string functions work on bytes and can't handle multibyte Unicode characters (I'm not sure, but you may be able to compile PHP 5.2 with full Unicode support).
str_replace, strpos, stripslashes and preg_replace (used many times in rcube_imap.inc by _parse_address_list() and decode_address_list(), which are called from steps/mail/addcontact.inc) can split multibyte characters. Maybe you could use custom functions instead of the built-in ones.
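The splitting is easiest to see with substr(), which counts bytes rather than characters. In this sketch, is_valid_utf8() and utf8_cut_bytes() are hypothetical helpers, not Roundcube functions: the first checks a string with PCRE's /u modifier, the second shortens a string to a byte limit but backs up over continuation bytes (10xxxxxx) so it never stops in the middle of a character.

```php
<?php
// Hypothetical demo: byte-based substr() versus a boundary-aware cut.

// True when $s is well-formed UTF-8 (with /u, PCRE rejects invalid subjects).
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Cut $text to at most $maxBytes bytes without splitting a multibyte
// character: while the byte right after the cut is a continuation byte,
// move the cut one byte to the left.
function utf8_cut_bytes(string $text, int $maxBytes): string
{
    if ($maxBytes >= strlen($text)) {
        return $text;
    }
    $end = max(0, $maxBytes);
    while ($end > 0 && (ord($text[$end]) & 0xC0) === 0x80) {
        $end--;
    }
    return substr($text, 0, $end);
}

$name = "Erd\u{0151}s";                 // "Erdős": 5 characters, 6 bytes

var_dump(strlen($name));               // int(6): strlen() counts bytes
$broken = substr($name, 0, 4);         // "Erd" plus only the lead byte C5 of "ő"
var_dump(is_valid_utf8($broken));      // bool(false)
$safe = utf8_cut_bytes($name, 4);      // backs up to "Erd"
var_dump($safe, is_valid_utf8($safe)); // string(3) "Erd" and bool(true)
```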
Here are some example functions:
/**
 * Count the number of characters in a UTF-8 string. This is less than or
 * equal to the byte count.
 */
function unicode_strlen($text) {
  if (function_exists('mb_strlen')) {
    return mb_strlen($text, 'UTF-8');
  }
  // Do not count UTF-8 continuation bytes (0x80-0xBF).
  return strlen(preg_replace("/[\x80-\xBF]/", '', $text));
}

/**
 * Cut off a piece of a string based on character indices and counts. Follows
 * the same behaviour as PHP's own substr() function.
 *
 * Note that for cutting off a string at a known character/substring
 * location, the usage of PHP's normal strpos/substr is safe and
 * much faster.
 */
function unicode_substr($text, $start, $length = NULL) {
  if (function_exists('mb_substr')) {
    // Since PHP 5.4.8 a NULL $length means "to the end of the string".
    return mb_substr($text, $start, $length, 'UTF-8');
  }
  $strlen = strlen($text);
  // A byte starts a character unless it is a continuation byte (0x80-0xBF).
  // Reads past the end of the string are treated as a character boundary.

  // Find the starting byte offset (0 when $start == 0)
  $bytes = 0;
  if ($start > 0) {
    // Count character starts from the beginning until we have found
    // $start characters
    $bytes = -1; $chars = -1;
    while ($bytes < $strlen && $chars < $start) {
      $bytes++;
      $c = $bytes < $strlen ? ord($text[$bytes]) : 0;
      if ($c < 0x80 || $c >= 0xC0) {
        $chars++;
      }
    }
  }
  else if ($start < 0) {
    // Count character starts from the end until we have found
    // abs($start) characters
    $start = abs($start);
    $bytes = $strlen; $chars = 0;
    while ($bytes > 0 && $chars < $start) {
      $bytes--;
      $c = ord($text[$bytes]);
      if ($c < 0x80 || $c >= 0xC0) {
        $chars++;
      }
    }
  }
  $istart = $bytes;

  // Find the ending byte offset
  if ($length === NULL) {
    $bytes = $strlen - 1;
  }
  else if ($length > 0) {
    // Count character starts from the starting index until we have
    // found $length + 1 characters. Then backtrack one byte.
    $bytes = $istart; $chars = 0;
    while ($bytes < $strlen && $chars < $length) {
      $bytes++;
      $c = $bytes < $strlen ? ord($text[$bytes]) : 0;
      if ($c < 0x80 || $c >= 0xC0) {
        $chars++;
      }
    }
    $bytes--;
  }
  else if ($length < 0) {
    // Count character starts from the end until we have found
    // abs($length) characters
    $length = abs($length);
    $bytes = $strlen - 1; $chars = 0;
    while ($bytes >= 0 && $chars < $length) {
      $c = ord($text[$bytes]);
      if ($c < 0x80 || $c >= 0xC0) {
        $chars++;
      }
      $bytes--;
    }
  }
  else {
    // $length == 0: select nothing
    $bytes = $istart - 1;
  }
  $iend = $bytes;

  return substr($text, $istart, max(0, $iend - $istart + 1));
}

/**
 * Decode all HTML entities (including numerical ones) to regular UTF-8 bytes.
 * Double-escaped entities will only be decoded once ("&amp;lt;" becomes "&lt;", not "<").
 *
 * @param $text
 *   The text to decode entities in.
 * @param $exclude
 *   An array of characters which should not be decoded. For example,
 *   array('<', '&', '"'). This affects both named and numerical entities.
 */
function decode_entities($text, $exclude = array()) {
  static $table;
  // We store named entities in a table for quick processing.
  if (!isset($table)) {
    // Get all named HTML entities ('&eacute;' => 'é', ...). Requesting the
    // table in UTF-8 directly makes a utf8_encode() pass unnecessary.
    $table = array_flip(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8'));
    // Add apostrophe (XML)
    $table['&apos;'] = "'";
  }
  $newtable = array_diff($table, $exclude);
  // Use a regexp to select all entities in one pass, to avoid decoding
  // double-escaped entities twice. A callback is used because the /e
  // modifier was removed in PHP 7.
  return preg_replace_callback('/&(#x?)?([A-Za-z0-9]+);/', function ($m) use ($newtable, $exclude) {
    return _decode_entities($m[1], $m[2], $m[0], $newtable, $exclude);
  }, $text);
}

/**
 * Helper function for decode_entities
 */
function _decode_entities($prefix, $codepoint, $original, $table, $exclude) {
  // Named entity
  if (!$prefix) {
    return isset($table[$original]) ? $table[$original] : $original;
  }
  // Hexadecimal numerical entity
  if ($prefix == '#x') {
    if (!preg_match('/^[0-9A-Fa-f]+$/', $codepoint)) {
      return $original;
    }
    $codepoint = hexdec($codepoint);
  }
  // Decimal numerical entity (an (int) cast reads leading zeros as decimal,
  // not octal)
  else {
    if (!preg_match('/^[0-9]+$/', $codepoint)) {
      return $original;
    }
    $codepoint = (int) $codepoint;
  }
  // Encode codepoint as UTF-8 bytes
  if ($codepoint < 0x80) {
    $str = chr($codepoint);
  }
  else if ($codepoint < 0x800) {
    $str = chr(0xC0 | ($codepoint >> 6))
         . chr(0x80 | ($codepoint & 0x3F));
  }
  else if ($codepoint < 0x10000) {
    $str = chr(0xE0 | ($codepoint >> 12))
         . chr(0x80 | (($codepoint >> 6) & 0x3F))
         . chr(0x80 | ($codepoint & 0x3F));
  }
  else if ($codepoint < 0x200000) {
    $str = chr(0xF0 | ($codepoint >> 18))
         . chr(0x80 | (($codepoint >> 12) & 0x3F))
         . chr(0x80 | (($codepoint >> 6) & 0x3F))
         . chr(0x80 | ($codepoint & 0x3F));
  }
  else {
    // Too large to encode: leave the entity as it was
    return $original;
  }
  // Check for excluded characters
  if (in_array($str, $exclude)) {
    return $original;
  }
  return $str;
}

Replacing the built-in functions can fix your problem, but without the mbstring extension these functions will run much slower than the built-in ones.
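The numeric-entity branch of _decode_entities() encodes a code point into UTF-8 by hand. The hypothetical codepoint_to_utf8() below isolates that step as a sketch: each branch spreads 7, 11, 16 or 21 payload bits over a lead byte and 10xxxxxx continuation bytes. Unlike the code in the comment, it rejects UTF-16 surrogates and values above U+10FFFF, the limits RFC 3629 sets for UTF-8.

```php
<?php
// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
// Returns null for values that are not valid Unicode scalar values.
function codepoint_to_utf8(int $cp): ?string
{
    if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        return null;                                   // out of range or a surrogate
    }
    if ($cp < 0x80) {                                  // 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {                                 // 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {                               // 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))                     // 11110xxx + three continuation bytes
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// "&#337;" (decimal) and "&#x151;" (hex) both name U+0151, Hungarian "ő".
echo bin2hex(codepoint_to_utf8(337)), "\n";            // c591
echo bin2hex(codepoint_to_utf8(hexdec('151'))), "\n";  // c591
echo bin2hex(codepoint_to_utf8(0x20AC)), "\n";         // e282ac (euro sign)
```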