Opened 5 years ago
Closed 5 years ago
#1485467 closed Feature Patches (invalid)
Patch to support the Big5 charset in rc_detect_encoding()
| Reported by: | dwj | Owned by: | |
|---|---|---|---|
| Priority: | 5 | Milestone: | 0.2-stable |
| Component: | Core functionality | Version: | 0.2-beta |
| Severity: | normal | Keywords: | |
| Cc: | | | |
Description
mb_detect_encoding() does not use 'BIG5' but 'BIG-5'.
Calling mb_list_encodings() lists the encodings supported by the mbstring module:
array(64) {
[0]=>
string(4) "pass"
[1]=>
string(4) "auto"
[2]=>
string(5) "wchar"
[3]=>
string(7) "byte2be"
[4]=>
string(7) "byte2le"
[5]=>
string(7) "byte4be"
[6]=>
string(7) "byte4le"
[7]=>
string(6) "BASE64"
[8]=>
string(8) "UUENCODE"
[9]=>
string(13) "HTML-ENTITIES"
[10]=>
string(16) "Quoted-Printable"
[11]=>
string(4) "7bit"
[12]=>
string(4) "8bit"
[13]=>
string(5) "UCS-4"
[14]=>
string(7) "UCS-4BE"
[15]=>
string(7) "UCS-4LE"
[16]=>
string(5) "UCS-2"
[17]=>
string(7) "UCS-2BE"
[18]=>
string(7) "UCS-2LE"
[19]=>
string(6) "UTF-32"
[20]=>
string(8) "UTF-32BE"
[21]=>
string(8) "UTF-32LE"
[22]=>
string(6) "UTF-16"
[23]=>
string(8) "UTF-16BE"
[24]=>
string(8) "UTF-16LE"
[25]=>
string(5) "UTF-8"
[26]=>
string(5) "UTF-7"
[27]=>
string(9) "UTF7-IMAP"
[28]=>
string(5) "ASCII"
[29]=>
string(6) "EUC-JP"
[30]=>
string(4) "SJIS"
[31]=>
string(9) "eucJP-win"
[32]=>
string(8) "SJIS-win"
[33]=>
string(7) "CP51932"
[34]=>
string(3) "JIS"
[35]=>
string(11) "ISO-2022-JP"
[36]=>
string(14) "ISO-2022-JP-MS"
[37]=>
string(12) "Windows-1252"
[38]=>
string(10) "ISO-8859-1"
[39]=>
string(10) "ISO-8859-2"
[40]=>
string(10) "ISO-8859-3"
[41]=>
string(10) "ISO-8859-4"
[42]=>
string(10) "ISO-8859-5"
[43]=>
string(10) "ISO-8859-6"
[44]=>
string(10) "ISO-8859-7"
[45]=>
string(10) "ISO-8859-8"
[46]=>
string(10) "ISO-8859-9"
[47]=>
string(11) "ISO-8859-10"
[48]=>
string(11) "ISO-8859-13"
[49]=>
string(11) "ISO-8859-14"
[50]=>
string(11) "ISO-8859-15"
[51]=>
string(11) "ISO-8859-16"
[52]=>
string(6) "EUC-CN"
[53]=>
string(5) "CP936"
[54]=>
string(2) "HZ"
[55]=>
string(6) "EUC-TW"
[56]=>
string(5) "BIG-5"
[57]=>
string(6) "EUC-KR"
[58]=>
string(3) "UHC"
[59]=>
string(11) "ISO-2022-KR"
[60]=>
string(12) "Windows-1251"
[61]=>
string(5) "CP866"
[62]=>
string(6) "KOI8-R"
[63]=>
string(9) "ArmSCII-8"
}
patch file:
--- rcube_shared.inc 2008-10-06 22:18:49.000000000 +0800
+++ rcube_shared.inc.new 2008-10-06 22:19:40.000000000 +0800
@@ -548,7 +548,7 @@
// FIXME: the order is important, because sometimes
// iso string is detected as euc-jp and etc.
$enc = array(
- 'SJIS', 'BIG5', 'GB2312', 'UTF-8',
+ 'SJIS', 'BIG-5', 'GB2312', 'UTF-8',
'ISO-8859-1', 'ISO-8859-2', 'ISO-8859-3', 'ISO-8859-4',
'ISO-8859-5', 'ISO-8859-6', 'ISO-8859-7', 'ISO-8859-8', 'ISO-8859-9',
'ISO-8859-10', 'ISO-8859-13', 'ISO-8859-14', 'ISO-8859-15', 'ISO-8859-16',
Change History (4)
comment:1 Changed 5 years ago by tensor
- Milestone changed from later to 0.2-stable
comment:2 Changed 5 years ago by tensor
For example, CP-1251 is an alias for Windows-1251 for mb_detect_encoding(). I grepped the mod_php binary and it has both BIG-5 and BIG5.
strings /usr/lib/apache2/modules/libphp5.so| grep BIG
comment:3 Changed 5 years ago by dwj
I tested mb_detect_encoding($str, 'BIG5') and mb_detect_encoding($str, 'BIG-5');
both return 'BIG-5'. So 'BIG5' and 'BIG-5' are the same in the PHP mbstring module.
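For anyone who wants to repeat this check, here is a minimal sketch, assuming the mbstring extension is loaded. The sample bytes and variable names are illustrative and not taken from Roundcube. It compares both the name reported by mb_detect_encoding() and the result of mb_convert_encoding() for the two spellings.

```php
<?php
// Sketch only: assumes the mbstring extension is loaded.
// Illustrative sample: "\xA4\xA4\xA4\xE5" is a two-character Big5 byte sequence.
$str = "\xA4\xA4\xA4\xE5";

// Encoding name reported by mb_detect_encoding() for each spelling.
$detected_big5  = mb_detect_encoding($str, 'BIG5');
$detected_big_5 = mb_detect_encoding($str, 'BIG-5');

// UTF-8 conversion result for each spelling.
$utf8_big5  = mb_convert_encoding($str, 'UTF-8', 'BIG5');
$utf8_big_5 = mb_convert_encoding($str, 'UTF-8', 'BIG-5');

// Both comparisons should be true if the two names refer to the same encoding.
var_dump($detected_big5 === $detected_big_5);
var_dump($utf8_big5 === $utf8_big_5);
```

Note that mb_list_encodings() prints only one name per encoding, so a spelling missing from that list may still be accepted by the module.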
comment:4 Changed 5 years ago by alec
- Resolution set to invalid
- Status changed from new to closed

In my experience, mb_detect_encoding() emits a warning if it is given an encoding name it does not understand. Do you have such warnings in your logs? Can you provide screenshots showing what goes wrong when BIG5 is used and what is fixed when BIG-5 is used?
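If the logs are not easy to reach, one way to check for such a warning directly is sketched below. It assumes the mbstring extension is loaded and PHP 5.5 or later (for `finally`). The helper name `detect_complains` is hypothetical, and the catch branch is there because newer PHP versions throw a ValueError for an unknown name instead of emitting a warning.

```php
<?php
// Hypothetical helper: reports whether mb_detect_encoding() complains
// (warning, or ValueError on newer PHP) about the given encoding name.
function detect_complains($str, $name)
{
    $complained = false;

    // Temporarily capture any warning raised during the call.
    set_error_handler(function () use (&$complained) {
        $complained = true;
        return true; // suppress the default PHP error output
    });

    try {
        mb_detect_encoding($str, $name);
    } catch (\ValueError $e) {
        // Newer PHP versions throw for an unknown encoding name.
        $complained = true;
    } finally {
        restore_error_handler();
    }

    return $complained;
}

var_dump(detect_complains("abc", 'BIG5'));
var_dump(detect_complains("abc", 'BIG-5'));
var_dump(detect_complains("abc", 'NO-SUCH-ENCODING'));
```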