Opened 5 years ago

Closed 5 years ago

#1485467 closed Feature Patches (invalid)

Patch to support the Big5 charset in rc_detect_encoding()

Reported by: dwj
Owned by:
Priority: 5
Milestone: 0.2-stable
Component: Core functionality
Version: 0.2-beta
Severity: normal
Keywords:
Cc:

Description

mb_detect_encoding() does not use 'BIG5' but 'BIG-5'.
Calling mb_list_encodings() lists the encodings supported by the mbstring module:

array(64) {
  [0]=>
  string(4) "pass"
  [1]=>
  string(4) "auto"
  [2]=>
  string(5) "wchar"
  [3]=>
  string(7) "byte2be"
  [4]=>
  string(7) "byte2le"
  [5]=>
  string(7) "byte4be"
  [6]=>
  string(7) "byte4le"
  [7]=>
  string(6) "BASE64"
  [8]=>
  string(8) "UUENCODE"
  [9]=>
  string(13) "HTML-ENTITIES"
  [10]=>
  string(16) "Quoted-Printable"
  [11]=>
  string(4) "7bit"
  [12]=>
  string(4) "8bit"
  [13]=>
  string(5) "UCS-4"
  [14]=>
  string(7) "UCS-4BE"
  [15]=>
  string(7) "UCS-4LE"
  [16]=>
  string(5) "UCS-2"
  [17]=>
  string(7) "UCS-2BE"
  [18]=>
  string(7) "UCS-2LE"
  [19]=>
  string(6) "UTF-32"
  [20]=>
  string(8) "UTF-32BE"
  [21]=>
  string(8) "UTF-32LE"
  [22]=>
  string(6) "UTF-16"
  [23]=>
  string(8) "UTF-16BE"
  [24]=>
  string(8) "UTF-16LE"
  [25]=>
  string(5) "UTF-8"
  [26]=>
  string(5) "UTF-7"
  [27]=>
  string(9) "UTF7-IMAP"
  [28]=>
  string(5) "ASCII"
  [29]=>
  string(6) "EUC-JP"
  [30]=>
  string(4) "SJIS"
  [31]=>
  string(9) "eucJP-win"
  [32]=>
  string(8) "SJIS-win"
  [33]=>
  string(7) "CP51932"
  [34]=>
  string(3) "JIS"
  [35]=>
  string(11) "ISO-2022-JP"
  [36]=>
  string(14) "ISO-2022-JP-MS"
  [37]=>
  string(12) "Windows-1252"
  [38]=>
  string(10) "ISO-8859-1"
  [39]=>
  string(10) "ISO-8859-2"
  [40]=>
  string(10) "ISO-8859-3"
  [41]=>
  string(10) "ISO-8859-4"
  [42]=>
  string(10) "ISO-8859-5"
  [43]=>
  string(10) "ISO-8859-6"
  [44]=>
  string(10) "ISO-8859-7"
  [45]=>
  string(10) "ISO-8859-8"
  [46]=>
  string(10) "ISO-8859-9"
  [47]=>
  string(11) "ISO-8859-10"
  [48]=>
  string(11) "ISO-8859-13"
  [49]=>
  string(11) "ISO-8859-14"
  [50]=>
  string(11) "ISO-8859-15"
  [51]=>
  string(11) "ISO-8859-16"
  [52]=>
  string(6) "EUC-CN"
  [53]=>
  string(5) "CP936"
  [54]=>
  string(2) "HZ"
  [55]=>
  string(6) "EUC-TW"
  [56]=>
  string(5) "BIG-5"
  [57]=>
  string(6) "EUC-KR"
  [58]=>
  string(3) "UHC"
  [59]=>
  string(11) "ISO-2022-KR"
  [60]=>
  string(12) "Windows-1251"
  [61]=>
  string(5) "CP866"
  [62]=>
  string(6) "KOI8-R"
  [63]=>
  string(9) "ArmSCII-8"
}

patch file:

--- rcube_shared.inc    2008-10-06 22:18:49.000000000 +0800
+++ rcube_shared.inc.new        2008-10-06 22:19:40.000000000 +0800
@@ -548,7 +548,7 @@
     // FIXME: the order is important, because sometimes
     // iso string is detected as euc-jp and etc.
     $enc = array(
-       'SJIS', 'BIG5', 'GB2312', 'UTF-8',
+       'SJIS', 'BIG-5', 'GB2312', 'UTF-8',
        'ISO-8859-1', 'ISO-8859-2', 'ISO-8859-3', 'ISO-8859-4',
        'ISO-8859-5', 'ISO-8859-6', 'ISO-8859-7', 'ISO-8859-8', 'ISO-8859-9',
        'ISO-8859-10', 'ISO-8859-13', 'ISO-8859-14', 'ISO-8859-15', 'ISO-8859-16',

Change History (4)

comment:1 Changed 5 years ago by tensor

  • Milestone changed from later to 0.2-stable

In my experience, mb_detect_encoding() emits a warning if it gets an encoding name it does not understand. Do you have such warnings in your logs? Can you provide screenshots showing what is wrong when BIG5 is used and what is fixed when BIG-5 is used?

comment:2 Changed 5 years ago by tensor

For example, CP-1251 is an alias for Windows-1251 in mb_detect_encoding(). I grepped the mod_php binary and it contains both BIG-5 and BIG5:

strings /usr/lib/apache2/modules/libphp5.so| grep BIG

comment:3 Changed 5 years ago by dwj

After testing mb_detect_encoding($str, 'BIG5') and mb_detect_encoding($str, 'BIG-5'),
both return 'BIG-5'. So 'BIG5' and 'BIG-5' are equivalent in the PHP mbstring module.
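A minimal sketch of the test described in this comment, assuming PHP with the mbstring extension loaded. The sample string is an illustrative choice, not from the ticket: the two characters "中文" encoded in Big5. As an extra check not mentioned in the ticket, it also converts the same bytes with mb_convert_encoding() under both spellings; if both names resolve to one converter, the UTF-8 output must be identical.

```php
<?php
// Sketch: check that 'BIG5' and 'BIG-5' name the same mbstring encoding.
// Assumes the mbstring extension is loaded. The sample bytes are an
// illustrative choice: "中文" (U+4E2D U+6587) encoded in Big5.
$big5 = "\xA4\xA4\xA4\xE5";
$utf8 = "\xE4\xB8\xAD\xE6\x96\x87"; // the same two characters in UTF-8

// If both spellings resolve to one converter, the output must be identical.
$viaBig5  = mb_convert_encoding($big5, 'UTF-8', 'BIG5');
$viaBig_5 = mb_convert_encoding($big5, 'UTF-8', 'BIG-5');

var_dump($viaBig5 === $viaBig_5);
var_dump($viaBig5 === $utf8);

// Strict detection against a single candidate; the ticket reports that
// both calls return the canonical name 'BIG-5'.
var_dump(mb_detect_encoding($big5, 'BIG5', true));
var_dump(mb_detect_encoding($big5, 'BIG-5', true));
```

Comparing conversion output does not depend on which name mb_detect_encoding() chooses to report. Note that mb_list_encodings() returns one canonical name per encoding, so an accepted alternate spelling such as 'BIG5' need not appear in that list.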

comment:4 Changed 5 years ago by alec

  • Resolution set to invalid
  • Status changed from new to closed