Opened 3 years ago

Closed 3 years ago

#1486375 closed Bugs (fixed)

Character set conversion fails on systems where iconv doesn't accept //IGNORE

Reported by: trisk Owned by:
Priority: 4 Milestone: 0.4-beta
Component: Core functionality Version: git-master
Severity: major Keywords:
Cc:

Description

Because of the change to prevent strings from being truncated in #1484420, RoundCube may be unable to perform character set conversion in some cases if mbstring is not also available.

This is accompanied by the message "PHP Error: Could not convert string from GBK to UTF-8. Make sure iconv/mbstring is installed or lib/utf8.class is available"

The "TRANSLIT" and "IGNORE" modifiers to the encoding name are specific to GNU libiconv and result an unrecognised encoding name when used with other iconv implementations.

A workaround may be to test if "IGNORE" is supported before using it.

Attachments (1)

roundcube-iconv.diff (1.6 KB) - added by trisk 3 years ago.
Patch to use TRANSLIT instead of IGNORE and handle illegal input characters

Download all attachments as: .zip

Change History (17)

comment:1 Changed 3 years ago by alec

But if we use iconv without IGNORE text with broken characters will be truncated. You think it would be better?

comment:2 Changed 3 years ago by trisk

Truncating text silently would be terrible indeed. Fortunately, a conversion error without IGNORE normally produces an E_NOTICE error message. To actually take advantage of in the caller to prevent truncation, you can add an error handler for E_NOTICE which throws an ErrorException?.

Caching the result of test to determine if IGNORE is supported (libiconv supports an empty string as an encoding name) is an additional optimisation.

However, http://www.php.net/manual/en/function.iconv.php#90330 warns that even when using IGNORE, iconv may still truncate the output silently. I don't know what to do in that case.

Index: program/include/main.inc
===================================================================
--- program/include/main.inc    (revision 3184)
+++ program/include/main.inc    (working copy)
@@ -170,6 +170,18 @@
 
 
 /**
+ * Catch an error and throw an exception.
+ *
+ * @param  int Level of the error
+ * @param  string Error message
+ */ 
+function rcube_error_handler($errno, $errstr)
+  {
+  throw new ErrorException($errstr, 0, $errno);
+  }
+
+
+/**
  * Convert a string from one charset to another.
  * Uses mbstring and iconv functions if possible
  *
@@ -180,6 +192,7 @@
  */
 function rcube_charset_convert($str, $from, $to=NULL)
   {
+  static $iconv_ignore = null;
   static $mbstring_loaded = null;
   static $mbstring_list = null;
   static $convert_warning = false;
@@ -195,7 +208,19 @@
 
   // convert charset using iconv module  
   if (function_exists('iconv') && $from != 'UTF-7' && $to != 'UTF-7') {
-    $_iconv = iconv($from, $to . '//IGNORE', $str);
+    if (is_null($iconv_ignore)) {
+        $iconv_ignore = '//IGNORE';
+        if (iconv('', $iconv_ignore, '') === false) {
+            $iconv_ignore = '';
+        }
+    }
+    set_error_handler('rcube_error_handler', E_NOTICE);
+    try {
+        $_iconv = iconv($from, $to . $iconv_ignore, $str);
+    } catch (ErrorException $e) {
+        $_iconv = false;
+    }
+    restore_error_handler();
     if ($_iconv !== false) {
         return $_iconv;
     }

comment:3 Changed 3 years ago by trisk

Does the previous patch look okay to you? It works correctly in the case where IGNORE is not supported.

comment:4 Changed 3 years ago by alec

  • Resolution set to fixed
  • Status changed from new to closed

In my opinion is better to have text unconverted than truncated. So, fixed using @ to skip E_NOTICE messages in [ecbd5b5d].

comment:5 Changed 3 years ago by trisk

  • Resolution fixed deleted
  • Status changed from closed to reopened

No, my proposed solution handles the error case correctly so there is no truncation.

comment:6 Changed 3 years ago by alec

Doesn't your iconv truncate string on unknown character? I thought it's standard behaviour when not using IGNORE.

comment:7 follow-up: Changed 3 years ago by trisk

The system iconv() normally returns an error code when conversion fails. We can be notified of that in PHP by setting an error handler for E_NOTICE because the PHP manual for iconv says: 'If you append the string IGNORE, characters which cannot be represented in the target charset are silently discarded. Otherwise, str is cut from the first illegal character and an E_NOTICE is generated.' The case without IGNORE is actually safer because we don't silently discard data which can also lead to truncation if later characters are interpreted with the wrong byte offset.

comment:8 in reply to: ↑ 7 ; follow-up: Changed 3 years ago by alec

Replying to trisk:

Otherwise, str is cut from the first illegal character and an E_NOTICE is generated.' The case without IGNORE is actually safer

I don't think it's safer.

comment:9 in reply to: ↑ 8 Changed 3 years ago by trisk

Replying to alec:

Replying to trisk:

Otherwise, str is cut from the first illegal character and an E_NOTICE is generated.' The case without IGNORE is actually safer

I don't think it's safer.

I'm afraid I don't follow.

Using iconv with IGNORE will silently discard illegal input if it operates as documented. This is the current behaviour when IGNORE is supported and unchanged by the proposed changes.

The proposed changes allow iconv to be used without IGNORE when IGNORE is not supported, and detects conversion errors, falling back to the full unconverted string. This is strictly an improvement over previous behaviour provided that the error reporting works as documented.

comment:10 Changed 3 years ago by alec

Sorry, I didn't read your patch carefully. Sounds you're right.

comment:11 Changed 3 years ago by trisk

Thanks, I had to make sure I wasn't missing something.

#1484420 introduced IGNORE because conversion errors when iconv was used without IGNORE weren't handled correctly leading to truncation. Since an error handler is added for iconv, that should no longer be a problem.

If reverting to the uncoverted string when there is a conversion error is preferable to omitting unconvertable characters it might be reasonable to drop IGNORE altogether. Another option is to replace it with TRANSLIT, another libiconv extension, which tries to map unconvertable characters to similar characters in the output charset, but I don't know how well this works in practice.

I've only done minimal testing, so someone with GNU libiconv and more extensive testcases should verify that the iconv error handling works correctly.

Changed 3 years ago by trisk

Patch to use TRANSLIT instead of IGNORE and handle illegal input characters

comment:12 Changed 3 years ago by trisk

  • Priority changed from 5 to 4
  • Version changed from 0.3.1 to svn-trunk

http://bugs.php.net/bug.php?id=48147 has test results which confirm that IGNORE can truncate strings, and suggest TRANSLIT may be a better alternative.

The TRANSLIT option (where supported) approximates characters which are not available in the output charset with their nearest equivalents. http://bugs.php.net/bug.php?id=38425 shows iconv with TRANSLIT will still throw errors if it encounters a character which is illegal in the input charset, so we can use TRANSLIT together with the error handling to fall back to the unconverted string if an illegal character appears in the input.

I've added an updated patch using TRANSLIT instead of IGNORE.

comment:13 Changed 3 years ago by alec

In you patch using TRANSLIT and detection of its support looks ok, but I think, we shouldn't fallback to the unconverted string if TRANSLIT was used. We should fallback only when iconv_options=, because this means that string will be truncated on illegal character. String is not truncated when TRANSLIT (or IGNORE) is used, so we can use this iconv result.

comment:14 Changed 3 years ago by alec

What I mean, we want to prevent from truncated text, but not truncated text with illegal characters is ok.

comment:15 Changed 3 years ago by alec

Arghhh. Again I'm misleading you. A though that on illegal character E_NOTICE is thrown also when TRANSLIT/IGNORE is used. It's not true. So, your patch now looks OK and I will commit it soon. Sorry for the noise.

comment:16 Changed 3 years ago by alec

  • Resolution set to fixed
  • Status changed from reopened to closed

Applied in [f52e7a00].

Note: See TracTickets for help on using tickets.