#1484961 closed Tasks (fixed)
Sender and Receiver charset are ignored
| Reported by: | joungkyun | Owned by: | |
|---|---|---|---|
| Priority: | 10 - Lowest | Milestone: | 0.3-stable |
| Component: | Other | Version: | git-master |
| Severity: | minor | Keywords: | charset |
| Cc: | | | |
Description
If there is no charset in the mail header or a multipart header, WU-imapd returns US-ASCII (as fetched by the iil_C_FetchStructureString function). So the charset is not null (it is US-ASCII), and the default_charset setting in main.inc.php is ignored.
The attached patch fixes this problem, and I also attached the raw mail data. The patch applies against SVN revision 1246.
Attachments (16)
Change History (56)
Changed 5 years ago by joungkyun
comment:1 Changed 5 years ago by thomasb
- Milestone changed from 0.1.1 to later
- Severity changed from major to normal
If the default charset is ISO-8859-1 or UTF-8 this could work, since US-ASCII is a subset of both. But in general we should not replace the charset of every US-ASCII message with the default charset. Unfortunately, we cannot tell whether the message was really sent as US-ASCII or whether that label was added by the IMAP server.
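One partial check is possible, though: US-ASCII is a 7-bit charset, so decoded part data containing any byte above 0x7F cannot really be US-ASCII. A minimal sketch, assuming the part has already been transfer-decoded (base64 or quoted-printable); the function name is hypothetical and not part of Roundcube:

```php
<?php
// Hypothetical helper: true when a part labelled US-ASCII nevertheless
// contains 8-bit bytes, i.e. the label cannot be right. $decoded must be the
// part body AFTER transfer decoding (base64 / quoted-printable).
function us_ascii_label_is_wrong(string $charset, string $decoded): bool
{
    if (strcasecmp($charset, 'US-ASCII') !== 0) {
        return false; // only US-ASCII labels are checked here
    }
    // US-ASCII is 7-bit: any byte in 0x80-0xFF is outside its range.
    return preg_match('/[\x80-\xFF]/', $decoded) === 1;
}

// Bytes of the EUC-KR sender name quoted in comment 38 (base64 "v8G8xw==").
var_dump(us_ascii_label_is_wrong('US-ASCII', base64_decode('v8G8xw=='))); // bool(true)
var_dump(us_ascii_label_is_wrong('US-ASCII', 'Hello'));                   // bool(false)
```

The check only shows that US-ASCII is wrong; it cannot say which charset is right, so choosing the replacement remains the policy question discussed in this ticket.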
comment:2 Changed 5 years ago by joungkyun
In Korea, mail uses the EUC-KR or UTF-8 charset (EUC-KR is more widely used than UTF-8).
If the mail header or multipart header has no charset, as in the attached raw mail data, the iil_C_FetchStructureString function issues an IMAP command to get the charset and other information, and the IMAP server returns US-ASCII as the charset. (I used imap-2006f.)
In this case Roundcube sets the charset to US-ASCII and converts from US-ASCII to UTF-8 (I configured default_charset as euc-kr in main.inc.php). As a result, garbled characters are printed and I can't read the mail.
When there is no charset, I expect Roundcube to convert from the default charset (euc-kr) to UTF-8, but it actually converts from US-ASCII (ISO-8859-1) to UTF-8. Many Korean users face this situation. It is the sender's fault, but a lot of mail is sent this way. So please add an exception for this case with my patch.
comment:3 follow-up: ↓ 4 Changed 5 years ago by thomasb
- Resolution set to wontfix
- Status changed from new to closed
Again, RoundCube has to trust whatever the IMAP server responds. The default charset is only used if there's NO charset specified by the IMAP server. In your case there is one.
comment:4 in reply to: ↑ 3 Changed 5 years ago by joungkyun
Replying to thomasb:
Again, RoundCube has to trust whatever the IMAP server responds. The default charset is only used if there's NO charset specified by the IMAP server. In your case there is one.
I don't understand your answer. The IMAP server gives the wrong charset when the mail has none, and the mail is displayed broken.
Anyway, I see this patch is of no use to you, but many multibyte mails are still broken in Roundcube Mail. Thanks.
comment:5 Changed 5 years ago by alec
- Milestone changed from later to 0.1.2
comment:6 follow-up: ↓ 8 Changed 5 years ago by tensor
- Milestone changed from 0.2-alpha to 0.2-stable
In trunk it appears to be fixed for the message body.
I tested with a default_charset of EUC-KR. joungkyun, see the screenshot and tell us whether the text is Korean.
There is an issue, though: the subject and sender are not decoded properly in the message list.
comment:7 Changed 5 years ago by tensor
- Resolution wontfix deleted
- Status changed from closed to reopened
- Type changed from Patches to Bugs
Changed 5 years ago by tensor
Changed 5 years ago by tensor
comment:8 in reply to: ↑ 6 Changed 5 years ago by joungkyun
I tested with a default_charset of EUC-KR. joungkyun, see the screenshot and tell us whether the text is Korean.
There is an issue, though: the subject and sender are not decoded properly in the message list.
The message body is displayed as proper Korean, but the list is broken. It will need another patch :), and I will attach a patch file. Sorry, my patch is for 0.1.1; in a few hours I will attach a patch for 0.2.
Thanks.
comment:9 Changed 5 years ago by tensor
No patch necessary. That was a caching issue on my side or something was fixed in the trunk during the last several days. I haven't merged upstream changes for some time.
Use a tool in #1485434 to reset the cache.
Changed 5 years ago by tensor
comment:10 Changed 5 years ago by alec
- Resolution set to worksforme
- Status changed from reopened to closed
joungkyun, please check the current svn-trunk version. Charset detection was added some time ago, and it may fix your issue. Closing, as this also works for me.
comment:11 Changed 5 years ago by joungkyun
- Resolution worksforme deleted
- Status changed from closed to reopened
I tested with SVN revision 1941. Sadly, Korean strings are still broken in some cases, and Korean UTF-8 filenames are lost.
I am looking into these problems now and will report again. Thanks.
comment:12 Changed 5 years ago by joungkyun
I found 2 problems in 0.2-beta and SVN trunk revision 1941.
One is broken attachment filenames encoded with RFC 2231. This problem is reported at http://trac.roundcube.net/ticket/1485468, with a patch attached.
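For reference, RFC 2231 encodes such a filename as charset'language'%XX..., optionally split into numbered sections (filename*0*, filename*1*, ...). A minimal sketch of decoding it; this is not the patch from #1485468, the function name is hypothetical, and only the encoded parameter forms are handled:

```php
<?php
// Hypothetical sketch of RFC 2231 filename decoding. $params maps parameter
// names as they appear in Content-Disposition to their raw values, e.g.
//   ['filename*0*' => "UTF-8''%ED%95%9C", 'filename*1*' => '%EA%B8%80.txt']
// Simplification: only the encoded forms (filename* and filename*N*) are
// handled. Returns [charset, filename bytes], or null if neither is present.
function rfc2231_filename(array $params): ?array
{
    if (isset($params['filename*'])) {
        $joined = $params['filename*'];
    } else {
        // Continuations are concatenated in order: *0*, *1*, *2*, ...
        $joined = '';
        for ($i = 0; isset($params["filename*{$i}*"]); $i++) {
            $joined .= $params["filename*{$i}*"];
        }
        if ($joined === '') {
            return null;
        }
    }

    // The value starts with charset'language' followed by %XX-encoded bytes.
    $pieces = explode("'", $joined, 3);
    if (count($pieces) !== 3) {
        return null;
    }
    [$charset, , $encoded] = $pieces;
    return [$charset, rawurldecode($encoded)];
}

// The two sections together hold the percent-encoded UTF-8 bytes of "한글.txt".
print_r(rfc2231_filename([
    'filename*0*' => "UTF-8''%ED%95%9C",
    'filename*1*' => '%EA%B8%80.txt',
]));
```

The decoded bytes would still need converting from the declared charset to UTF-8 for display.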
And there is another problem.
In the message list, when a mail subject that has a charset (Subject: =?UTF-8?B?xxxxxxx?=) is followed by a mail subject that has no charset (Subject: 안녕하세요, "Hello" in Korean), the second subject is broken. This case is shown in the attached file 'rcube_0.2_broken_list.jpg'.
So I attached rcube-svn1941-broken-list.patch.
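Since a raw subject breaks only when it follows an encoded one, one way to rule out that failure mode is to decode every header on its own, so no charset carries over between headers. A minimal sketch, assuming only RFC 2047 B and Q encoded-words and leaving the final conversion to UTF-8 out; the function name is hypothetical and this is not the attached patch:

```php
<?php
// Hypothetical sketch: split ONE raw header value into [charset, bytes]
// segments. RFC 2047 encoded-words carry their own charset; any other text
// is tagged with $default (e.g. default_charset). Nothing is stored between
// calls, so one subject's charset cannot leak into the next subject.
function header_segments(string $value, string $default): array
{
    $segments = [];
    $pattern  = '/=\?([^?]+)\?([BbQq])\?([^?]*)\?=/';
    $offset   = 0;

    while (preg_match($pattern, $value, $m, PREG_OFFSET_CAPTURE, $offset)) {
        $start = $m[0][1];

        // Unencoded text before this encoded-word. Whitespace-only gaps
        // (e.g. between two adjacent encoded-words) are dropped.
        $plain = substr($value, $offset, $start - $offset);
        if (trim($plain) !== '') {
            $segments[] = [$default, $plain];
        }

        $text = $m[3][0];
        if (strtoupper($m[2][0]) === 'B') {
            $bytes = base64_decode($text);
        } else {
            // Q encoding: "_" means space, "=XX" is a hex-encoded byte.
            $bytes = quoted_printable_decode(str_replace('_', ' ', $text));
        }
        $segments[] = [$m[1][0], $bytes];
        $offset = $start + strlen($m[0][0]);
    }

    // Trailing unencoded text (or the whole value if it had no encoded-words).
    $rest = substr($value, $offset);
    if (trim($rest) !== '') {
        $segments[] = [$default, $rest];
    }
    return $segments;
}

// An encoded subject, then a raw 8-bit one: each call is independent,
// so the second subject falls back to the default charset.
print_r(header_segments('Re: =?UTF-8?B?SGVsbG8=?=', 'EUC-KR'));
print_r(header_segments(base64_decode('v8G8xw=='), 'EUC-KR'));
```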
Changed 5 years ago by joungkyun
Changed 5 years ago by joungkyun
Changed 5 years ago by joungkyun
comment:13 follow-up: ↓ 15 Changed 5 years ago by joungkyun
I found one more problem.
If the mail body has no charset, the IMAP server returns its own default charset, US-ASCII or X-UNKNOWN. But almost all mail from countries that use CJK (Chinese, Japanese, Korean) charsets is neither US-ASCII nor X-UNKNOWN. So, in a CJK environment, almost every mail without a charset header is broken.
So, if default_charset is not ISO-8859-1, the US-ASCII (or X-UNKNOWN) value returned by the iil_C_FetchStructureString function needs to be replaced with default_charset.
I attached rcube_0.2_broken_body_with_cjk.patch.
comment:14 follow-up: ↓ 34 Changed 5 years ago by tensor
Issue for rcube-svn1941-broken-list.patch confirmed.
comment:15 in reply to: ↑ 13 ; follow-up: ↓ 17 Changed 5 years ago by tensor
Replying to joungkyun:
And, add rcube_0.2_broken_body_with_cjk.patch
Please implement it as an option.
Does this issue occur in the message list or when opening a message?
Something like this should go into main.inc.php:
// Some IMAP servers return BODYSTRUCTURE with US-ASCII (IMAP-2006f)
// or X-UNKNOWN (IMAP-2007b) charset when no charset is specified in the message.
// This setting allows you to force decoding of headers using default_charset.
$rcmail_config['override_bodystructure_charset'] = array(
  'X-UNKNOWN',
  // US-ASCII
);
comment:16 follow-up: ↓ 18 Changed 5 years ago by alec
- Component changed from Core functionality to IMAP connection
I think using default_charset in such a case makes no sense in an international environment. Please attach the whole BODYSTRUCTURE reply from your server.
comment:17 in reply to: ↑ 15 Changed 5 years ago by joungkyun
Replying to tensor:
Replying to joungkyun:
And, add rcube_0.2_broken_body_with_cjk.patch
Please implement it as an option.
Does this issue occur in the message list or when opening a message?
This issue occurs in both situations. See also broken_body_on_list_page.jpg and broken_body_on_opening_message.jpg.
Something like this should go into main.inc.php:
// Some IMAP servers return BODYSTRUCTURE with US-ASCII (IMAP-2006f)
// or X-UNKNOWN (IMAP-2007b) charset when no charset is specified in the message.
// This setting allows you to force decoding of headers using default_charset.
$rcmail_config['override_bodystructure_charset'] = array(
  'X-UNKNOWN',
  // US-ASCII
);
I think that's a good idea, thanks :-)
Changed 5 years ago by joungkyun
Changed 5 years ago by joungkyun
Changed 5 years ago by joungkyun
comment:18 in reply to: ↑ 16 Changed 5 years ago by joungkyun
Replying to alec:
I think using default_charset in such a case makes no sense in an international environment. Please attach the whole BODYSTRUCTURE reply from your server.
I already attached row-pmail-data.txt, and I am now attaching a new file, 'no_charset_header_on_body_structure.txt'.
row-mail-data uses base64 encoding and no_charset_header_on_body_structure.txt uses quoted-printable encoding.
With IMAP 2006f, the IMAP server returns the following:
* 288 FETCH (BODYSTRUCTURE (("TEXT" "HTML" ("CHARSET" "US-ASCII") NIL NIL "QUOTED-PRINTABLE" 1302 34 NIL NIL NIL NIL) "ALTERNATIVE" ("BOUNDARY" "246.4C780F__DDC20") NIL NIL NIL))
F1247 OK FETCH completed.
With IMAP 2007b, the IMAP server returns the following:
* 288 FETCH (BODYSTRUCTURE (("TEXT" "HTML" ("CHARSET" "X-UNKNOWN") NIL NIL "QUOTED-PRINTABLE" 1302 34 NIL NIL NIL NIL) "ALTERNATIVE" ("BOUNDARY" "246.4C780F__DDC20") NIL NIL NIL))
F1247 OK FETCH completed.
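In both replies the placeholder charset sits in the body parameter list as a "CHARSET" name/value pair. A minimal sketch of pulling it out of such a line, assuming a simple regex is enough for illustration (a real parser would walk the nested lists and handle IMAP literals); the function name is hypothetical:

```php
<?php
// Hypothetical sketch: return the first CHARSET body parameter found in a
// BODYSTRUCTURE response line, or null when there is none (for example when
// the server sends NIL instead of a parameter list). A real parser would walk
// the nested lists and handle literals; this regex only illustrates the idea.
function bodystructure_charset(string $response): ?string
{
    if (preg_match('/"CHARSET"\s+"([^"]*)"/i', $response, $m)) {
        return $m[1];
    }
    return null;
}

$wu2007b = '* 288 FETCH (BODYSTRUCTURE (("TEXT" "HTML" ("CHARSET" "X-UNKNOWN") NIL NIL '
         . '"QUOTED-PRINTABLE" 1302 34 NIL NIL NIL NIL) "ALTERNATIVE" '
         . '("BOUNDARY" "246.4C780F__DDC20") NIL NIL NIL))';
var_dump(bodystructure_charset($wu2007b)); // string(9) "X-UNKNOWN"
```

Returning null (rather than a guess) keeps "no charset given" distinguishable from "server said US-ASCII/X-UNKNOWN", which is exactly the distinction this ticket turns on.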
comment:19 follow-ups: ↓ 20 ↓ 21 Changed 5 years ago by tensor
Cannot reproduce for no_charset_header_on_body_structure.txt. It properly picks default_charset of EUC-KR when opening mail even without a patch.
What is your default_charset?
comment:20 in reply to: ↑ 19 Changed 5 years ago by joungkyun
Replying to tensor:
Cannot reproduce for no_charset_header_on_body_structure.txt. It properly picks default_charset of EUC-KR when opening mail even without a patch.
What is your default_charset?
$rcmail_config['default_charset'] = 'EUC-KR';
Changed 5 years ago by joungkyun
comment:21 in reply to: ↑ 19 ; follow-up: ↓ 22 Changed 5 years ago by joungkyun
Replying to tensor:
Cannot reproduce for no_charset_header_on_body_structure.txt. It properly picks default_charset of EUC-KR when opening mail even without a patch.
What is your default_charset?
I attached my main.inc.php.
comment:22 in reply to: ↑ 21 Changed 5 years ago by joungkyun
Replying to joungkyun:
Replying to tensor:
Cannot reproduce for no_charset_header_on_body_structure.txt. It properly picks default_charset of EUC-KR when opening mail even without a patch.
What is your default_charset?
I attached my main.inc.php.
Maybe there is a difference in PHP build options between your server and mine?
comment:23 follow-ups: ↓ 24 ↓ 25 Changed 5 years ago by tensor
Running Debian/lenny, php 5.2.6, latest Courier.
comment:24 in reply to: ↑ 23 Changed 5 years ago by joungkyun
Replying to tensor:
Running Debian/lenny, php 5.2.6, latest Courier.
My PHP build options are as follows:
./configure --prefix=/usr --sysconfdir=/etc/php.d --with-config-file-path=/etc/php.d --with-config-file-scan-dir=/etc/php.d/apache --disable-debug --disable-hash --disable-xmlreader --disable-xmlwriter --disable-json --with-exec-dir=/var/lib/php/bin --with-regex=php --with-mod_charset --with-zend-multibyte --with-zlib --with-zlib-dir=/usr --enable-sigchild --enable-safe-mode --enable-inline-optimization --enable-magic-quotes --enable-track-vars --enable-debugger --enable-sysvsem --enable-sysvshm --enable-sysvmsg --enable-libxml --enable-mbstring=all --enable-mbregex --enable-mbregex-backtrack --with-libmbfl --with-apxs=/usr/sbin/apxs --disable-cli --disable-cgi --with-gd=shared --enable-gd-native-ttf --with-jpeg-dir=/usr --with-png-dir=/usr --with-freetype-dir=/usr --with-sqlite=shared --with-sqlite-utf8 --enable-pdo=shared --with-pdo-sqlite=shared --with-iconv=shared --with-openssl=shared
PHP 5.2.6
Glibc 2.2.4
comment:25 in reply to: ↑ 23 ; follow-up: ↓ 26 Changed 5 years ago by joungkyun
Replying to tensor:
Running Debian/lenny, php 5.2.6, latest Courier.
If a message has no charset, what charset does Courier return? I guess Courier returns no charset.
In this case, Cyrus-IMAP and WU-IMAP return US-ASCII or X-UNKNOWN.
comment:26 in reply to: ↑ 25 Changed 5 years ago by joungkyun
Replying to joungkyun:
Replying to tensor:
Running Debian/lenny, php 5.2.6, latest Courier.
If a message has no charset, what charset does Courier return? I guess Courier returns no charset.
In this case, Cyrus-IMAP and WU-IMAP return US-ASCII or X-UNKNOWN.
Hmm, so in the end, do I need to patch the IMAP server to fix this problem?
comment:27 Changed 5 years ago by tensor
Courier returns NIL instead of a "body parameter parenthesized list" when no charset is defined in the headers.
I vote for a patch on the RoundCube side, as there may be other IMAP servers with the same problem.
I think it is safe to treat X-UNKNOWN as default_charset. All others should be at the discretion of RoundCube admin.
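The policy above can be sketched as follows, reusing the option name from the main.inc.php suggestion in comment 15; the function name and its exact behaviour are assumptions for illustration, not Roundcube code:

```php
<?php
// Sketch of the proposed policy (function name is hypothetical):
//  - no CHARSET parameter (Courier's NIL) or X-UNKNOWN -> default_charset
//  - charsets listed in override_bodystructure_charset -> default_charset
//  - anything else -> trust the IMAP server
function resolve_part_charset(?string $reported, array $config): string
{
    $default = $config['default_charset'];

    if ($reported === null || strcasecmp($reported, 'X-UNKNOWN') === 0) {
        return $default;
    }

    $overrides = array_map('strtoupper', $config['override_bodystructure_charset'] ?? []);
    if (in_array(strtoupper($reported), $overrides, true)) {
        return $default;
    }

    return $reported;
}

$rcmail_config = [
    'default_charset'                => 'EUC-KR',
    'override_bodystructure_charset' => ['US-ASCII'], // admin's choice
];

echo resolve_part_charset(null, $rcmail_config), "\n";        // EUC-KR
echo resolve_part_charset('X-UNKNOWN', $rcmail_config), "\n"; // EUC-KR
echo resolve_part_charset('us-ascii', $rcmail_config), "\n";  // EUC-KR
echo resolve_part_charset('UTF-8', $rcmail_config), "\n";     // UTF-8
```

Keeping US-ASCII out of the built-in rule and behind the admin option matches the concern in comment 1 that genuine US-ASCII mail should not be relabelled everywhere.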
comment:28 Changed 5 years ago by alec
In my opinion, we should use rc_detect_encoding() for messages with NIL or X-UNKNOWN charset.
comment:29 follow-up: ↓ 33 Changed 5 years ago by alec
As I said, I have had good results using rc_detect_encoding(), so please test the attached rcube_imap.patch.
comment:30 follow-up: ↓ 31 Changed 5 years ago by alec
There is one problem: in the example message EUC-KR is detected as BIG5, so 'EUC-KR' must be added before 'BIG5' in rc_detect_encoding's charsets array. It would be nice to improve detection by implementing Mozilla's charset detector http://www.mozilla.org/projects/intl/chardet.html
comment:31 in reply to: ↑ 30 ; follow-up: ↓ 32 Changed 5 years ago by joungkyun
Replying to alec:
There is one problem: in the example message EUC-KR is detected as BIG5, so 'EUC-KR' must be added before 'BIG5' in rc_detect_encoding's charsets array. It would be nice to improve detection by implementing Mozilla's charset detector http://www.mozilla.org/projects/intl/chardet.html
In some cases, EUC-KR is detected as SJIS. The quality of mb_detect_encoding is not good.
See also the attached mb_detect_encoding.jpg.
Changed 5 years ago by joungkyun
comment:32 in reply to: ↑ 31 Changed 5 years ago by joungkyun
Replying to joungkyun:
Replying to alec:
There is one problem: in the example message EUC-KR is detected as BIG5, so 'EUC-KR' must be added before 'BIG5' in rc_detect_encoding's charsets array. It would be nice to improve detection by implementing Mozilla's charset detector http://www.mozilla.org/projects/intl/chardet.html
In some cases, EUC-KR is detected as SJIS. The quality of mb_detect_encoding is not good.
See also the attached mb_detect_encoding.jpg.
In my opinion, the first member of the $enc array should be set to $failover:
$failover = ! $failover ? $GLOBALS['CONFIG']['default_charset'] : $failover;
$enc = array(
$failover, 'SJIS', 'BIG5', 'GB2312', 'UTF-8',
'ISO-8859-1', 'ISO-8859-2', 'ISO-8859-3', 'ISO-8859-4',
'ISO-8859-5', 'ISO-8859-6', 'ISO-8859-7', 'ISO-8859-8', 'ISO-8859-9',
'ISO-8859-10', 'ISO-8859-13', 'ISO-8859-14', 'ISO-8859-15', 'ISO-8859-16',
'WINDOWS-1252', 'WINDOWS-1251', 'EUC-JP', 'EUC-TW', 'KOI8-R',
'ISO-2022-KR', 'ISO-2022-JP'
);
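Since, per comment 30, the order of this list decides between close matches (EUC-KR came out as BIG5), putting the failover charset first biases detection toward the site default. A self-contained version of the snippet above, assuming $default_charset stands in for $GLOBALS['CONFIG']['default_charset'] and adding de-duplication for when the failover is already in the list; the function name is hypothetical:

```php
<?php
// Sketch of the ordering proposed above. $default_charset stands in for
// $GLOBALS['CONFIG']['default_charset']; the function name is hypothetical.
function detect_candidates(?string $failover, string $default_charset): array
{
    $failover = $failover ?: $default_charset;

    $enc = [
        $failover, 'SJIS', 'BIG5', 'GB2312', 'UTF-8',
        'ISO-8859-1', 'ISO-8859-2', 'ISO-8859-3', 'ISO-8859-4',
        'ISO-8859-5', 'ISO-8859-6', 'ISO-8859-7', 'ISO-8859-8', 'ISO-8859-9',
        'ISO-8859-10', 'ISO-8859-13', 'ISO-8859-14', 'ISO-8859-15', 'ISO-8859-16',
        'WINDOWS-1252', 'WINDOWS-1251', 'EUC-JP', 'EUC-TW', 'KOI8-R',
        'ISO-2022-KR', 'ISO-2022-JP',
    ];

    // If the failover is also a fixed entry (e.g. UTF-8), keep only the
    // front-most copy so each charset is tried once.
    return array_values(array_unique(array_map('strtoupper', $enc)));
}

echo implode(', ', array_slice(detect_candidates(null, 'EUC-KR'), 0, 3)), "\n"; // EUC-KR, SJIS, BIG5
```

The resulting array would then be handed to the detector in place of the fixed list; it does not make detection itself any more accurate, it only changes which candidate wins a tie.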
comment:33 in reply to: ↑ 29 Changed 5 years ago by tensor
comment:34 in reply to: ↑ 14 ; follow-up: ↓ 35 Changed 5 years ago by joungkyun
Replying to tensor:
Issue for rcube-svn1941-broken-list.patch confirmed.
What about this patch? I checked SVN revision 2000: the patch is not applied, and a subject that has no charset is still broken when it follows a subject that has a charset.
comment:35 in reply to: ↑ 34 Changed 5 years ago by tensor
Replying to joungkyun:
Replying to tensor:
Issue for rcube-svn1941-broken-list.patch confirmed.
What about this patch? I checked SVN revision 2000: the patch is not applied, and a subject that has no charset is still broken when it follows a subject that has a charset.
Yes, it is not applied. I previously checked the patch rcube-svn1941-broken-list.patch and it appears to be good. Anyone with svn access, please test and commit.
Steps to reproduce:
- Create a new folder.
- Import the attached message with no charset into the created folder.
- Copy any message with charset set in headers to the created folder.
- Try to sort the messages by date both asc and desc.
Trying to detect the charset of headers may help, but it often fails (see above). Falling back to the default is better in almost all cases.
Also see #1485451.
comment:36 Changed 4 years ago by alec
rcube-svn1941-broken-list.patch applied in [5faac054].
comment:37 Changed 4 years ago by alec
- Resolution set to worksforme
- Status changed from reopened to closed
This is too old and too long to understand. Could someone test with the current svn-trunk version and write a summary? Can we fix anything other than charset detection? Please open new, small tickets.
comment:38 follow-up: ↓ 40 Changed 3 years ago by para
- Cc khrhee68@… added
- Priority changed from 8 to 1 - Highest
- Resolution worksforme deleted
- Status changed from closed to reopened
- Summary changed from Default charset is ignored to Sender and Receiver charset are ignored
I upgraded my Roundcube Mail today to svn trunk version 4.2 or later.
I cannot find version information in the changelog file except "release 0.4.2".
Before upgrading, I used svn trunk version 4.0.
I found a decoding problem with the sender and receiver: their actual names are not shown.
I am not an expert in this, so I am posting the content of my screen.
I hope you can understand it easily.
Sorry, I do not know how to attach an image, so below is the screen text followed by the raw message headers.
Subject: [14 days only!] Prices are soaring! Lighten the load with 16-month interest-free installments~ Hyundai/KB/Lotte/KEB cards: 16 months interest-free on total payments of 300,000 won or more!
From: =?euc-kr?B?v8G8xw==?=
To: =?euc-kr?B?wMyx4sf2?=
Reply-To: mailmaster@auction.co.kr
Date: Today 17:50
From: "=?euc-kr?B?v8G8xw==?=" <mailmaster@auction.co.kr>
To: "=?euc-kr?B?wMyx4sf2?=" <KHRHee@kornet.net>
Reply-to: <mailmaster@auction.co.kr>
Subject: =?euc-kr?B?W7TcIDE0wM+4uCFdILmwsKEguvG78yEgMTaws7/5ILmrwMzA2rfOILChurGw1H4gx/a06y9LQi+31LWlL7/cyK/Eq7XlILDhwaa+1yDH1bDoIDMwuLi/+MDMu/MgMTaws7/5ILmrwMzA2iE=?=
Date: Fri, 29 Oct 2010 17:50:05 +0900
X-WORKER_ID: <single.default_Worker_136>
X-MEMBER_ID: <TV9JRD1raHJoZWU2OA==>
X-Mailer: eMsSMTP Ver3.5( PLUTO-build 0401 )
MIME-Version: 1.0
Content-Type: text/html;
charset="euc-kr"
Content-Transfer-Encoding: 8bit
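RFC 2047 (section 5) does not allow encoded-words inside a quoted-string, yet the From and To headers above wrap them in double quotes, which would explain why a strict decoder left the names undecoded. A minimal lenient sketch, assuming a single B-encoded display name; the function name is hypothetical and this is not the change made in [7bdd3e22]:

```php
<?php
// Hypothetical sketch: split an address header like
//   "=?euc-kr?B?v8G8xw==?=" <mailmaster@auction.co.kr>
// into [charset, name bytes, address]. RFC 2047 forbids encoded-words inside
// quoted strings, but some senders do it anyway, so this lenient version
// strips the quotes before decoding. Only a single B-encoded word is handled.
function parse_address(string $header, string $default): array
{
    if (!preg_match('/^\s*(.*?)\s*<([^>]+)>\s*$/', $header, $m)) {
        return [$default, '', trim($header)];
    }
    $name = $m[1];
    $addr = $m[2];

    // Strip one pair of surrounding double quotes, if present.
    if (strlen($name) >= 2 && $name[0] === '"' && substr($name, -1) === '"') {
        $name = substr($name, 1, -1);
    }

    if (preg_match('/^=\?([^?]+)\?[Bb]\?([^?]*)\?=$/', $name, $w)) {
        return [$w[1], base64_decode($w[2]), $addr];
    }
    return [$default, $name, $addr];
}

[$charset, $name, $address] = parse_address('"=?euc-kr?B?v8G8xw==?=" <mailmaster@auction.co.kr>', 'EUC-KR');
echo $charset, ' ', bin2hex($name), ' ', $address, "\n"; // euc-kr bfc1bcc7 mailmaster@auction.co.kr
```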
comment:39 Changed 3 years ago by alec
- Resolution set to fixed
- Status changed from reopened to closed
@para: your issue fixed in [7bdd3e22].
comment:40 in reply to: ↑ 38 Changed 3 years ago by para
- Cc khrhee68@… removed
- Component changed from IMAP connection to Other
- Priority changed from 1 - Highest to 10 - Lowest
- Severity changed from normal to minor
- Type changed from Bugs to Tasks
I updated just now and found it is fixed.
Thank you very much for your quick response.

fixed case of ignored default_charset