Ticket #1485267 (closed Bugs: fixed)

Opened 4 months ago

Last modified 2 months ago

attaching files with russian filenames strips leading Cyrillic characters from the file

Reported by: tensor1982 Owned by:
Priority: 5 Milestone: 0.2-stable
Component: MIME parsing Version: svn-trunk
Severity: normal Keywords:
Cc:

Description

How to reproduce: 1. Create a file with Cyrillic characters in the name, such as "русский.txt". 2. Start writing an email. 3. Attach the file and send the message.

What to observe: 1. In the Sent folder, the leading Cyrillic characters are stripped from the attached file's name. 2. In the recipient's mailbox the situation is the same.

Attachments

uploaded.PNG (2.1 kB) - added by tensor1982 4 months ago.
when uploaded into composed message
sent.png (1.3 kB) - added by tensor1982 4 months ago.
when viewed in a Sent folder
utf-8_attachments.patch (1.1 kB) - added by tensor1982 3 months ago.
fixes basename quirks, line endings are windows
test.php (1.0 kB) - added by tensor1982 3 months ago.
basename() examples with different locales

Change History

  Changed 4 months ago by tensor1982

I have skimmed through the code and could not find the place where the transformation takes place.

Any hints?

follow-up: ↓ 5   Changed 4 months ago by alec

  • summary changed from attaching files with russian filenames strips leading Cyrillic caracters from the file to attaching files with russian filenames strips leading Cyrillic characters from the file

hint: works for me

  Changed 4 months ago by tensor1982

This happens both when using IE7 and Opera 9.51

See the attached screenshots.

Changed 4 months ago by tensor1982

when uploaded into composed message

Changed 4 months ago by tensor1982

when viewed in a Sent folder

  Changed 4 months ago by tensor1982

Environment:

  • Debian (testing)
  • RC from svn

in reply to: ↑ 2   Changed 4 months ago by tensor1982

Replying to alec:

hint: works for me

Please tell me what distro you are testing on.

This would help me find the bug by comparing the differences. This issue has been present since 0.1-rc (the first version I started working with).

  Changed 4 months ago by alec

Current gentoo with apache-2.2.9, php-5.2.6, postfix-2.4.6, client FF-3.0.1/Kubuntu-8.04. Maybe the problem is Windows-only; I will check at home.

  Changed 4 months ago by tensor1982

Tested with Firefox 3.0.1 on Windows XP SP3. The problem is also present.

  Changed 4 months ago by tensor1982

Tested with Firefox 2.0.0.16 running on current Fedora 8 against the same server. The problem is present.

  Changed 4 months ago by tensor1982

Sorry, but gentoo is no-go for me due to installation complexity.

If you have other test platforms with precompiled packages which do not show the bug, please tell me and I will try to hunt it down.

In Debian (lenny/testing), I have these package versions:

  • apache2 2.2.9-6
  • libapache2-mod-php5 5.2.6-2+b1
  • php5 5.2.6-2
  • php-mail-mime 1.5.2-0.1
  • php-mail-mimedecode 1.5.0-3
  • postfix 2.5.2-2
  • courier-imap 4.4.0-1
  • fileinfo 1.0.4 (installed from pear)

  Changed 3 months ago by tensor1982

Any chance that the bug will be fixed before 0.2 is out?

Are there any predictions about a 0.2 stable release?

  Changed 3 months ago by alec

So, I've done some investigation, and it looks like the problem is with the filename length, not just the characters used, and may be related to the IMAP server. My dovecot-1.0 is not RFC2231.3 compatible. Because of that, roundcube cannot display long encoded filenames properly, but Thunderbird does this better (it probably does not use the BODYSTRUCTURE reply but parses the message body). So, to be sure, check the message body/headers and the server reply to the BODYSTRUCTURE request.

  Changed 3 months ago by tensor1982

You may have found a similar bug, occurring somewhere else.

From my observations, the problem is that the messages are already malformed when they are sent through the MTA.

Real world scenario: I send "русский.xls" through RC, and it gets truncated to ".xls" at the receiving end.

RC sends through postfix/SMTP. postfix then relays the message to the appropriate mail server, which runs another postfix that drops the email into a Maildir/ style mailbox. The recipient is running a standalone email program which gets the messages from the server via POP3 (courier).

In the message source we get:

--=_c6090698922dc1fa8902d39d1af41e4d
Content-Transfer-Encoding: base64
Content-Type: application/vnd.ms-excel;
 name*="UTF-8''.xls"; 
Content-Disposition: attachment;
 filename*="UTF-8''.xls"; 

0M8R4KGxGuEAAAAAAAAAAAAAAAAAAAAAPgADAP7/CQAGAAAAAAAAAAAAAAABAAAAUwAAAAAAAAAA
EAAAVQAAAAEAAAD+////AAAAAFQAAAD/////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////
...

It looks like the file names are mangled while the raw message is being generated. The MIME generating class may be the cause. If I browse the "Sent" folder through a standalone IMAP client (Thunderbird), the attachment file name is shown as ".xls" and the message source is exactly as above.

If I send the message with the same "русский.xls" file through Thunderbird, it is properly delivered into mailbox which is used through RC. And RC shows the message just fine.

  Changed 3 months ago by tensor1982

Now I am investigating the MIME generating class...

  Changed 3 months ago by harmaty

This is caused by a known problem with the PHP function basename(), which does not work correctly with UTF-8 strings. See the comments: http://ru2.php.net/basename

So you need to replace in program/lib/Mail/mime.php string '$filename = basename($filename);' with '$filename = ltrim(basename(' '.$filename));'

  Changed 3 months ago by alec

  • component changed from Client Scripts to PHP backend

It was fixed in mime.php from roundcube svn-trunk. Please make sure that you're using that file, and let us know if that fixes your problem.

  Changed 3 months ago by harmaty

No, it doesn't fix it. Maybe you mean: $filename = substr(basename('s_'.$filename), 2);

  Changed 3 months ago by harmaty

I also noticed that an attachment filename with Cyrillic characters is now not readable in Gmail; it looks like this: UTF-8%D1%8F%D0%BB%D0%B4%D0%B0.%D1%84%D1%8B%D0%B2%D0%B0

In RoundCube and Thunderbird clients there is no such problem.

In the older version, the content generated by mime.php before the encoded attachment was:

Content-Transfer-Encoding: base64
Content-Type: application/x-bzip2; name="=?UTF-8?Q?=D1=8F=D0=BB=D0=B4=D0=B0.=D1=84=D1=8B=D0=B2=D0=B0?="; charset="UTF-8"
Content-Disposition: attachment; filename="=?UTF-8?Q?=D1=8F=D0=BB=D0=B4=D0=B0.=D1=84=D1=8B=D0=B2=D0=B0?="

In the new version it is:

Content-Transfer-Encoding: base64
Content-Type: application/x-bzip2;
 name*="UTF-8''%D1%8F%D0%BB%D0%B4%D0%B0.%D1%84%D1%8B%D0%B2%D0%B0"; 
Content-Disposition: attachment;
 filename*="UTF-8''%D1%8F%D0%BB%D0%B4%D0%B0.%D1%84%D1%8B%D0%B2%D0%B0";
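
The two header styles quoted above can be reproduced with a short PHP sketch. It is an illustration only, not the code in mime.php: the helper names encode_rfc2047_q() and encode_rfc2231() are hypothetical, and the Q-encoding helper covers only what this example needs. RFC 2231 defines the extended value without surrounding double quotes; whether the quotes in the generated header explain the Gmail display is an untested assumption.

```php
<?php
// Hypothetical helpers, for illustration only (not part of Mail_mime).
// This file is assumed to be saved as UTF-8.

// RFC 2047 "Q" encoded-word: every byte outside a small safe set becomes =XX.
// No /u modifier, so the pattern is applied byte by byte.
function encode_rfc2047_q($name, $charset = 'UTF-8')
{
    $encoded = preg_replace_callback(
        '/[^A-Za-z0-9.\-]/',
        function ($m) { return sprintf('=%02X', ord($m[0])); },
        $name
    );
    return '=?' . $charset . '?Q?' . $encoded . '?=';
}

// RFC 2231 extended value: charset'language'percent-encoded-bytes
// (the language part is left empty here).
function encode_rfc2231($name, $charset = 'UTF-8')
{
    return $charset . "''" . rawurlencode($name);
}

$name = "ялда.фыва";
echo 'filename="' . encode_rfc2047_q($name) . '"', "\n";
echo 'filename*=' . encode_rfc2231($name), "\n";
```

The first line printed matches the older header style and the second carries the same percent-encoded value as the newer one, minus the quotes.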

  Changed 3 months ago by tensor1982

Upgraded today to r1765, the problem is still here...

Is it ok to assume that RC uses mime.php from program/lib and not from the system wide directory?

  Changed 3 months ago by tensor1982

Ok, fixed it myself.

The better approach is to use UTF-8 locale, see the patch against r1765.

Changed 3 months ago by tensor1982

fixes basename quirks, line endings are windows

  Changed 3 months ago by tensor1982

Also see test.php to get a clue how basename behaves.

Changed 3 months ago by tensor1982

basename() examples with different locales

  Changed 3 months ago by thomasb

  • severity changed from major to normal

We already have the utf-8 locale set in rcmail.php: setlocale(LC_ALL, $_SESSION['language'] . '.utf8'); but maybe it is mistyped... Also, the substr(basename()) hack is applied in mime.php.

  Changed 3 months ago by harmaty

Applied hack doesn't work - see above

  Changed 3 months ago by tensor

Mistyping the locale definitely does not work; I bumped into this issue while debugging. The patch definitely solved the issue for me.

In theory, the locale $_SESSION['language'].'UTF-8' may not exist on the system, but I am just speculating. From my tests, basename() is only affected by LC_CTYPE; setting LC_ALL to the desired locale is not necessary for basename().

The correct way to apply a basename() hack is in the patch.

substr('s_'.$something, 2) == $something
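
A minimal sketch of that point, assuming a UTF-8 source file: when the prefix is added after basename() has already run, substr() simply undoes the concatenation, so the result is whatever basename() returned. The variable names are made up for the example, and what basename() returns here depends on the PHP version and the active LC_CTYPE locale, so no output is claimed for it.

```php
<?php
// Illustration only; variable names are hypothetical.
$filename = "русский.txt"; // assumes this file is saved as UTF-8

// Form applied in mime.php at the time: the prefix is added AFTER
// basename() has run, so substr() just removes it again.
$applied = substr('s_' . basename($filename), 2);
var_dump($applied === basename($filename)); // bool(true)

// Form suggested earlier in the ticket: basename() sees an ASCII
// first character, and the prefix is removed afterwards.
$proposed = substr(basename('s_' . $filename), 2);

// Passing "0" only queries the current locale without changing it.
echo 'LC_CTYPE: ', setlocale(LC_CTYPE, '0'), "\n";
echo 'applied:  ', $applied, "\n";
echo 'proposed: ', $proposed, "\n";
```

Running this once under the "C" locale and once under a generated UTF-8 locale is a quick way to compare both forms on a given server.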

  Changed 2 months ago by tensor

My patch breaks printing in Opera because call_init() is stripped out and init() is called directly. window.print() does not work in Opera until .onload is fired.

The proposed change is to signal to app object that two phases have been done:

  1. DOM is available
  2. Images have loaded (onload fired).

  Changed 2 months ago by tensor

The previous comment is for #1485304

  Changed 2 months ago by alec

Please, check following patch

--- mime.old	2008-09-21 10:40:36.000000000 +0200
+++ mime.php	2008-09-23 09:22:42.221738855 +0200
@@ -350,7 +350,7 @@
             $err = PEAR::raiseError($msg);
             return $err;
         }
-        $filename = substr('s_'.basename($filename), 2);
+        $filename = $this->_basename($filename);
         if (PEAR::isError($filedata)) {
             return $filedata;
         }
@@ -667,7 +667,7 @@
 
                 $this->_htmlbody = preg_replace($regex, $rep, $this->_htmlbody);
                 $this->_html_images[$key]['name'] = 
-                    substr(basename('s_'.$this->_html_images[$key]['name']), 2);
+                    $this->_basename($this->_html_images[$key]['name']);
             }
         }
 
@@ -1114,6 +1114,20 @@
         }
     }
 
-    
+    /**
+     * Get file's basename (locale independent) 
+     *
+     * @param string Filename
+     *
+     * @return string Basename
+     * @access private
+     */
+    function _basename($filename)
+    {
+	if (stristr(PHP_OS, 'win') || stristr(PHP_OS, 'netware'))
+	    return preg_replace('/^.*[\\\\\\/]/', '', $filename);
+	else
+	    return preg_replace('/^.*[\/]/', '', $filename);
+    }
 
 } // End of class
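
The idea behind _basename() can be tried outside the class with a standalone sketch. The function name basename_bytes() and its $windows parameter are hypothetical, chosen for this example; the two patterns are the ones from the patch. The patterns carry no /u modifier, so they work on bytes and do not consult the locale.

```php
<?php
// Standalone variant of the approach in the patch (hypothetical name).
function basename_bytes($filename, $windows = false)
{
    // Drop everything up to and including the last path separator.
    // On Windows-like systems both "\" and "/" count as separators.
    $pattern = $windows ? '/^.*[\\\\\\/]/' : '/^.*[\/]/';
    return preg_replace($pattern, '', $filename);
}

var_dump(basename_bytes('/tmp/uploads/русский.xls'));
var_dump(basename_bytes('C:\\temp\\русский.xls', true));
var_dump(basename_bytes('plain.txt'));
```

One difference from basename() is worth keeping in mind: for a path with a trailing separator such as 'dir/', this approach returns an empty string, whereas basename() returns 'dir'.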

  Changed 2 months ago by Bibby

Hi, alec.

It works. Thanks very much.

Will it be merged into svn repository?

  Changed 2 months ago by alec

  • status changed from new to closed
  • resolution set to fixed
  • component changed from PHP backend to MIME parsing

Fixed in r1893. Patch sent to PEAR's bugtracker.

  Changed 2 months ago by tensor

  • status changed from closed to reopened
  • resolution deleted

The svn patch works as intended. However, the root cause was that I did not have the ru_RU.utf8 locale generated (installed) on the server, so the setlocale() call did not actually change the locale and it stayed as "C". The following patch can be used as a workaround for other locale-related issues.

=== program/include/rcmail.php
==================================================================
--- program/include/rcmail.php  (revision 1925)
+++ program/include/rcmail.php  (local)
@@ -170,7 +170,7 @@
     $_SESSION['language'] = $this->user->language = $this->language_prop($this->config->get('language', $_SESSION['language']));
 
     // set localization
-    setlocale(LC_ALL, $_SESSION['language'] . '.utf8');
+    setlocale(LC_ALL, $_SESSION['language'] . '.utf8', 'en_US.utf8');
   }

Another enhancement to the patch would be to fall back first to the language setting in main.inc.php and then to en_US, but the config class does not expose the setting from main.inc.php directly, because the user preference overrides it.
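
Such a fallback chain could look roughly like the sketch below. It is not the code in rcmail.php: the helper name set_ctype_locale() and its parameters are hypothetical, and it sets only LC_CTYPE, following the observation earlier in this ticket that basename() depends on that category. setlocale() accepts an array of candidates and returns false when none of them is available, which is the case that went unnoticed here.

```php
<?php
// Hypothetical helper: try the user's language first, then a configured
// default, then en_US. Returns the locale that was set, or false.
function set_ctype_locale($user_lang, $default_lang = 'en_US')
{
    $candidates = array();
    foreach (array($user_lang, $default_lang, 'en_US') as $lang) {
        // Locale spellings vary between systems, so try both suffixes.
        $candidates[] = $lang . '.utf8';
        $candidates[] = $lang . '.UTF-8';
    }
    return setlocale(LC_CTYPE, array_values(array_unique($candidates)));
}

$active = set_ctype_locale('ru_RU');
if ($active === false) {
    // None of the candidates is generated on this host (see `locale -a`).
    error_log('No UTF-8 locale available; locale-aware functions stay in the current locale');
} else {
    echo "LC_CTYPE is now $active\n";
}
```

Checking the return value makes a missing locale visible in the log instead of silently leaving the process in the "C" locale.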

  Changed 2 months ago by alec

  • status changed from reopened to closed
  • resolution set to fixed

Applied in r1895. We should use locale independent solutions where it's possible.
