Ticket #1485311 (new Bugs)

Opened 4 months ago

Last modified 4 months ago

Common office formats are misdetected by mime detection code

Reported by: arekm Owned by:
Priority: 5 Milestone: later
Component: MIME parsing Version: 0.2-alpha
Severity: major Keywords:
Cc:

Description

Continuation of #1485152 (no permission to reopen original bug).

finfo doesn't solve this problem. finfo uses the libmagic library, which itself isn't able to detect many types of M$ attachments properly.

Example with sample files found on the internet:

$ rpm -q libmagic
libmagic-4.25-1.i686
$ file -i *
20080529_zalacznik_nr_3.xls:           application/octet-stream
Alice's Adventures in Wonderland.doc:  application/msword
Alice's Adventures in Wonderland.docx: application/zip
Office Open XML sample.doc:            application/msword
Office Open XML sample.docx:           application/zip
StockChart.ods:                        application/octet-stream
p35-47.ppt:                            application/octet-stream

The xls file is misdetected as octet-stream, the docx files are misdetected as zip files (well, these are zip files in reality, but they have their own mime type), the powerpoint file is misdetected, and the openoffice calc ods file is misdetected.

I think that roundcube should do filename-based type detection for these commonly misdetected formats. Unfortunately M$ didn't think about putting explicit magic in these file formats to make detection easy, so filename (extension) based detection is the only way.
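
A minimal sketch of the extension-based fallback described above, written in Python for illustration only (it is not the attached mime-detect.patch, and roundcube itself is PHP). The names EXTENSION_OVERRIDES, GENERIC_TYPES and refine_mime_type are hypothetical. The assumption is that the extension should only win when content sniffing returned one of the generic answers seen in the `file -i` output above.

```python
import os

# Extension -> MIME type for the formats discussed in this ticket.
# Values follow the common mime.types / OpenXML registrations.
EXTENSION_OVERRIDES = {
    ".doc": "application/msword",
    ".xls": "application/vnd.ms-excel",
    ".ppt": "application/vnd.ms-powerpoint",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".ods": "application/vnd.oasis.opendocument.spreadsheet",
}

# Assumption: only these generic sniffing results are overridden.
GENERIC_TYPES = {"application/octet-stream", "application/zip"}


def refine_mime_type(filename, detected):
    """Return the extension-based type if the sniffed type is too generic."""
    ext = os.path.splitext(filename)[1].lower()
    if detected in GENERIC_TYPES and ext in EXTENSION_OVERRIDES:
        return EXTENSION_OVERRIDES[ext]
    # Keep whatever content sniffing found when it is already specific
    # or when the extension is unknown.
    return detected


if __name__ == "__main__":
    print(refine_mime_type("p35-47.ppt", "application/octet-stream"))
    print(refine_mime_type("Office Open XML sample.docx", "application/zip"))
```

Restricting the override to generic results keeps a correctly sniffed type (for example application/msword) intact, and a real .zip attachment is still reported as application/zip.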

Common mime types: http://svn.apache.org/viewvc/httpd/httpd/trunk/docs/conf/mime.types?view=markup , OpenXML types: http://blogs.msdn.com/dmahugh/archive/2006/08/08/692600.aspx

Ideas, comments?

This bug is quite important to fix, since office formats are very commonly seen as email attachments. Emails sent from roundcube to the outside world with wrong mime types cause other email clients/webmails to misbehave.

Attachments

mime-detect.patch (3.9 kB) - added by arekm 4 months ago.
proposed solution for commonly misdetected formats

Change History

Changed 4 months ago by arekm

proposed solution for commonly misdetected formats

Changed 4 months ago by tensor1982

An empty .rar file is often misdetected as text/plain, too. Consider including it in the list.
