This blog entry was originally written in April 2007.
As almost everyone knows, the de facto "standard" document format for the last decade or so has been the MS Word document, or .doc file. Every week I receive dozens of .doc and .xls files by e-mail. And every now and then I try to convince people to store their documents in a different file format, usually without much success. So what's wrong with using .doc files anyway?
People who are into Open Source usually come up with highly complex and technical arguments against the use of MS Word documents, such as "Microsoft is Evil" or "MS Office sucks, open source rules!". But the reality is of course that the majority of computer users run Microsoft Windows, use Microsoft Office, and are not about to install OpenOffice.org or switch to Linux in the near future.
Yet, the way I see it, there are two convincing arguments not to use .doc files (and other proprietary formats, but I'll get to that later) to store and exchange electronic documents.
First of all, there's the issue of future readability of documents. This argument may sound a bit far-fetched for most users, but given the speed at which computer hardware and software evolve it may become a very real problem in the not-so-distant future. As an example, 15 years ago the more or less "standard" word processors on the computer platform I used (the Atari ST) were 1st Word Plus and Signum. Within a decade, these programs had become so obscure that they now only have Wikipedia entries in German. Yet, I still have hundreds of floppies lying around with 1st Word Plus and Signum documents on them. If it weren't for the equally obscure conversion tool wpls2rtf, the 1st Word Plus documents might have been lost forever. And as for the Signum documents, if I want to open them I will need to either power up my ancient Atari, or try to get Signum running on an Atari emulator such as Hatari or Aranym. Not very practical, you will agree.
Some might say that, because MS Word documents are so common, there will always be software around to read or convert them. This is probably true, thanks in part to the Open Source community, which has reverse-engineered the proprietary .doc format so that it can be read by other software.
The current status of MS Word documents as a de facto standard is somewhat comparable to the status of WordPerfect documents from roughly 1982 to 1995. Everybody used them back then, yet the majority of computer users probably wouldn't have a clue what to do if they encountered a .wp or .wpd document nowadays. Sure, such a document can easily be converted if you happen to own WordPerfect, if you have installed the WordPerfect import filter for MS Word, or by using wp2rtf or an application based on libwpd, such as OpenOffice.org, AbiWord, KWord or wpd2sxw. But that still leaves the files effectively inaccessible to the majority of computer users, who simply don't know how to deal with these former "standard" documents.
The second reason to avoid using .doc files is the fact that it is a proprietary format, created for reading and writing by a single application, controlled by a single company. Apart from any moral objections one might have against this situation, there are important practical implications as well. Although several applications can read and write MS Word documents, the only application that is fully "compliant" with the .doc "standard" is Microsoft Word itself. Partly because of this, most computer users are afraid to use possibly "non-compliant" software (i.e. anything other than MS Word) to open and create Word documents. Along with the differences in user interface, the perceived risk that their documents may "look different" or may not be "compatible" with Word is mainly what keeps many people from using alternatives such as OpenOffice.org Writer and other packages. And of course this is not even a valid argument, because Word documents may look different on different computers anyway, and may not even be "compatible" between Word versions (see also my previous weblog entry), but that is not how most people perceive it.
The fact that MS Word is considered the standard effectively means that you are forcing other people to use MS Word every time you send a Word document. Most people do not consider this to be a problem, because they either have MS Office at work, have a paid-for (often OEM or student-license) copy at home that may (or may not) have come with their computer, or simply use an illegal copy, so "everyone has Word, right?". Apart from the people who bought a separate retail copy themselves, most don't realise that MS Office is a very expensive software package. At the time of writing, the cheapest retail version for Windows (MS Office Home & Student 2007) costs around 130 euros, while the retail version of MS Office Professional 2007 costs a whopping 630 euros (making it about as expensive as a modern laptop). In other words, if you expect people to have MS Word, it means that they (or their employer, or commonly both) either need to spend hundreds of euros buying it, or have to use "pirated" versions of the software. Personally, I find neither option acceptable.
If commercial organisations want to spend money on licensing fees, that's fine with me. But most public organisations also use MS Office, simply because it's "the standard". For a country like The Netherlands this means that billions of euros of public money are spent on Microsoft licensing fees every year. And for "developing" countries, these licensing fees are yet another flow of money to "Western" countries, at least when they are actually paid. More commonly, however, people simply use pirated copies of MS Office, ensuring that it remains "the standard" and that alternatives hardly get a chance.
So, if you shouldn't use .doc files for document storage and transfer, what are the alternatives? Currently, there are three, depending on what you need to do with the document.
Probably the most viable option in the long run is to use the Open Document Format (ODF), a truly standard (ISO 26300) and open file format for storing documents. It is natively supported by most modern office applications, including free packages such as OpenOffice.org, AbiWord and KOffice. As could be expected, MS Office does not support ODF at the moment, but luckily there is a freely available ODF plugin from Sun Microsystems for MS Office 2000 and up, which allows MS Office applications to read and write ODF files. This only works on Windows, though.
If you're a Mac user, you may have noticed that MS Word for Mac is rather outdated and that it has compatibility problems with other versions of Word. I would therefore recommend that Mac users check out NeoOffice, a native Mac OS X version of OpenOffice.org, which seems to provide better compatibility with Microsoft Office than Microsoft Office for Mac does.
ODF is perfect for local storage of documents, but as long as MS Office remains the de facto standard and doesn't integrate ODF support, you cannot rely on other people being able to open ODF files. In other words, you might just as well send them a WordPerfect file. ;-) That leaves two alternatives for sending documents to others. If the document does not have to be edited by others, it is best sent as a PDF file. PDF (Portable Document Format) is another de facto standard electronic document format, designed by Adobe. Its main practical advantage over MS Office documents is that PDF documents can embed their fonts, so they will look the same on any computer. And although it's not a truly "open" standard, it is very well documented by Adobe, there are freely available viewers (such as Adobe Reader and Foxit Reader), and many applications can write it. Moreover, any document can be "printed" to a PDF file using the free and Open Source PDFCreator for Windows, or the PDF "printers" built into Mac OS X or Ubuntu Linux (as of 7.10, "Gutsy Gibbon"). The main drawback of PDF, however, is that documents in PDF format cannot easily be changed afterwards, so be sure to keep a copy of the original document.
If you need to send someone a document that needs to be edited, the best option is to save it as an RTF (Rich Text Format) file. RTF is also a proprietary Microsoft document format, and has been defined on at least one occasion as "whatever Microsoft Word exports when it exports to RTF". But at least the RTF specification is freely available (albeit in MS Word format), it supports almost all MS Word features, and it can easily be read and written by other programs.
If someone sends me a .doc file for editing, I usually save it as RTF and send it back as an RTF or PDF file. Not that it helps much, but it's a start. ;-)
Finally, people interested in document standards might want to take a look at the NoOOXML campaign, intended to stop Microsoft's badly designed and semi-proprietary "Office Open XML" (OOXML) document format from becoming an ISO standard. With the Open Document Format (ODF, a.k.a. ISO 26300) we finally have a good and open format to store documents. The world doesn't need yet another badly designed and overly complex Microsoft "standard" that is backwards- and bug-compatible with closed and patented formats, especially when it's pushed through the standards committees by financial and political force. If you agree, you may want to sign the on-line petition at the NoOOXML site:
Microsoft has apparently announced native ODF support for Office 2007 SP2, which is to be released sometime next year. This is certainly interesting news. Whatever reasons Microsoft might have for doing this, it would be nice if corporate MS Office users were at least able to open and modify ODF documents without having to bug their IT departments to install non-Microsoft plugins. Let's wait and see...
Microsoft Office 2007 Service Pack 2 has finally arrived, bringing native support for ODF to MS Office roughly two years after version 1.1 of the ODF standard was released.
And not entirely unexpectedly, Microsoft managed to make its support for ODF spreadsheets (.ods) entirely incompatible with that of other applications. Typical...
So if you need proper ODF spreadsheet support in Microsoft Office, you'd best install Sun's ODF plugin for MS Office 2000, 2003 and 2007, as that at least provides proper interoperability.
OK, I read XKCD... :-)