Word binary format information

This page collects publicly available information about the Word binary format. "The File Formats Handbook" (G. Born) has information on the Word for DOS and Write formats (Write format is a close relative of Word format). There are some major differences between the Word 6 and Word 7+ formats.

Word .doc files are OLE archives. A good source of information and Perl code for processing these archives is the LAOLA pages. The development code includes much of a C++ implementation of an OLE library (Microsoft's library is useless in Un*x environments, where it is not available). Some publicly available DCE code is included for the distributed features. The OLE library also includes C source for converting Unix dates to the Win32 FILETIME dates found in OLE archives, and vice versa.
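As a rough illustration of the two low-level details mentioned above, here is a minimal C sketch. It is not the code from the OLE library. The function names (is_ole_archive, unix_to_filetime, filetime_to_unix) are hypothetical. It relies on two well-known facts: every OLE2 compound file starts with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1, and a Win32 FILETIME counts 100-nanosecond ticks since 1 January 1601, which is 11644473600 seconds before the Unix epoch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 8-byte signature at offset 0 of every OLE2 compound file (.doc included). */
static const unsigned char OLE_SIGNATURE[8] =
    {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

/* Hypothetical helper: returns 1 if buf (len bytes) starts with the OLE signature. */
int is_ole_archive(const unsigned char *buf, size_t len)
{
    return len >= sizeof OLE_SIGNATURE &&
           memcmp(buf, OLE_SIGNATURE, sizeof OLE_SIGNATURE) == 0;
}

/* Seconds from 1601-01-01 (FILETIME epoch) to 1970-01-01 (Unix epoch). */
#define EPOCH_DIFF_SECONDS 11644473600LL
/* A FILETIME counts 100-nanosecond ticks, i.e. 10^7 ticks per second. */
#define TICKS_PER_SECOND 10000000LL

/* Hypothetical helper: Unix seconds -> FILETIME ticks (for dates after 1601). */
uint64_t unix_to_filetime(int64_t unix_seconds)
{
    return (uint64_t)((unix_seconds + EPOCH_DIFF_SECONDS) * TICKS_PER_SECOND);
}

/* Hypothetical helper: FILETIME ticks -> Unix seconds (sub-second part dropped). */
int64_t filetime_to_unix(uint64_t filetime)
{
    return (int64_t)(filetime / (uint64_t)TICKS_PER_SECOND) - EPOCH_DIFF_SECONDS;
}
```

For example, the Unix epoch (0) corresponds to the FILETIME value 116444736000000000, and converting that value back yields 0.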

The main file (the WordDocument stream in the archive) is divided into 128-byte blocks, each of which is used entirely for a single purpose (this is easy to discover, and no surprise given the Write and Word for DOS formats). Excess space in a block is apparently filled with zeros.
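A minimal C sketch of the block arithmetic described above follows. It assumes only what the text states: fixed 128-byte blocks, with unused space apparently zero-filled. The helper names are hypothetical, and the zero-fill count is only a heuristic, because a block whose real content happens to end in zero bytes will be over-counted.

```c
#include <stddef.h>

/* Block size observed in the WordDocument stream (per the observation above). */
#define WORD_BLOCK_SIZE 128

/* Hypothetical helper: number of 128-byte blocks covering stream_len bytes. */
size_t block_count(size_t stream_len)
{
    return (stream_len + WORD_BLOCK_SIZE - 1) / WORD_BLOCK_SIZE;
}

/* Hypothetical helper: split a stream offset into a block index and an
   offset within that block. */
void locate_offset(size_t offset, size_t *block, size_t *within)
{
    *block = offset / WORD_BLOCK_SIZE;
    *within = offset % WORD_BLOCK_SIZE;
}

/* Hypothetical helper: count trailing zero bytes in one block, i.e. the
   apparent zero fill after the used content. Heuristic only: genuine
   trailing zeros in the content are counted as fill too. */
size_t trailing_zero_fill(const unsigned char *block)
{
    size_t n = 0;
    while (n < WORD_BLOCK_SIZE && block[WORD_BLOCK_SIZE - 1 - n] == 0)
        n++;
    return n;
}
```

For instance, byte 300 of the stream falls in block 2 at offset 44 within that block.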


If you have publicly available information that is not on this page, email it to Duncan for verification and inclusion here. I have the real information, but it is subject to a non-disclosure agreement. The format is baroque, which explains the size of the Word binary.


Duncan Simpson
Last modified: Wed Apr 29 15:41:32 BST 1998