Note to spammers: permission is not given to spam any address on these pages. Anyone sending spam to these addresses will be reported to their ISP with a request that their account be terminated. A removal address will not prevent an abuse report. Bulk-friendly ISPs are blocked without mercy.
word2x-0.005 is the latest version. If you are using a prior complete release (some version of word2x-0.001 to word2x-0.003), please get the latest version from the download page. If you were having problems with word2x-0.004, the new version fixes various minor problems (with gcc-2.95, i686 systems and more). libpt is also distributed here.
Word2x is a GPLed program for converting Word documents to text without any Microsoft software to help you. It runs on non-Microsoft operating systems (and therefore uses no OLE DLLs, etc.). There may be one object that is distributed only pre-compiled on my Linux system due to restrictions on the "real" information about Word documents. This file will be excluded from the GPL. It will not be needed to convert documents but will offer better performance, for obvious reasons.
The currently supported output formats are plain text, LaTeX and HTML. The program converts a Word document to a central format, and output modules then write the desired target format.
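The central-format design described above can be illustrated with a small sketch. This is not word2x's code: the Element type, the writer functions and the convert() helper are all hypothetical names invented for illustration, and the "central format" here is just a list of headings and paragraphs. It only shows the general idea of one reader-side representation feeding several interchangeable output modules.

```python
from dataclasses import dataclass
from html import escape

# Hypothetical central format: a flat list of typed elements.
@dataclass
class Element:
    kind: str  # "heading" or "paragraph"
    text: str

# Only a few common LaTeX specials are handled in this sketch.
LATEX_SPECIALS = {"&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
                  "_": r"\_", "{": r"\{", "}": r"\}"}

def latex_escape(s):
    return "".join(LATEX_SPECIALS.get(c, c) for c in s)

def write_text(doc):
    # Plain text: elements separated by blank lines.
    return "\n\n".join(e.text for e in doc) + "\n"

def write_latex(doc):
    parts = []
    for e in doc:
        body = latex_escape(e.text)
        parts.append(r"\section{%s}" % body if e.kind == "heading" else body)
    return "\n\n".join(parts) + "\n"

def write_html(doc):
    parts = []
    for e in doc:
        tag = "h1" if e.kind == "heading" else "p"
        parts.append("<%s>%s</%s>" % (tag, escape(e.text), tag))
    return "\n".join(parts) + "\n"

# Each output module is just an entry in this table.
WRITERS = {"text": write_text, "latex": write_latex, "html": write_html}

def convert(doc, target):
    return WRITERS[target](doc)
```

Under this arrangement, supporting another target format means adding one writer function and one table entry; the code that produces the central format does not change.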
If you have a problem please consult the FAQ and known bug list before reporting a bug. Developers will find resources and a wish list of jobs they can do on the word2x development pages.
I hope to release a text stripper based on the latest code sometime moderately soon. The interface currently needs cleaning up in several ways. (Offers to do this are accepted.)
Mirroring this site is fine; however, please tell me about it. I have made an effort to get search engines pointing people here. A Yahoo link seems not to be appearing (maybe I got the category wrong or missed something).
These pages are designed to be informative and not to win any "cool site of the week" awards. Loads of graphics, Java and ActiveX (which everyone should have disabled, lest it bill their credit card for $$$$) are not something this site wants.
Environments reported to work include
Environments reported not to work include
Most Unix-like platforms should work. If anyone manages to build it using a more "standard" PC C++ environment then please inform Duncan Simpson <firstname.lastname@example.org> and W.Hennings <W.Hennings@fz-juelich.de>. Most Unices should build fairly smoothly thanks to the GNU autoconf configuration script. The script is probably too stressful for Microsoft's imitations of the Un*x tools. The cygwin32 versions of the tools do not have this problem (they are well tested GNU versions).
Please visit the download page.
I originally imagined word2x might get a few academic users. However, I know it is used by many people, including people inside various major companies. word2x has apparently been breeding on various ftp sites worldwide... I imagine the user base is very significant (-: Always nice to know one's baby has succeeded beyond its wildest dreams.
wv uses an understanding of the latest binary file format to convert Word 8 documents into HTML. It does not handle Word 6 documents, which I know have very different binary information in some areas.
catdoc is a much simpler and smaller program for reading Word documents. It might work better for some documents than word2x (which is a much more complex program, with a touch of the resource hog nature). If your browser objects you can download catdoc via this link.
Yes. The new version has lots of extra features planned but will take time. A lot of the internals have been upgraded and need debugging in addition to all the other new code.
At present the development code tree is mutating too fast to give a release date of a final version. Helpers can advance the release date by doing some of the coding for me.
Further information can be found on the word2x development pages.
Email it to email@example.com of course! Unified context diffs are the preferred format. However, if it is a patch against the monolithic reader then please get a newer version first (even if this requires getting the development version). The reader is now completely and utterly different, although a surprising amount of the code survived the cleanup (most of it evolved beyond recognition).