The top-level directory contains the front end and small things that do not (yet) merit a directory of their own. Things that have not yet moved out of this directory are polluting it too :-)
The inc subdirectory contains all the include files and is in imminent danger of losing some of them. In particular, header files specific to individual reader stages are likely to move into the reader subdirectory.
compat contains objects that implement things that word2x wants but that are not present in various vendors' C libraries. lib contains objects used by various bits of word2x; some of these are now deemed not general purpose enough to stay in the library and are migrating to the source files that need them.
The doc directory contains some developer-oriented documentation. samples contains various Word documents, emailed to me by various people having trouble converting them. devel_progs contains a few goodies you might find useful (in particular there is a hex dump program that you might prefer to od).
RCS directories contain RCS sources, some of which are none too flattering about the ability of yours truly :-)
The reader directory contains sources for reading a Word file and turning it into a stream of tokens. reader.cc implements a few shared reader functions, argparse.cc is covered below, and tok_misc.cc contains the code to print tokens in a programmer-friendly manner.
junk_filter.cc is a class derived from streambuf for stripping junk out of Word documents and reducing them to text. collect.cc is a token source that combines entire paragraphs from the input stream, which is currently usually generated by junk_filter.cc.
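To give a feel for the streambuf approach, here is a minimal sketch of a filtering streambuf. It is not the code in junk_filter.cc: the class name printable_filter, the helper strip_junk and the rule "keep printable characters, newlines and tabs" are all assumptions made for illustration, and it is written against the standard C++ library rather than whatever iostream the real code targets.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical stand-in for junk_filter: passes through printable
// characters, newlines and tabs, and silently drops everything else.
class printable_filter : public std::streambuf
{
public:
    explicit printable_filter(std::streambuf *src) : src_(src), ch_(0) {}

protected:
    // Called when the get area is empty; refill it with one clean character.
    int_type underflow()
    {
        for (;;)
        {
            int_type c = src_->sbumpc();
            if (traits_type::eq_int_type(c, traits_type::eof()))
                return traits_type::eof();
            unsigned char uc =
                static_cast<unsigned char>(traits_type::to_char_type(c));
            if (std::isprint(uc) || uc == '\n' || uc == '\t')
            {
                ch_ = static_cast<char>(uc);
                setg(&ch_, &ch_, &ch_ + 1);
                return traits_type::to_int_type(ch_);
            }
        }
    }

private:
    std::streambuf *src_; // underlying raw document bytes (not owned)
    char ch_;             // one-character get area
};

// Convenience helper for this sketch: filter a whole string.
std::string strip_junk(const std::string &raw)
{
    std::istringstream in(raw);
    printable_filter filt(in.rdbuf());
    std::istream clean(&filt);
    std::string out;
    char c;
    while (clean.get(c))
        out += c;
    return out;
}
```

Anything that reads from an istream built on such a buffer sees only the cleaned text, which is why a later stage like collect.cc does not need to know the junk was ever there.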
The other reader stages process token streams coming from some source class derived from tok_trans (any tok_src subclass is such a class, incidentally).
eqnarr.cc and sections.cc are not implemented yet and are currently replaced by stubs. I will fill them in eventually but hope someone else does so before I get around to it.
Everything with tunable values allows people to twiddle them as options using the modular option handling.
The front end of word2x is implemented in word2x.cc, driver.cc and mainopts.cc.
word2x.cc is the front end of the program and gets thinner as more stuff is deemed worthy of a separate source file. --digest is handled specially in this file. The global options are delegated to mainopts.cc.
driver.cc constructs the driver using the specification given to --digest or "default". make_stages calls make_stage with each comma-separated stage. If the stage is an alias, then make_stage calls make_stages with the expansion of the alias. Currently circular alias expansion is not detected.
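The mutual recursion between make_stages and make_stage can be sketched as follows. This is only an illustration: the alias table and the stage names in it are invented, the signatures are assumptions, and the sketch records stage names in a vector where the real driver would construct stage objects.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical alias table; the real aliases in driver.cc may differ.
static std::map<std::string, std::string> make_aliases()
{
    std::map<std::string, std::string> a;
    a["default"] = "text,para";
    a["text"] = "strip";
    return a;
}
static const std::map<std::string, std::string> aliases = make_aliases();

void make_stages(const std::string &spec, std::vector<std::string> &out);

// Handle one stage name: expand it if it is an alias, else record it.
void make_stage(const std::string &name, std::vector<std::string> &out)
{
    std::map<std::string, std::string>::const_iterator it = aliases.find(name);
    if (it != aliases.end())
        make_stages(it->second, out); // no cycle check, matching the text
    else
        out.push_back(name);
}

// Split a comma-separated specification and handle each stage in order.
void make_stages(const std::string &spec, std::vector<std::string> &out)
{
    std::istringstream in(spec);
    std::string name;
    while (std::getline(in, name, ','))
        if (!name.empty())
            make_stage(name, out);
}
```

With this table, the specification "default,dump" expands to strip, para, dump. An alias that mentions itself, directly or through another alias, would recurse forever, which is exactly the undetected circular case mentioned above.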
mainopts.cc implements the global options using the fancy argument parsing features found in argument.cc. The --digest option is a dummy to stop getopt_long complaining about this option.
Currently there are two major pieces of infrastructure: modular argument parsing and the modular reader stages.
argument.cc implements the fancy argument parsing. The only major code is the argparse class. The constructor collects all the arguments together. parse_args calls getopt_long and forwards the arguments to the appropriate modules. Finally the destructor frees all the allocated data, etc.
The reader stages are classes derived from tok_trans. Token sources are derived from tok_src, which is itself derived from tok_trans. Almost everything is implemented as inline code in inc/reader.h.
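The shape of that hierarchy can be sketched like this. Only the names tok_trans and tok_src come from the real code; the token struct, the next() member, and the vector_src, upper_stage and drain examples are assumptions made up for illustration, so check inc/reader.h for the actual interface.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token type; the real one lives in inc/reader.h.
struct token
{
    std::string text;
};

// Base class for every reader stage. The next() member is an assumption.
class tok_trans
{
public:
    virtual ~tok_trans() {}
    // Fetch the next token; returns false once the stream is exhausted.
    virtual bool next(token &t) = 0;
};

// A token source is a stage with nothing upstream of it.
class tok_src : public tok_trans
{
};

// Example source: hands out tokens from a fixed list.
class vector_src : public tok_src
{
public:
    explicit vector_src(const std::vector<std::string> &w) : words_(w), pos_(0) {}
    bool next(token &t)
    {
        if (pos_ >= words_.size())
            return false;
        t.text = words_[pos_++];
        return true;
    }

private:
    std::vector<std::string> words_;
    std::size_t pos_;
};

// Example stage: upper-cases each token pulled from the stage before it.
class upper_stage : public tok_trans
{
public:
    explicit upper_stage(tok_trans *in) : in_(in) {}
    bool next(token &t)
    {
        if (!in_->next(t))
            return false;
        for (std::size_t i = 0; i < t.text.size(); ++i)
            t.text[i] = static_cast<char>(
                std::toupper(static_cast<unsigned char>(t.text[i])));
        return true;
    }

private:
    tok_trans *in_; // upstream stage (not owned)
};

// Pull every token out of the last stage of a chain.
std::vector<std::string> drain(tok_trans &last)
{
    std::vector<std::string> out;
    token t;
    while (last.next(t))
        out.push_back(t.text);
    return out;
}
```

Because a source is itself a tok_trans, a stage only ever needs a pointer to the stage before it and does not care whether that is a source or another transformer.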
reader/reader.cc implements a few shared reader functions. argparse.cc treats arguments passed to modules as a list of long format options, with the -- prefix removed, separated by commas, with option arguments separated from option names by = signs. Modules are free to treat the argument as they see fit, but currently all modules with tunable parameters feed it to argparse.
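As an illustration of that module argument format, the following sketch splits such a string into name/value pairs. The function name split_module_opts and the option names in the usage note are hypothetical, and the real argparse.cc hands the options on to the long option machinery rather than returning pairs.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> opt_pair;

// Split "name=value,flag,..." into (name, value) pairs.
// Options without an = sign get an empty value.
std::vector<opt_pair> split_module_opts(const std::string &arg)
{
    std::vector<opt_pair> opts;
    std::istringstream in(arg);
    std::string item;
    while (std::getline(in, item, ','))
    {
        if (item.empty())
            continue; // tolerate stray commas
        std::string::size_type eq = item.find('=');
        if (eq == std::string::npos)
            opts.push_back(opt_pair(item, std::string()));
        else
            opts.push_back(opt_pair(item.substr(0, eq), item.substr(eq + 1)));
    }
    return opts;
}
```

For example, a module argument of width=72,verbose would be treated much like --width=72 --verbose on a command line of its own.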