06 December 2008

Making LaTeX builds faster

I do most of my home computing on a first-generation Asus Eee -- with the seven-inch screen and all (though it's hooked up to a big monitor and ergonomic keyboard -- I'd go nuts otherwise!). The little Eee is terrifically portable (only 2.2 lbs, and fits easily with fully open screen on an airplane tray) but is a bit underpowered -- mine has an Intel Celeron processor overclocked to 900 MHz, and presumably the memory bandwidth is much less than that of my workstation at work. I notice especially how underpowered the Eee is when I typeset documents with LaTeX. Of course, making the most of underpowered hardware is a great nerd exercise ;-) so I'm posting some tips about how to do that here.

The first trick I tried was to precompile the document header. This saved a little bit of time but not much. Plus, the binary format of the precompiled preamble is not portable between different versions of the LaTeX toolchain, even when running under the same OS and brand of CPU. I left it in just to save 0.4 or so seconds per document pass.

Precompiling the preamble didn't save much time, so it seemed then that processing the (100-page) document was taking most of the time. This made me think about the TeX build process. Prof. Knuth considered a build an interactive process: TeX runs, gives some warnings about overly long lines (with locations so you can correct them later), and on an error stops with a prompt for interactive editing. This was probably because at the time Knuth developed TeX, building the document was a long computation. One would want to interact with it in midstream to correct errors, rather than breaking off a build, editing, and starting the build over again. This is partly why it's so much trouble to write a good Makefile for TeX documents -- Knuth set the tools to give you a lot of feedback by default, and you're supposed to read the feedback to tell whether you need to let a tool make another pass over the document. On a fast computer, all the informative messages and warnings produced by the TeX toolchain zip by too fast to see. On my Eee, however, they don't -- which made me wonder about the expense of terminal output itself. TeX has an option ("-interaction batchmode") to turn off all the informative output. Using this brought the time for a complete build (pdflatex once, bibtex once, then pdflatex twice) down from 30 seconds to 6 seconds!

Another optimization is taking advantage of the \include and \includeonly comamnds before the document is finished. The \include command generates a .aux file for each included file. If you add all the included files in the \includeonly list once and do a complete build, then you'll generate .aux files for all the included files. This gives LaTeX references and page locations for all the files. Then, you can remove files from the \includeonly list if you're not working on them, and only build the parts of the document in which you're interested. This technique also preserves page numbers, equation numbers, and other reference-related things. I was able to optimize a single pdflatex step down from 3.5 seconds to 2.5 seconds with this technique.

The \include command has a side effect: it adds a page break after the included file. (Presumably this lets LaTeX compute page numbers without needing to regenerate the .aux file.) For shorter documents, I prefer to use \input to add files, because it doesn't create page breaks. Since I'm writing a longer document with chapters and LaTeX inserts a page break at the end of each chapter anyway, I can use \include for "chapter files" and within each chapter file, use \input to include section files. This website recommends solving the page break problem by using \include when editing, and \input when publishing, but for me, using \include for chapters reflects how I work on the document anyway.

LaTeX is missing some features that could make a build go a lot faster. First, building a document doesn't work like building a C library: compilation of individual files isn't independent of the whole build. I have a long document and I've taken the care to break it up into modular files; this helps with revision control, but LaTeX can't exploit this very effectively (other than with the \includeonly trick). Second, LaTeX must pass several times over the same document in order to get references right; in Knuth's time, this probably was meant to save memory, but now it would be nice to use more memory in order to save some passes.