Sorry, Professor, the dog ate my thesis: How to expect the unexpected when using LATEX

(Human Factors Research Group, University College Cork)

Living and Working with , London Mathematical Society, 20 October 2006

Abstract

The last thing you want when submitting a major thesis is for unexpected problems to crop up at the last minute. Even if they do, you need to have a plan for dealing with them which doesn't involve exporting the text into a wordprocessor to finish the job.

This paper describes some of the most commonly-underused techniques for providing the safety-net which can rescue success from the jaws of defeat. It is based on the author's experience of supporting LATEX in an academic environment as well as in commercial publishing.

1  A little bit of background

TEX—​the language—​was made available by a computer scientist and mathematician principally for other computer scientists—​and for those mathematicians willing to learn some computing using a dumb terminal. At the time (1978–1982) no other interface was immediately envisaged, either logical (syntax) or physical (editor). Knuth has on several occasions expressed moderate surprise (and pleasure) that TEX has been used so widely, especially by people well outside the original implicit user community (for example in the Humanities, in schools, and domestically).

The original TEX language is no longer required knowledge for any user writing a conventional document, and is now of interest only to the developer, the computer scientist, and the typographic programmer. Systems such as LATEX and ConTEXt provide a logic-based interface designed to relieve the normal user engaged on conventional document work from the need to know or understand anything of the internal workings of TEX. Even in mathematics, which is closest to the core of TEX's original designs, the need to know and understand the syntax and operation of the language as applied to mathematics no longer implies a need to understand everything else about TEX.

In the physical interface (editors), progress has been less successful. Textures (Apple Mac), Scientific Word (Microsoft Windows), and LYX (all platforms) have amply demonstrated that a synchronous typographic interface to a LATEX document is possible; and systems like the TEX daemon [] and preview-latex [] provide an alternative near-synchronous display capability. However, none of these systems addresses anything remotely like the interface required by the new (non-programmer) user, where all trace of the markup and processes is invisible, and the interaction design meets the expectations of the user. It is perhaps instructive to note that in the parallel field of XML editors, despite the extensive investment capital available, a similar position still obtains [].

For the postgraduate student using LATEX to prepare a thesis, there is thus the advantage of not needing an in-depth knowledge of plain TEX, and the disadvantage that a usable markup-free synchronous typographical editor does not exist. For the mathematical thesis author, whether in mathematics per se or in another field making use of mathematics, the suitability or otherwise of graphical interfaces to formulae and expressions outside the TEX field has become a matter of choice. Until recently, mathematicians absorbed TEX syntax from an early age, but the prevailing hegemony of the wordprocessor now poses a significant threat to TEX's survival, despite the obvious disadvantages of the wordprocessor paradigm.

Apart from the interface, however, there are numerous other hidden traps awaiting the thesis author using LATEX, and the remainder of this paper deals with some of the most common in this author's experience.

2  Trips and traps

This collection is based on the author's experiences of supporting TEX and LATEX among postgraduate authors and academic staff (faculty). It is wholly unscientific, but it is factual, not anecdotal. In recommending LATEX to this community the author has found that conscientious users can save about two months' formatting time over the average 3–4 year PhD writing cycle, in addition to having a more reliable, better-looking, and more long-lasting opus.

2.1  The FAQ

Find the TEX FAQ and bookmark it. This is your first port of call for questions about “How do I…” and it enables you to solve most of the common queries that users come up with.

Only look for additional sources of information once you're certain your question isn't answered here. The Internet is full of friendly and helpful people, but they are volunteers, and it's courteous not to waste their time by asking questions that are answered elsewhere.

2.2  CTAN

The Comprehensive TEX Archive Network is a collection of servers around the world hosting all the free software relating to TEX systems. It's rooted at the TEX Users Group (TUG), DANTE (the German TEX Users Group), and UKTUG (the UK TEX Users Group).

From here you can download everything from complete systems (ISO CD and DVD images for installation) right down to the smallest component of an individual style file (package) or font.

Packages are the plug-in components of LATEX. There are hundreds of them, providing refinements, formatting solutions, and fonts. CTAN has an index of them you can search.

Most institutional thesis layouts come with a specific set of packages designed to provide the features the institution mandates, so you should stick to these unless you have permission to diverge.

For other purposes you should learn how to download and install packages, unless your TEX installation provides automated download-and-install-on-demand (MiKTEX, for example).

2.3  Usenet News

If you're not already connected, go and learn how to read Usenet News. You'll need a newsreader (a Usenet News reader, not a blog/RSS newsreader)—​see the Newsreaders site for lists—​but at a pinch you can use a good mailer with built-in newsreading capability like Thunderbird, or a web interface like Google Groups, or even Microsoft Outlook.

The group for discussions of TEX is comp.text.tex, but you don't have to plough through all 100+ daily posts: just get the hang of how to ask questions and comport yourself (see one of the many guides to ‘Netiquette’ such as Griffith's []).

2.4  “My equations won't…”

…line up, number themselves, use the right font, fit the width, etc. The finer points of mathematical typesetting are addressed in numerous documents, both online and in print.

If you are using maths heavily, you need to have read at least some of them: there is a short list in [].
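As a taste of what those documents cover, here is a minimal sketch of aligned, automatically numbered equations. It assumes the amsmath package, which the text above does not name, and the label names are invented for the illustration.

```latex
\documentclass{article}
\usepackage{amsmath} % assumption: provides the align environment and \eqref

\begin{document}

% Each row gets its own number; & marks the point where rows line up.
\begin{align}
  (a+b)^2 &= (a+b)(a+b)       \label{eq:expand} \\
          &= a^2 + 2ab + b^2  \label{eq:result}
\end{align}

Equation~\eqref{eq:result} follows from~\eqref{eq:expand}.

\end{document}
```

Process the file twice so that the cross-references settle. If your institution's thesis class already loads a maths package, use that one instead.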

2.5  Get equipped!

Despite my rant about editors, there are plenty of usable interfaces around. I'm not going to mention any of them by name, except that AUCTEX appears to be the most favoured by the professional mathematician.

Don't even think of using a non-TEX editor like Notepad unless you are fond of self-inflicted pain.

The other tools you will almost certainly need are:

A bibliographic manager

See the discussion on BibTEX in section 2.6.

A knowledge of LATEX is required
A graphics editor

If you're including photographs or other continuous-tone artwork, use GIMP, Adobe Photoshop, Paint Shop Pro, or similar for any preparatory work, and save the images as JPG, PNG, or PDF files. You must use pdflatex to process documents using these formats: standard LATEX with dvips only works with EPS files, which are unreasonably huge when used to hold bitmap images like these.

If you're including diagrams of any kind, make sure they are done as vector drawings (PDF or EPS), not bitmaps. EPS only works with standard LATEX using dvips; PDF only works with pdflatex.

This is a gross simplification, but that's what I'm trying to do: simplify.

A spellchecker and/or thesaurus

Ispell and Aspell work well, but with limitations, although it doesn't take too long to train them to your specialist vocabulary. Beware the generic dictionary-based products which may be too general.

Thesauruses are all commercial products, and I'm not aware of any which work specifically with TEX editors.

Another useful tool (Emacs only at the moment) is the Remembrance Agent, which indexes all your text files (LATEX, HTML, email, etc) and provides a context-sensitive window showing documents related to the topic around the cursor position (‘point’). This enables you to check and avoid (or deliberately re-use) phrases used elsewhere.

Learning to spell and punctuate is good, too.
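To make the point about graphics formats concrete, here is a minimal sketch of including an image with the graphicx package. The file name example-image is a placeholder that ships with the mwe package (assumed to be installed); substitute your own file, and treat the width as a matter of taste.

```latex
\documentclass{article}
\usepackage{graphicx} % provides \includegraphics

\begin{document}

\begin{figure}
  \centering
  % No file extension is given: pdflatex looks for a PDF, PNG or JPG file
  % of this name, while latex + dvips looks for an EPS file.
  % example-image is a placeholder from the mwe package (an assumption).
  \includegraphics[width=0.8\linewidth]{example-image}
  \caption{A placeholder image.}
  \label{fig:placeholder}
\end{figure}

\end{document}
```

Leaving off the extension means the same source can go through either route, provided you keep a copy of each image in the format that route needs.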

2.6  Bibliography

If you're not already using BibTEX, start now. There is no excuse at all for manually formatting every entry and trying to maintain them in that format.

If you can't stomach the thought of typing the syntax, use a graphical interface like tkbibtex or JabRef. The latter is preferred, as it will accept a large number of other bibliographic reference formats such as RIS and SilverPlatter, which are in common use on web reference sites and in libraries.

Find out what format your references must be in for submission (if this is not already mandated by an institutional thesis document class) and make sure you have the relevant packages installed.

In the extreme case of your professor requiring an obtuse or degenerate reference format (which is not unknown), ask the very helpful people on comp.text.tex.
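For anyone who has not seen the syntax, this is a minimal sketch of a database entry and the two commands that use it. The file name thesis.bib and the plain style are assumptions for the illustration; the entry itself is Lamport's manual from the reference list.

```latex
% Write a small database file, thesis.bib, alongside the document.
% (In a real thesis you would maintain this file separately.)
\begin{filecontents}{thesis.bib}
@book{Lamport1994,
  author    = {Leslie Lamport},
  title     = {{\LaTeX}: A Document Preparation System},
  publisher = {Addison-Wesley},
  address   = {Reading, MA},
  edition   = {Second},
  year      = {1994}
}
\end{filecontents}

\documentclass{article}
\begin{document}

The standard reference is Lamport's manual~\cite{Lamport1994}.

\bibliographystyle{plain} % assumption: swap in the style you are told to use
\bibliography{thesis}     % reads thesis.bib

\end{document}
```

Run latex (or pdflatex), then bibtex, then latex twice more. Changing the reference format later means changing one line, not retyping every entry.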

2.7  Keep a copy online

As an additional form of backup, copy your thesis-in-development to your web site. If necessary, don't link it, just put it there, if you have concerns about people browsing a partial document.

When you're away at a conference or you meet someone useful or interested, you can then show them or just give them the URI.

Better still, create a page about your research where you can publicise what you're doing—​and in the process, effectively stake your claim to be the researcher in this field. Some universities require all postgraduates to maintain a web site about their research. Others doubtless run a mile.

2.8  Learn to write simple macros

Having said in section 1 that you don't need to know any TEX source code, I'm now going to break the rule.

If your thesis contains very frequently-repeated objects, it will be much easier and more maintainable in the long-term to have a macro which can act as both label and formatter, and be re-used ad infinitum.

amo (1)

Person  Singular            Plural
1       amo    I love       amamus  we love
2       amas   you love     amatis  ye love
3       amat   s/he loves   amant   they love

    % The >{...} column prefixes need \usepackage{array} in the preamble.
    \newenvironment{verbal}[1]{%
      \begin{center}\sffamily\noindent\textit{#1}\par\medskip
      \begin{tabular}{c>{\bfseries}l>{\itshape}l>{\bfseries}l>{\itshape}l}
    }{\end{tabular}\end{center}}
    % \dbl spans a heading across a Latin/English pair of columns.
    \newcommand{\dbl}[1]{\multicolumn{2}{l}{#1}}
    ...
    \begin{verbal}{amo (1)}
    Person&\dbl{Singular}&\dbl{Plural}\\[6pt]
    1&amo&I love&amamus&we love\\
    2&amas&you love&amatis&ye love\\
    3&amat&s/he loves&amant&they love\\
    \end{verbal}
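The same idea works for objects that occur in running text. The following is only a sketch: the macro name \term is invented for the illustration, and the formatting it applies is a matter of choice.

```latex
\documentclass{article}

% \term (a hypothetical name) marks a word under discussion. Today it only
% italicises its argument; the definition can be changed later in this one
% place without touching the text of the thesis.
\newcommand{\term}[1]{\textit{#1}}

\begin{document}

The first person singular is \term{amo} and
the first person plural is \term{amamus}.

\end{document}
```

Because the text says what the object is, not how it looks, a late change of mind by your supervisor costs one line.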

2.9  The best form of attack is defence

Your defence. As of submission, you are the domain expert. No-one else has researched your particular aspect of the topic in as much depth (or you wouldn't have been allowed to select that topic), so it's your responsibility to put it forward as effectively as possible.

Don't let other things get in the way: as with any publication, the objective is communication, not raising barriers. Your viva is your chance to convince a sceptical extern of your worth, not a chance to show off your flowers of eloquence or your skill with fonts.

2.9.1  Formatting style

Stick to the plain and simple. This ought to be mandated by your institution's thesis standard, if there is one. A thesis is not the place for quirks and cunning stunts of typography—​keep that for when you turn it into a book.

Pick a common and readable typeface and stick to it (another reason for avoiding wordprocessors, which have a nasty habit of changing font for no good reason), and don't mix typefaces.

Adhere to the conventions of your discipline, and be consistent (the mantra of every publisher since Gutenberg, and one of the principal advantages of LATEX).

2.9.2  Structural style

Most readers can hold three levels of sectional structure in their head while reading. Below that, they lose track of where they are, especially if you have subsubsubsubsubsections more than two pages long, so that there is no visible indication of what part of the tree you're in. Keep away from the very finely-nested structures unless absolutely unavoidable.

Avoid nested lists as well: they are part of the structure. Unless mandated otherwise, use a simple decimal notation for sections, and arabic or alphabetic labelling for lists.
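These limits can be written into the preamble. The sketch below assumes the standard report class and uses invented heading text; an institutional thesis class may set these values for you, in which case leave them alone.

```latex
\documentclass{report}

% Number, and list in the contents, only chapters, sections and subsections
% (levels 0, 1 and 2). These are the report class defaults, stated here
% explicitly so that the intention is visible.
\setcounter{secnumdepth}{2}
\setcounter{tocdepth}{2}

% Label top-level list items a), b), c) instead of 1., 2., 3.
\renewcommand{\theenumi}{\alph{enumi}}
\renewcommand{\labelenumi}{\theenumi)}

\begin{document}
\tableofcontents

\chapter{Method}
\section{Participants}
\subsection{Recruitment}
\subsubsection{Exclusions} % still a heading, but unnumbered and not in the contents

\begin{enumerate}
  \item first criterion
  \item second criterion
\end{enumerate}

\end{document}
```

Anything below the third level still prints as a heading, but without a number, which is a useful hint that you may be nesting too deeply.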

2.9.3  Writing style

Eschew obfuscation: while you are the domain expert for the duration, write for the intelligent, informed, and expert reader from a closely-related domain. You're among equals, so don't explain the obvious, but cite anything that would otherwise look like a leap of faith; extraordinary claims require extraordinary proof.

Short sentences are good, but don't be afraid to use a long one where there is a complex point to be made. Make sure you write complete sentences, too, without missing important parts of speech.

There is some debate about the merits of writing in the first or the third person (“I designed an experiment…” as opposed to “An experiment was designed…” or “We designed an experiment…”). Formal usage dictates the third person, but the first person is becoming more widespread.

2.10  Backup

This shouldn't need mentioning, and it is arguably out of scope anyway, as it applies to all work on a computer, whether LATEX or not. If you don't have a recent backup of your entire thesis—​document, diagrams, photographs, research data, background information, supporting documents, etc—​go and make one immediately.

Any competently-designed editor will automatically make a backup file each time you start modifying a document, by default, silently, and without question. This is part of taking a pragmatic approach to computing. I would submit that if your present editor does not do this (or some equivalent) then you are using the wrong editor, and you should replace it as soon as possible.

In theory the same should apply to editing other types of file, such as images, but the large size of images compared with the size of a text document may preclude this on some systems.

Having made sure your system is creating editorial backup copies, you then need to ensure that you make a full duplicate of all your files onto some other medium. Tape is still the reliable standby of all computer centres, but it is expensive for a user's workstation. Unless your institution creates backups for you over the network automatically (rare), you should write your files to CD, DVD, or memory device regularly and frequently, and keep the copies at home, out of your office, lab, or locker.

You may be doing this anyway, if you bring your thesis with you between home, campus, and anywhere else you work. Just make sure that you label the medium with the correct date.

A final word of caution about USB memory devices: these are not disks, and cannot be re-written ad infinitum as disks can. Depending on the make, model, and capacity, there is a large but finite limit to the number of rewrites they will tolerate, after which they are read-only devices. Under normal use (copying a few files) this limit will scarcely be reached within the physical lifetime of the device, but when regularly copying very large volumes or very high numbers of files, the limit may be reached within 2–3 years.

3  Expectans expectavi

Writing a thesis takes a long time. Both the research and the writing require infinite patience as well as a capacity for handling the unexpected. If the hints in the foregoing section are to be of use, they need to be accompanied by a core set of computing skills which are not nowadays taught to undergraduates.

The European Computer Driving Licence (ECDL) was supposed to provide a minimal set of computing skills, but for practical reasons it is largely restricted to office procedure and software.

In fact most of the additional skills are mentioned, but rarely taught with any enthusiasm.

The specific skills (with their ECDL paragraph number) which will serve you well are:

This may seem like a trivial list, but the inability to perform simple housekeeping and installation is the cause of more helpdesk calls than any other single category of request.

Armed with knowledge, you should be prepared to handle most of the twists of fate which come your way. The rest is in your capable hands.

References

  1. [1986] ISO JTC 1/SC 34; ISO Standards: Information processing—​Text and office systems—​Standard Generalized Markup Language (SGML). International Organization for Standardization, Geneva, 1986
  2. [1994] Burnard, Louis and Sperberg-McQueen, CM (Eds): Guidelines for the Encoding and Interchange of Machine-Readable Texts. Text Encoding Initiative, Oxford and Chicago, 1994. Link to online resource
  3. [2001] Cowan, John and Tobin, Richard (Eds): The XML Information Set. W3C, 2001, Cambridge, MA. Link to online resource
  4. [2003] Semantic Research, Inc: Semantic Reader. Semantic Research, Inc., San Diego, CA, 2003. Link to online resource
  5. [Barry2000] Barry, Sarah; Thyssen, Anthony and Taylor, Kim : Electronic Mail/News Etiquette. Griffith University, 2000, Brisbane, Australia. Link to online resource
  6. [Biggs1997] Biggs, Michael and Huitfeldt, Claus: ‘Philosophy and Electronic Publishing: Theory and Metatheory in the Development of Text Encoding’. In ‘The Monist’, 80:3, Buffalo, NY, 1997. Link to online resource
  7. [Bray2000] Bray, Tim; Paoli, Jean; Sperberg-McQueen, Michael and Maler, Eve: Extensible Markup Language Version 1.0. W3C, Cambridge, MA, 2000, 2nd Ed.
  8. [Brugger] Brugger, Rolf; Zramdini, Abdelwahab and Ingold, Rolf: ‘Modeling Documents for Structure Recognition Using Generalized N-Grams’. In 4th International Conference on Document Analysis and Recognition (ICDAR'97), 1997, Ulm, Germany, pp.56–60. IEEE Computer Society, 0-8186-7898-4. Link to online resource
  9. [Calderwood1996] Calderwood, David: ‘An Internet Exercise in Conveyancing Practice’. In ‘Web Journal of Current Legal Issues’, 5, 1996. Link to online resource
  10. [Campbell] Campbell, Eoin: ‘Word and YAWC: A Poor Mans' XML Publishing Environment’. In XML Europe, 2002, Barcelona. Graphic Communications Association, Daingerfield, VA. Link to online resource
  11. [Close2003] Close, Tyler: Waterken™ Doc: Document Model Specification. Waterken Inc., 2003, The Valley, Anguilla. Link to online resource
  12. [Coombs1987] Coombs, JS; Renear, Allen and DeRose, SJ: ‘Markup Systems and the Future of Scholarly Text Processing’. In ‘Communications of the ACM’. ACM Press, 30, New York, NY, pp.933–947, 1987. Link to online resource
  13. [Cournane1997] Cournane, Mavis: The application of SGML/TEI to the processing of complex multi-lingual text. University College Cork, Cork, Ireland, 1997. Link to online resource
  14. [DeRose1990] DeRose, SJ; Durand, David; Mylonas, Elli and Renear, Allen: ‘What is text, really?’. In ‘Journal of Computing in Higher Education’, 1:2, Amherst, MA, pp.3-26, 1990, 1042-1726
  15. [Fine2001] Fine, Jonathan: ‘Instant Preview and the TEX daemon’. In ‘TUGboat’. TEX Users Group, 22:4, Portland, OR, pp.292-298, 2001
  16. [Flynn] Flynn, Peter: ‘W[h]ither the Web? The extension or replacement of HTML’. In SGML/XML'99, 1999. Graphic Communications Association, Daingerfield, VA
  17. [Flynn] Flynn, Peter: ‘If XML is so easy, how come it's so hard? The usability of editing software for structured documents’. In Extreme Markup Conference, Montréal, QC, Late-breaking: 8 August 2006. Link to online resource
  18. [Flynn1999] Flynn, Peter: ‘The vulcan package: A repair patch for LATEX’. In ‘TUGboat’, 20:3, 1999. TEX Users Group, Portland, OR. Link to online resource
  19. [Flynn2002] Flynn, Peter: ‘Formatting Information: A beginner's introduction to typesetting with LATEX’. In ‘TUGboat’, 23:2, 2002. TEX Users Group, Portland, OR. Link to online resource
  20. [Furuta1998] Furuta, Richard; Quint, Vincent and André, Jacques: ‘Interactively editing structured documents: Combining the advantages of structured documents and WYSIWYG editing’. In ‘Electronic Publishing—​Origination, Dissemination, and Design’, 1:1, pp.19–44, 1988
  21. [Goldfarb1990] Goldfarb, Charles: The SGML Handbook. OUP, Oxford, England, 0-19-853737-9, 1990
  22. [Kastrup2002] Kastrup, David: ‘Revisiting WYSIWYG Paradigms for Authoring LATEX’. In ‘TUGboat’. TEX Users Group, 23:1, Portland, OR, pp.57-64, 2002
  23. [Knuth1984] Knuth, Donald: The TEXbook. Addison-Wesley, Reading, MA, 1984
  24. [Lamport1985] Lamport, Leslie: LATEX: A document preparation system. Addison-Wesley, Reading, MA, 0201529831, 1st Ed., 1985
  25. [Lamport1994] Lamport, Leslie: LATEX: A document preparation system. Addison-Wesley, Reading, MA, 0201529831, 2nd Ed., 1994
  26. [Lonnert1996] Lonnert, Set: Towards a new design of graphical interfaces. Set Lonnert Humanistics & Technology, Uppsala, Sweden, 1996. Link to online resource
  27. [Mamrak1988] Mamrak, SA; Barnes, J; Hong, H; Joseph, C; Kaelbling, M; Nicholas, C; O'Connell, C and Share, M: ‘Descriptive Markup: the best approach?’. In ‘Communications of the ACM’. Association for Computing Machinery, 31:7, New York, NY, pp.810-811, 1988. Link to online resource
  28. [Pillot1999] Pillot, Patrice and Obrecht, André: Instructions to Authors: European Workshop on Content-Based Multimedia Indexing. GT-10 Multimedia Indexing working group GDR-PRC ISIS of the CNRS, Toulouse, France, 1999. Link to online resource
  29. [Quin1996] Quin, Liam: ‘Suggestive Markup: Explicit Relationships in Descriptive and Prescriptive DTDs’. In SGML'96, 1996. Graphic Communications Association, Alexandria, VA, pp.405-418, 1996
  30. [Raman1994] Raman, TV: An audio system for technical readings. Cornell University, 1994. Link to online resource
  31. [Renear1996] Renear, Allen; Durand, David and Mylonas, Elli: ‘Refining our notion of what text really is: The problem of overlapping hierarchies’. In Ide, Nancy and Hockey, Susan (Eds): ‘Research in Humanities Computing’. Oxford University Press, Oxford, England, 1996
  32. [Renear2002] Renear, Allen; Dubin, David and Sperberg-McQueen, CM: ‘Towards a semantics for XML markup’. In ACM Symposium on Document Engineering, 2002. ACM Press, New York, NY, pp.119-126, 2002, 1-58113-594-7. Link to online resource
  33. [Salminen] Salminen, Airi: ‘A relational model for unstructured documents’. In Yu, CT and Van Rijsbergen, CJ (Eds): 10th annual international ACM SIGIR conference on research and development in information retrieval , 1987, pp.196–207. ACM Special Interest Group on Information Retrieval, New York, NY, 0-89791-232-2
  34. [Saltzer1964] Saltzer, JH: TYPSET and RUNOFF, Memorandum editor and type-out commands. MIT, Cambridge, MA, Nov 1964. Link to online resource
  35. [Sosnoski2001] Sosnoski, Dennis M: A look at features and performance of XML document models in Java. IBM, 2001, Armonk, NY. Link to online resource
  36. [Tognazzini1996] Tognazzini, Bruce: TOG on Interface. Addison-Wesley, Reading, MA, May 1996, 0-201-60842-1