If XML is so easy, how come it’s so hard?: The usability of editing software for structured documents

Peter Flynn (Human Factors Research Group, University College Cork)

Extreme Markup Conference, Montréal, QC, 2006-08-08

Abstract

This paper reports on recent research into the usability of editing software for structured documents. The overall objective is to identify why markup-based editing is regarded as ‘difficult’, and to investigate the potential for changes to the prevailing interface paradigms which might address the problem. The principal systems investigated were SGML-based (XML) and TEX-based (LATEX and ConTEXt).

The demand for markup-based editors was categorised and quantified by an analysis of posts to the Usenet newsgroups comp.text.xml and comp.text.tex and the XML-L mailing list. This highlighted some perceived deficiencies with existing systems as well as some underlying misapprehensions by potential users.

Baseline data was gathered from a pilot group of expert users, who were surveyed on their attitudes to and expectations of XML editing software, both for customer implementations and for their own work. This revealed further dissatisfaction with existing systems, especially for customer use (Section 2.1).

Twenty-five editing systems were examined to compare the facilities provided, and to derive a comparative measure of their usability, accessibility, and usefulness. This showed that there is no significant difference between editors in the facilities offered, only in the accessibility (placement of those facilities in the user interface).

Investigation of the requests and comments from new users identified some conflicting perceptions, specifically that a) XML is perceived as hard to use because the editors don't look or work like wordprocessors; b) some users are convinced that an interface to editing XML must exist that can be used without configuration, by novices with no training, to produce completely fault-free documents. Work is ongoing to identify the extent to which the [mis]perceptions may be resolvable through changes to the interface.

1  Background

Text markup systems from their earliest origins in the 1950s and 60s were largely based on inline markup using plaintext characters. RUNOFF is a canonical historical example [Saltzer1964]: the most common modern equivalent is LATEX [Knuth1984]. SGML [1986] and particularly XML [Bray2000] are more recent, and provide greater structural rigour and generic adaptability.

1.1  Markup and editing models

The features of plaintext inline markup systems are typically:

The SGML and XML systems in use today are based on similar principles, but with the following changes:

The inline plaintext model of markup has conditioned file formats for text systems as well as the products using them, from RUNOFF (1963) through SCRIPT (1967), PUB (1971), troff/nroff (1973), GML (1974), TEX (1978), LATEX (1984), and SGML (1986), to XML (1996). Typically, the more generic and programmable the markup becomes, the wider the choice of editors for a given format.

Similar models affected the design both of file formats and interfaces for WordPerfect (1980), Word (1983) and a large number of other wordprocessors such as PC-Write (1984) and XYWrite (1985) and their derivatives (eg Nota Bene). Originally, in almost all cases, markup was displayed on the screen along with the text, in the system (monospaced) typeface, but is now conventionally hidden to provide a ‘near-WYSIWYG’ view.

1.2  What You See Is What You Get

Before the widespread use of bitmapped graphics screens, bright and dim text variants were used to simulate bold and italics, and printers could overprint for bold and underline for italics. Graphical screens and multi-font printers replaced the older technology during the 1970s and 1980s, and colour became commonplace in the 1990s.

The two key early developments were the use of ‘real-time’ or ‘direct intervention’ editing (also called ‘visual’ or ‘synchronous’ editing), and the bitmapping of fonts to graphical displays. These were dependent on advances in technology. The third development, the hiding of markup, was a conceptual move, and facilitated the real-time typographic formatting which gives us the current WYSIWYG paradigm. The conventional definition of What You See Is What You Get (WYSIWYG) in fact combines all three developments to produce ‘‘a system in which content during editing appears very similar to the final product’’ (Wikipedia).

The application of synchronous editing to a typographically-formatted display came with Bravo (Xerox, 1974: the technique was nicknamed WYSIWYG later). In synchronous editing, the characters typed, erased, or replaced were visible during the operation, rather than after the event. Visual plaintext editors such as Emacs (Stallman, 1975) and vi (Joy, 1976) implemented the synchronous method for monospaced text.

Later commercial systems implemented a form of monospaced WYSIWYG but bound a proprietary binary file format tightly to the product: the Wang wordprocessor (1975), Microsoft Word (1983), and IBM's DisplayWriter (1980) and Displaywrite (1984); and many others following in later years. In these cases the markup remained proprietary and hidden, and only the formatting could be seen.

Hybrid systems were also developed, notably the versions of WordPerfect and its successors which allowed markup to be displayed in synchrony with the cursor movement through the text, a feature much praised by professional users. Typesetting systems also followed a similar trend, and many still kept a similar interface available until recently, for example 3B2 (now Arbortext's Advanced Print Publisher), Pandora (Elsevier's now-obsolete private compositors' version of Unix WordPerfect), and the original Miles 33 system editor (now OASYS).

The synchronous typographic model led to some other important changes which affected the development of most subsequent wordprocessor systems: out-of-line (standoff) markup, binary (and proprietary) file formats, and tokenised markup. Along with some technical issues relating to the disparity between screen and print resolution, the most obvious omission at the time was the absence of consistency control and the resulting restriction on markup to visual effects only. Stylesheets for providing consistency control have been available in all major wordprocessors for many years, but most users remain unaware of their existence [Campbell] or of how to use them [Calderwood1996].

1.2.1  Obscurity of hidden markup

The widespread use of synchronous typographic wordprocessors has obscured what users mean when they ask for WYSIWYG in an XML or LATEX editor. If we take for granted the three touchstones of visual editing, typographic formatting, and hidden markup, this still leaves some questions unanswered when applied to structural or descriptive markup systems, for example:

Professional use of markup requires a great deal more control and intervention than could be achieved in early graphical interfaces. As a result, many XML and LATEX users retain their preference for a plaintext interface with clearly visible markup, similar to the editing view used by Wikis. The much wider reach of XML systems (to users accustomed to wordprocessors) has brought a much larger demand for WYSIWYG systems, as we will see in Section 2.3. While there were significant early XML and LATEX editors which implemented the synchronous typographic interface, notably the Arbortext Editor (1989) and Blue Sky's Textures (1989), their relatively greater price and platform-dependence (Sun and Mac respectively) meant a much smaller adoption rate than is now seen. The ‘box-model’ editors InContext and STiLO also implemented some typographic formatting within the nested boxes, but this model makes it difficult to edit deeply nested mixed content while preserving the readability (flow) of the text.

2  Perceptions and requirements

The concept of a ‘document model’ has been used in document engineering and related areas for many years. It has become more recently popular as a result of the growing use of XML, where there has been a more widespread need for a formal framework on which to build document architectures. The canonical definition is implied (although the term ‘document model’ is not used) in the W3C's ‘Infoset’ [2001], an abstract set of definitions of textual and other objects which forms — among other things — the Document Object Model (DOM) for XML.

Concrete definitions vary across different fields: [Close2003] writes of ‘‘the parse tree that results from parsing an encoded representation of a document’’; [Sosnoski2001] refers explicitly to its use in a specific language (in this case Java): ‘‘a library and API that supports working with a document representation’’; and [Brugger], writing rather earlier, speak of a ‘‘description of the structuring rules of a document class’’.

However, the formal document model is an artifact of computing science, an explicit structure essential for explaining and modelling the document, but opaque to the author of a document unless she is also a markup expert. Authors do have a model of their document, but it is internal, and can be anything from the conceptual structure of a novel to the outline of a business report to a complete chapter/section/subsection structure agreed with the publisher before signing the contract (an external example). While the two models may essentially carry the same information, they are used for different purposes. The external (CS) model is used for developing computing applications for structured document systems; the internal (authorial) model is used for developing the thought processes which guide the writing of the document. It is thus a canonical error to assume that the general author is aware of a document model in the same way that a document engineer is. This leads to a discrepancy between the view taken by the document-aware software expert about the software available or suitable for a task, and the view (and expectations) of the author.

2.1  Expert survey

In order to investigate this further, it was necessary to gather baseline data on the professional recommendation of editing systems. A survey was administered to a group of XML and LATEX experts who had extensive experience of editing software, both as software users and as systems designers or consultants.

The objective was to determine their views on the adequacy or otherwise of the editing software they had used, with specific reference to deficiencies they had encountered, special features they found important, and their expectations of the software. A secondary objective was to refine the questions for use in a later survey of ordinary users (section F.1).

A pilot survey of 12 experts was carried out in July 2003, and the results used to make minor adjustments to the phrasing of the questions in an attempt to eliminate bias. The main survey was conducted in July 2004 (an additional 20 subjects).

The questions asked were (in summary):

No personal details were recorded, in order to encourage explicit responses. A copy of the fully-worded questionnaire is available from the author and will be published in later research.

The principal findings showed that the experts tended to recommend software which was familiar to the users, rather than necessarily that which was best suited to the tasks, as this approach reduced the training requirements. A different set of editors was preferred by experts for their own use.

None of the experts interviewed identified any specific major feature of any editor which made it preferable to its competitors (there were a few minor preferences, but nothing relating to the core activities of editing XML), but interviewees identified a number of major deficiencies, including interface clutter, crashing or hanging on large files, lack of support for external entities and for catalogs, general instability, and poor typographic control.

These findings are listed in more detail in Section 4.1.

The editors used or recommended were generally regarded as ‘‘the best of a bad bunch’’. This dissatisfaction must be read in the light of the subjects' extensive experience of markup systems, which makes them liable to be far more critical than the average user.

2.2  User perceptions

As mentioned earlier, the concept of a document model is already familiar to experienced users of systems like LATEX and SGML who have been accustomed to dealing with the features of document structures. In some cases the origins of the model go as far back as systems like GML or Scribe and beyond [Furuta1998]. In these cases, however, emphasis is often placed on the flexibility of the model or models [Raman1994], allowing individual users to extend or diminish the features where they feel the features are lacking or excessive [Flynn1999].

As has been described in Section 1.2, the much larger number of word-processing and desktop publishing users who now form our target ‘‘author’’ group became more accustomed to an unstructured document model [Salminen] because the WYSIWYG model convincingly substitutes a visual model for a structural one. Outside the publishing field, it is probably fair to say that most authors are not only unaware of software solutions to their requirements, but are unaware that they may in fact be in any kind of difficulty. Some of these beliefs can be explained by existing theories of cognition, and future work (see section F.1) will use these results to collect correlation data from ‘user’ and ‘beginner’ groups.

This ‘naïve’ view of the document has become so prevalent that wordprocessor users will now often conflate style and structure, believing that the appearance of the document is the structure. At least one conference organizer now refers to a document style sheet as a ‘‘document model’’ [Pillot1999], and several companies provide software to assist in extraction of information from unstructured sources and endowing it with an inferred structure [2003].

2.3  User requirements investigation

The attractiveness of the WYSIWYG model which has led to this position (‘user-seductiveness’, to use a marketing phrase) is based on a perception of ease of use, intuitiveness, obedience, and graphical appeal (colour, images, typefaces). With a large number of potential users already accustomed to this model, it is generally perceived now to be in demand for the editing of SGML/XML and LATEX by non-experts.

To obtain an initial picture of what users were actually asking for, an analysis of messages to the XML-L mailing list and the Usenet newsgroups comp.text.xml, comp.text.sgml and comp.text.tex was carried out. The original post in all threads mentioning editors was isolated (sampling details are in Appendix ) and the requirements categorised. The number of posts analysed in this way is shown in Figure 1.

Figure 1. Requests for information about WYSIWYG editors posted to XML-L, comp.text.sgml, comp.text.xml, and comp.text.tex

The steep rise in demand in 1997–1998 follows the first public draft of XML (1996) and coincides with the publication of the XML 1.0 Recommendation (1998); however, the troughs and peaks in 2000–2002 and 2003–2005 are less easily explained.

The frequency of messages posted appears to indicate that the target population for editing software may be changing: the number of posts requesting information peaked between 1999 and 2002. However, at the time of writing, data for 2006 was less than half available, and if we were to make a straight-line estimate of the total, the 2006 figure would reach 2000/2001 levels again. There are many possible reasons for this multimodal pattern: the gradual move of the population up the learning curve; the more widespread accessibility of information (on the Web) about available software; or even a certain resignedness that what the poster wanted simply is not achievable so whatever is available will have to do. More information about these factors will be collected in a later stage of this research.
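The straight-line estimate mentioned above amounts to scaling the count observed so far by the fraction of the year it covers. The following is a minimal sketch of that arithmetic only: the function name and the sample figures are hypothetical and are not taken from the newsgroup data.

```python
def straight_line_estimate(posts_so_far: int, fraction_of_year: float) -> float:
    """Project a full-year total from a partial-year count.

    Assumes posts arrive at a constant rate through the year, which is
    the simplification implied by a straight-line estimate.
    """
    if not 0 < fraction_of_year <= 1:
        raise ValueError("fraction_of_year must be in the range (0, 1]")
    return posts_so_far / fraction_of_year


# Hypothetical illustration only: 18 posts seen in the first 5 months.
projected = straight_line_estimate(18, 5 / 12)
```

A projection of this kind says nothing about seasonal variation in posting, so it can only indicate an order of magnitude.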

For each original post, the key request parameters were isolated and categorised. As is conventional in investigations of this sort, the results showed the familiar negative exponential curve, with a small number of categories with a high frequency, and a very long tail of categories with very low frequencies (see Table 1).

Table 1. Specific concerns mentioned by users posting requests for editors (N=419) (see Figure 2).

Feature | Mentions
Cost-free / Open Source | 141
WYSIWYG | 106
Ease of use | 78
Simplicity | 48
Ability to include images | 45
In-editor validation | 39
Structure control | 33
Tree-view display available | 32
Equation editing | 27
User-friendly | 16
Hidden markup | 10
Context-sensitive pop-up markup | 10
Intuitive | 8
Outlining (zoom) | 8
Single-DTD (bespoke) | 7
Legacy import/conversion | 7
Unicode | 6
DTD-less editing | 5

Among the other features requested were (in alphabetical order):

  • Arbitrary DTDs
  • Attribute control
  • Automated formatting
  • Conditional text (effectivities)
  • Customisable interface (scripting)
  • Cut'n'paste from other applications without damaging the markup
  • Element-level locking
  • Font control
  • Hypertext links
  • Idiot-proofing
  • Linebreaking / word-wrap
  • Macros
  • Non-WYSIWYG ‘‘unacceptable’’
  • PostScript output
  • Removal of need to understand markup
  • Spacing control (formatting)
  • Spell-checker / thesaurus
  • Style files
  • Table editing
  • Typeset quality printing from within the editor
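The shape of the distribution in Table 1 can be checked directly from the tabulated counts. The sketch below transcribes those counts and computes the share of mentions taken by the most frequent categories. The helper name `head_share` is illustrative, and only the tabulated categories are included (the low-frequency ‘other features’ have no counts in the text).

```python
# Mention counts transcribed from Table 1 (tabulated categories only).
TABLE_1 = {
    "Cost-free / Open Source": 141,
    "WYSIWYG": 106,
    "Ease of use": 78,
    "Simplicity": 48,
    "Ability to include images": 45,
    "In-editor validation": 39,
    "Structure control": 33,
    "Tree-view display available": 32,
    "Equation editing": 27,
    "User-friendly": 16,
    "Hidden markup": 10,
    "Context-sensitive pop-up markup": 10,
    "Intuitive": 8,
    "Outlining (zoom)": 8,
    "Single-DTD (bespoke)": 7,
    "Legacy import/conversion": 7,
    "Unicode": 6,
    "DTD-less editing": 5,
}


def head_share(counts: dict[str, int], head_size: int) -> float:
    """Fraction of all tabulated mentions taken by the head_size largest categories."""
    ranked = sorted(counts.values(), reverse=True)
    return sum(ranked[:head_size]) / sum(ranked)


if __name__ == "__main__":
    for size in (3, 5, 9):
        print(f"top {size} categories: {head_share(TABLE_1, size):.0%} of tabulated mentions")
```

The tabulated mentions sum to more than N=419, which suggests that a single post could raise more than one concern.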

While it is clear that WYSIWYG is a major component of requests, the overall data is inconclusive as to what users expect of ‘WYSIWYG’. Perhaps unsurprisingly, of the three features of WYSIWYG identified in Section 1.2 (real-time display, typographic formatting, and hidden markup) only hidden markup features in the list, as the others may well be taken for granted.

However, if we restrict the analysis to just those posts referring specifically to WYSIWYG, a different pattern emerges (Table 2). If we temporarily exclude equation editing (a domain-specific concern of many LATEX users), and the cost or Open Source factor (which is outside the domain of enquiry), the list is now headed by ease of use, the ability to handle images, and simplicity. An ‘intuitive’ interface¹ and the ability to keep markup hidden still rank lower than the need for structural control, user-friendliness, validation, and a tree-view of the document.

Table 2. Specific concerns mentioned by users posting requests for WYSIWYG editors (N=106) (see Figure 3).

Feature | Mentions
Ease of use | 37
Equation editing | 33
Cost-free / Open Source | 30
Ability to include images | 25
Simplicity | 23
Structure control | 17
User-friendly | 11
In-editor validation | 7
Tree-view display available | 7
Hidden markup | 6
Intuitive | 3
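The re-ranking described for Table 2 (setting aside equation editing and the cost or Open Source factor) can be reproduced from the tabulated counts. This is a minimal sketch; the names `TABLE_2`, `EXCLUDED`, and `ranked_without` are illustrative only.

```python
# Mention counts transcribed from Table 2.
TABLE_2 = {
    "Ease of use": 37,
    "Equation editing": 33,
    "Cost-free / Open Source": 30,
    "Ability to include images": 25,
    "Simplicity": 23,
    "Structure control": 17,
    "User-friendly": 11,
    "In-editor validation": 7,
    "Tree-view display available": 7,
    "Hidden markup": 6,
    "Intuitive": 3,
}

# The two categories temporarily excluded in the discussion of Table 2.
EXCLUDED = {"Equation editing", "Cost-free / Open Source"}


def ranked_without(counts: dict[str, int], excluded: set[str]) -> list[tuple[str, int]]:
    """Drop the excluded categories and rank the rest by number of mentions."""
    kept = {feature: n for feature, n in counts.items() if feature not in excluded}
    # sorted() is stable, so tied categories keep their order from the table.
    return sorted(kept.items(), key=lambda item: item[1], reverse=True)


ranking = ranked_without(TABLE_2, EXCLUDED)
top_three = [feature for feature, _ in ranking[:3]]
bottom_two = [feature for feature, _ in ranking[-2:]]
```

With the two categories removed, `top_three` holds ease of use, image handling, and simplicity, while hidden markup and ‘intuitive’ fall at the bottom of the ranking.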

In analysing the requests, there was a noticeable mismatch between what the user was asking for (eg ‘‘a WYSIWYG editor’’) and what is known to be available. The extreme limit of this is the demand for an editor which will ‘‘just let users type a document’’, without any knowledge of XML or structure or markup, and the editor which will ‘‘automatically add all the relevant markup by itself’’.

Figure 2. Specific requests for editors (N=419)

This is an interesting, if degenerate, example of the assertion by [Tognazzini1996] that ‘‘intuitive’’ does not mean ‘‘able[…]to perceive the patterns of the user's behaviour and draw inferences’’, when in this case it quite clearly does mean precisely that. We will discuss the possibilities of intuiting markup in Appendix .

Figure 3. Specific requests for WYSIWYG editors (N=106)

We must assume, therefore, that the demand for a WYSIWYG interface to structured editing not only has to satisfy the primary requirement whereby ‘‘content during editing appears very similar to the final product’’, but that it must also satisfy additional criteria as instanced in Table 2.

In some cases this is technologically challenging: we cited earlier the problem of how to position the cursor for element insertion between contiguous nested end-tags when the markup is not visible.² At the other extreme, some of the features requested have existed as standard in all SGML and XML editors since the earliest days (eg structure control, which is a sine qua non) and novice ignorance of this may be excused and tackled by better training and dissemination of information.

There is a summary of the principal requirements deduced from this analysis in Section 4.2.

3  Software analysis

In order to measure the facilities provided by editors capable of handling structured documents, we originally selected twenty-five applications for analysis. Because the speed of development in the field remains very high, a number of the systems selected ceased to be available and had to be replaced by more recent ones. This process is ongoing in the research and a later version of this paper will include some more recent changes.

3.1  Methodology

Because of the very large amount of software available in the field, we restricted ourselves to a sample of programs which represents three categories of software: these exhibit the principal features listed in Section 1.1 and cover the types of markup we are examining.

  1. SGML and XML editors, excluding those designed for HTML only (or restricted to specific DTDs);

  2. Editors used for typesetting structured material, including both synchronously and asynchronously rendered typographical systems;

  3. Word-processors and desktop publishing (DTP) systems with significant structural features.

The emphasis on these specific categories has been based on two requirements: a) software which is demonstrably designed specifically for handling structured documents; b) software which had its origins in handling an unstructured model but which now has strong evidence of the ability to handle structure. For this reason, there is a clear emphasis on XML software, as this is the prevalent model of a structured-document system. Some SGML capability is included, as this is still in widespread use. HTML systems, despite their SGML roots, are excluded, as W3C HTML does not readily provide an identifiably robust structure to the document (unlike ISO-HTML), and because properly conformant HTML systems (not XHTML systems) are virtually non-existent on the web.

LATEX systems are included because the language implements the structural features discussed in Section 1.1, and these are generally adhered to in the software, although the syntax of the language allows the deliberate breaking of some parts of the model in order to achieve the primary objective, which is to set type. Other comparable products are admitted to the category because of similar features or because they support the editing of XML.

Most wordprocessing, editing, and Desktop Publishing (DTP) systems were excluded because they use an unstructured, often dimensionless, model of the document and have no facilities for adapting to a planned document structure. As a consequence they also tend to lack suitable hierarchical, navigational, and manipulative features, as well as the consistency required to automate rendering and styling. Those which are examined here have specific features which may be compared with the more traditional structured solutions outlined above.

3.1.1  Interface classification

As we have seen, editing software is often simply classified as ‘plaintext’ or ‘WYSIWYG’. These terms in fact conflate at least three separate axes: display, markup, and control. Arguably, a fourth axis, output, should be included: although it is by definition presupposed to be 100% congruent with display in the WYSIWYG model (a target rarely achieved in practice), it is a variable feast in non-WYSIWYG editors. The canonical features of these axes are shown in Table 3. Note that some of them are not infinitely adjustable variables but dichotomous or polychotomous (step-valued) because different editors implement different features on each axis.

It is important to note, however:

  1. Plaintext regularisation (also called ‘wrapping’, ‘folding’, or ‘flowing’) performs a simple character-count optimisation for the line-length of the viewport, without hyphenation or justification. This is normally done for ease of editing, because it is presumed that any final-form typesetting will handle redundant white-space.

  2. Strictly speaking, a plaintext editor with regularisation and syntactic colourisation could still be termed WYSIWYG, as defined in Section 1.2, because the printout is identical to the display viewport. Moreover, when equipped with a suitable API or IDE, such an editor can produce typographically-formatted printout via a stylesheet and processor, producing asynchronously what WYSIWYG editors do synchronously. Given sufficient speed, a semi-continuous redisplay of the formatted output can act as a WYSIWYG monitor.

  3. The distinction is usually that markup in WYSIWYG and hybrid modes is displayed as graphical tokens which are inviolate to direct editing, whereas plaintext markup is shown in the text font and its characters can be edited directly.
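Plaintext regularisation as described in note 1 can be illustrated with Python's standard `textwrap` module, which reflows text purely by character count, without hyphenation or justification. This is a sketch under stated assumptions rather than a description of how any particular editor does it: the function name `regularise`, the default width of 72, and the decision to collapse runs of white-space before wrapping are all illustrative choices.

```python
import textwrap


def regularise(text: str, viewport_width: int = 72) -> str:
    """Reflow each blank-line-separated paragraph to the viewport width.

    Lines are filled by character count alone: words are never split or
    hyphenated, and no justification is applied.
    """
    paragraphs = text.split("\n\n")
    wrapped = [
        textwrap.fill(
            " ".join(paragraph.split()),  # assumption: collapse redundant white-space
            width=viewport_width,
            break_long_words=False,       # never split a word across lines
            break_on_hyphens=False,       # do not treat existing hyphens as break points
        )
        for paragraph in paragraphs
    ]
    return "\n\n".join(wrapped)


sample = (
    "<para>The quick brown fox jumps over the lazy dog and keeps on "
    "running through the fields.</para>"
)
reflowed = regularise(sample, viewport_width=30)
```

Because markup is treated as ordinary characters, tags are wrapped along with the text; this is harmless only where later processing ignores the extra line breaks, as the note presumes.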

Table 3. Axes for plaintext and WYSIWYG editors (notes refer to the list at the start of Section 3.1.1)

Axis | Plaintext | Hybrid | WYSIWYG
Display | Text and markup in monospace font, optionally regularised and syntactically coloured | Monospace or proportional text, regularised, with tokenised markup | Text formatted typographically to stylesheet
Markup | Shows all markup, possibly an option to hide it | Shows all markup, optionally hiding attributes and tags separately | Hides the markup, with an option to show attributes and tags separately
Control | Markup is edited directly or by menu or keystroke | Markup is edited in panes, pop-ups, or menus | Markup is edited in panes, pop-ups, or menus
Output | Text and markup printed in monospace font | Text is printed monospace, markup as graphical tokens | Document is printed in typeset format, optionally with graphical markup

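The four axes of Table 3 can be treated as a small data structure, which makes it easy to see where two interface modes actually differ. The sketch below is illustrative only: the class `InterfaceMode`, the dictionary `MODES`, and the function `differing_axes` are hypothetical names, and the cell texts are abbreviated from the table.

```python
from dataclasses import dataclass

AXES = ("display", "markup", "control", "output")


@dataclass(frozen=True)
class InterfaceMode:
    """One column of Table 3: how an editing mode behaves on each axis."""
    display: str
    markup: str
    control: str
    output: str


MODES = {
    "plaintext": InterfaceMode(
        display="text and markup in monospace font, optionally regularised and coloured",
        markup="shows all markup, possibly an option to hide it",
        control="edited directly or by menu or keystroke",
        output="text and markup printed in monospace font",
    ),
    "hybrid": InterfaceMode(
        display="monospace or proportional text, regularised, with tokenised markup",
        markup="shows all markup, optionally hiding attributes and tags separately",
        control="edited in panes, pop-ups, or menus",
        output="text printed monospace, markup as graphical tokens",
    ),
    "wysiwyg": InterfaceMode(
        display="text formatted typographically to stylesheet",
        markup="hides the markup, with an option to show attributes and tags separately",
        control="edited in panes, pop-ups, or menus",
        output="document printed in typeset format, optionally with graphical markup",
    ),
}


def differing_axes(first: str, second: str) -> list[str]:
    """Names of the axes on which two modes are described differently."""
    a, b = MODES[first], MODES[second]
    return [axis for axis in AXES if getattr(a, axis) != getattr(b, axis)]
```

For example, `differing_axes("hybrid", "wysiwyg")` omits the control axis, reflecting the identical entries for those two modes in the table.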
3.1.2  Selection criteria

The selection of software for analysis was based on several criteria:

Conspicuity

the program had to be well-known (widely-advertised, widely-used, or widely-discussed): obscure or experimental software was not considered;

Accessibility

it had to be easily accessible (available for purchase or download): this eliminated numerous vertical markets and specialist systems such as military software;

Applicability

the program had to be generally applicable in an authoring environment (business, research, academia, literature, etc): this eliminated further specialist systems;

Execution

it had to run in one or more supported operating environments: Java, a Unix-based operating system (including Linux, Solaris, and Apple OS X), or Microsoft Windows;

Orientation

the program had to be oriented toward the creation and maintenance of text documents (i.e., an editor; an exception was made in one case for a spreadsheet because of its widespread [ab]use even though it was not designed for creating XML);

Functional

it had to pass an ‘entry test’ of basic XML functionality independent of any considerations of the interface itself (see Table 4).

Table 4. Non-interface functionality test

Software was required to fulfill these conditions for the cases checked.

Test XML / SGML Typographic WP / DTP
Use an external template or stylesheet
Recognize native file types or formats
Parse for syntax violations
Store documents in an open file format

The objective of the entry test was to exclude any software which did not show itself prima facie as being capable of handling the file formats in question.
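As an indication of what ‘parse for syntax violations’ means for the XML case, the following sketch uses Python's standard `xml.etree.ElementTree` parser to report whether a text is well-formed. It is not the test procedure used in the study: the function name `syntax_violation` is hypothetical, the check covers well-formedness only (this parser does not validate against a DTD), and SGML is not covered.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def syntax_violation(document: str) -> Optional[str]:
    """Return the parser's error message if the text is not well-formed XML.

    Returns None when the text parses cleanly. Only well-formedness is
    checked; no DTD or schema validation is attempted.
    """
    try:
        ET.fromstring(document)
    except ET.ParseError as err:
        return str(err)
    return None


good = "<article><title>Test</title><para>Some text.</para></article>"
bad = "<article><title>Test</article>"  # the title element is never closed
```

Calling `syntax_violation(good)` returns `None`, while `syntax_violation(bad)` returns the parser's description of the mismatched tag.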

XML and SGML editors were expected and required to use the public text format. In the case of XML this is constrained by the XML Specification [Bray2000]: in the case of SGML we restricted the format to the Reference Concrete Syntax [1986] for compatibility with XML. Several systems additionally use internal (sometimes undocumented) binary formats for speed but these were not examined.

Similarly, typesetting editors which work with structured text may use public or proprietary formats. Five programs selected use LATEX syntax or a close variant; the others use proprietary but documented formats common in the industry, or can export to such formats.

Word-processors and DTP systems traditionally use proprietary binary formats to protect their markets, and conversion to other formats is not always reliable. One system (Nota Bene) uses a proprietary but accessible plaintext format; another (OpenOffice) saves as XML natively, using Open Document files zipped with a stylesheet; a third (Microsoft Word) will shortly also save in XML by default, using its own (WordML) schema.

3.1.3  Software selected

The programs selected were:

  1. SGML/XML:

    • Emacs with psgml-mode (GNU)

    • epcEdit (EPC GbR)

    • Epic (ArborText), now Arbortext Editor

    • Exchanger (Cladonia)

    • FrameMaker+XML (Adobe)

    • Office 11 (2003) [Word, Excel, InfoPath] (Microsoft)

    • WordPerfect 12 XML (Corel)

    • XML Spy and Authentic (Altova)

    • XMetaL (JustSystems)

  2. Typesetter editors

    • 3B2 (Advent, now Arbortext)

    • WinEDT (Aleksander Simonics)

    • LyX (lyx.org)

    • Scientific Word (Mackichan)

    • Textures (Blue Sky)

    • TEXnicCenter (Sven Wiegand)

    • XPress (Quark)

  3. Wordprocessors/DTP

    • Nota Bene (Nota Bene Associates)

    • Office 11 [Word: non-XML] (Microsoft)

    • AbiWord (OSS)

    • OpenOffice (OpenOffice.org)

    • WordPerfect non-XML (Corel)

In the case of programs which have a different primary function (eg XPress is a typesetter; Excel is a spreadsheet), only the built-in editor functions relating to structured document editing were exercised.

3.2  Function categorisation

The functions exhibited by the program interfaces were categorized as shown below. In some cases this was only possible after careful disambiguation, as some manufacturers' use of terminology conflicted with the established usage. The initial categorization was obtained by inspection, and resulted in the division of functions into four classes:

File handling

actions which operate on the file as a unit, including opening, closing/saving, parsing and validating, printing, and managing ancillary files such as templates or stylesheets;

Document structuring

organizing and arranging document structure, including markup insertion, change, and removal; and other edit operations which operate on parts of the file identified by markup, such as context-sensitive cut-and-paste;

Ergonomic facilities

making things easier for the user, including editor stylesheeting, sizing, coloring, and the customization of dialogs, menus, toolbars, and context-sensitive searching;

Editor management

metadocument functions, including entity management, character sets and encoding, the management of macros, plugins, and other utilities;

It is important to note that we excluded any conventional features operating on unmarked text only, or without respect to the markup. Many of these are common to all text systems everywhere, such as cut, copy, and paste, and they cannot be used to distinguish between structured-text systems unless they exhibit some sensitivity to markup. Similarly, some file-handling operations are also excluded (those which are not associated with markup activity, such as directory listing or display, recent file lists, etc).

Testing was conducted using functions found in the menus, keystrokes, or toolbars of the software (hidden features and those requiring specialist access were not included). For each program, the specified functionality was identified and exercised, where relevant using a simple test file designed to provide the conditions necessary for the test but without containing anything which might require special facilities (in fact an early draft of the introductory section of this paper).

3.2.1  File handling

In the case of XML systems the default format used for testing was the DocBook DTD; for LATEX systems it was the article document class; for others an empty document template was used. The functions tested were:

  1. creating a new document from scratch according to a selected template;

  2. opening an existing document;

  3. closing a new document or modified old document;

  4. saving a document;

  5. printing a document to a printer;

  6. ‘printing’ a document to a file;

  7. document validation.

3.2.2  Document structuring

This is by definition the most extensive section. XML terminology has been used to describe the functions as it is the most widely understood.

For application to typesetting and word-processing systems, most of the functions have to be condensed to the level of the lowest common denominator, which is the simple distinction between character markup and paragraph markup, where ‘elements’ may be distinguished by name, but no hierarchy or content model exists (for example, Word's Named Styles). Style variants may be considered as broadly equivalent to attributes.

In the case of LATEX, where a clear hierarchy or content model exists or can be inferred, the equivalences in Table 5 were applied.

Table 5. Markup handling equivalences

LATEX                                 XML
environment                           element with element content
control sequence (single argument)    element with PCDATA content
second or optional arguments          attributes
text-replacement macro                general entity reference
verbatim environment                  CDATA marked section
special control sequence              processing instruction

These are unquestionably simplistic and sometimes inaccurate in technical detail, but we are concerned at this stage with classifying the nature of the effect provided by a function (eg ‘create a list’, ‘start a new section’) rather than with the details of the operation of the interface (eg which menu or what type of widget is used to obtain the effect).
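As a minimal sketch of how these equivalences might be applied when classifying a function, the table can be restated as a lookup. The dictionary keys simply repeat the table's labels; the function name and the example pair are assumptions made for illustration and are not part of the study's tooling.

```python
# Hypothetical lookup restating the LATEX-to-XML equivalences of Table 5.
# The keys repeat the table's wording; no editor uses these strings itself.
LATEX_TO_XML = {
    "environment": "element with element content",
    "control sequence (single argument)": "element with PCDATA content",
    "second or optional arguments": "attributes",
    "text-replacement macro": "general entity reference",
    "verbatim environment": "CDATA marked section",
    "special control sequence": "processing instruction",
}


def xml_equivalent(latex_construct: str) -> str:
    """Return the XML construct treated as equivalent, or 'no equivalent'."""
    return LATEX_TO_XML.get(latex_construct, "no equivalent")


# Illustrative pair (an assumption, not taken from the test file):
# the same 'emphasise this phrase' effect expressed in each notation.
EXAMPLE = {
    "latex": r"\emph{structured}",
    "xml": "<emphasis>structured</emphasis>",
}

if __name__ == "__main__":
    print(xml_equivalent("environment"))
```

Under this mapping, a LATEX editor command that wraps highlighted text in a single-argument control sequence would be recorded against the same function as an XML editor command that surrounds text with element markup.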

  1. insert element

  2. surround highlighted text with element markup

  3. rename existing element

  4. split existing element

  5. combine element with following or preceding element of the same type

  6. remove element markup, leave text

  7. delete element and its content

  8. add attribute

  9. edit attribute

  10. remove attribute

  11. check integrity of cross-references

  12. insert entity reference

  13. insert comment

  14. insert marked section

  15. insert processing instruction

  16. create new table

  17. delete whole table

  18. (12 functions) conventional table operations: insert, delete, merge, and split rows and columns, and edit table, row, column, and cell properties

  19. edit in plain-text mode (allow trespass on the markup)

3.2.3  Ergonomic functions

These make editing easier or more accurate. Additional ‘comfort factors’ such as edit colour preferences and tag font sizes (often subsumed under ‘preferences’) are excluded here as the number of individual settings possible is too large to make them all separate functions.

  1. search within specified markup

  2. spell-check by element

  3. validate by element

  4. create new stylesheet

  5. merge with existing stylesheet

  6. switch stylesheet

  7. save stylesheet

  8. edit stylesheet

  9. white-space handling functions (eg suppression or retention of significant and insignificant white-space, or normalization and re-flow of the paragraph)

  10. white-space display functions (eg show explicit spaces by displaying a symbol)

  11. reveal/hide element markup

  12. reveal/hide attributes

  13. reveal/hide entities

  14. display document tree pane

  15. display element selection pane

  16. display attribute selection pane

  17. switch to browser or print-formatted editing or preview

  18. help

3.2.4  Editor management

Some of these refer to specific markup features, but are included here because their effects are document-wide.

  1. create new entity declaration

  2. edit entity declaration

  3. add/edit/delete notation declaration

  4. assign and deassign external processor for notations

  5. edit system metadata (not in markup)

  6. establish or change character encoding

  7. register or de-register plugin

  8. add and delete table-editor equivalence entry (allows previously unrecognized structures to be edited as tabular data)

  9. add/edit/delete script or macro

3.2.5  Recording and analysis

The presence or absence of each function was identified and recorded in a spreadsheet, and the results are tabulated in Section 4.3. Each function was exercised, using the minimally valid sample document referred to above.

Given the very large number of data points generated by this procedure, it was decided to use exception reporting rather than conformance reporting. Thus where a specific feature is not commented on for a product, it may be assumed that the feature is present, and operated in the expected manner. The nature of this expectation is the subject of further study in this research.
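The difference between the two reporting styles can be sketched as follows. This is an illustration only: the product and function names are placeholders, and the data structure is an assumption rather than the layout of the spreadsheet actually used.

```python
# Hypothetical data: product and function names are placeholders for this
# sketch and do not reproduce any row of the study's spreadsheet.
PRODUCTS = ["Editor A", "Editor B", "Editor C"]
FUNCTIONS = ["insert element", "split element", "combine element"]

# Conformance view: every function each product was observed to provide.
OBSERVED = {
    "Editor A": {"insert element", "split element", "combine element"},
    "Editor B": {"insert element", "split element"},
    "Editor C": {"insert element", "split element", "combine element"},
}


def exception_report(observed, products, functions):
    """Return only the (product, function) pairs where the function is absent."""
    return [
        (product, function)
        for product in products
        for function in functions
        if function not in observed.get(product, set())
    ]


if __name__ == "__main__":
    for product, function in exception_report(OBSERVED, PRODUCTS, FUNCTIONS):
        print(f"{product}: '{function}' not found")
```

With nine data points in this toy matrix, the exception report contains a single entry; anything not listed is taken to be present and working as expected.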

4  Summary of results

4.1  Summary of expert survey

In general, this survey showed a depressing lack of enthusiasm for editing software. The group is by its nature highly critical and well-informed, and in some cases evidently more expert in handling structured text than the manufacturers of the software: a common criticism was that the vendors appeared to be unaware of the requirements of a structured text editor.

4.2  Summary of user requirements

Requests from users included both WYSIWYG and plain-text editing:

Requests when restricted to WYSIWYG enquiries only:

“Intuitive” is ambiguous (does the interface guess the user's requirements; or does the user guess what the interface means?)

4.3  Summary of software evaluation

The classification of functions and features was designed to identify which of them — if any — were sufficiently common across a range of interface implementations to be taken as forming the core functionality of structured-document editing. Remaining (ie non-core) functions could be further analyzed to see if they were specific to certain types of document, certain types of user, certain modes of editing (eg tables), or to the requirements of certain areas of use (specific industries or applications).

Table 6. Identification of functions by product

Key to Products

  1. Emacs
  2. epcEdit
  3. Epic
  4. Exchanger
  5. Frame
  6. Word-11
  7. Excel-11
  8. InfoPath
  9. WPxml
  10. XMLSpy
  11. Authentic
  12. XMetaL
  13. 3B2
  14. Emacs
  15. LyX
  16. SciWord
  17. Textures
  18. WinEDT
  19. XPress
  20. NotaBene
  21. Word
  22. AbiWord
  23. Publisher
  24. OpenOffice
  25. WP

Function 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
FILE HANDLING
new file x x x x x x x x x x x x x x x x x x x x x x x
open file x x x x x x x x x x x x x x x x x x x x x x x
close file x x x x x x x x x x x x x x x x x x x x x x x
save file x x x x x x x x x x x x x x x x x x x x x x x
print preview x x x x x x x x x x x x x x x x x x x x
print to file x x x x x x x x x x x x x x x x
validate document x x x x x x x x x x x x
XML EDITING
insert element x x x x x x x x x x x x x x x x x
tag marked x x x x x x x x x x x x x x x x x x x x x
rename element x x x x x x x x x x x
split element x x x x x x x x x x
combine element x x x x x
remove markup x x x x x x x x x x x x x x x x x x x
delete element x x x x x x x x x x x x x x x x x x x x
insert attribute x x x x x x x x x
edit attribute x x x x x x x x x x
remove attribute x x x x x x x x x
check ID/IDREF x x x
insert entref x x x x x x x
insert comment x x x x x x x x x x x x
insert MS x x x x x x x
insert PI x x x x x x
validate element x x x x x x x x
raw-text edit x x x x x x x x x x x x x x x x
TABLES
create table x x x x x x x x x x x x x x x x x x x x
delete table x x x x x x x x x x x x x x x x x x x x
insert row x x x x x x x x x x x x x x x x x x x x
delete row x x x x x x x x x x x x x x x x x x x x
merge row x x x x x x x x x x x x x x x x x
split row x x x x x x x x x x x x x x x x x
insert col x x x x x x x x x x x x x x x x x x x x
delete col x x x x x x x x x x x x x x x x x x x x
merge col x x x x x x x x x x x x x x x x
split col x x x x x x x x x x x x x x x x x
edit table props x x x x x x x x x x x x x x x x x
edit row props x x x x x x x x x x x x x x x x x
edit col props x x x x x x x x x x x x x x x x x
edit cell props x x x x x x x x x x x x x x x x
ERGONOMICS
search in markup x x x x x x x x
spell-check element x x x x x x x x x x
create stylesheet x x x x x x x x x x x x x x x x x x x
merge stylesheet x x x x x x x x x x
switch stylesheet x x x x x x x x x x x x x x
save stylesheet x x x x x x x x x x x x x x x x x
edit stylesheet x x x x x x x x x x x x x x x
WS handling x x x x x
WS display x x x x
reveal/hide tags x x x x x x x x x x x x
reveal/hide attributes x x x x x x x x
reveal/hide entities x x x x x x
tree pane x x x x x x x x x x x x x x
element pane x x x x x x x x x x x
attribute pane x x x x x x x x x
toggle browse/print x x x x x x x x x x x x x
help x x x x x x x x x x x x x x x x x x x x x x
EDITOR MANAGEMENT
create entity x x x x x
edit entity x x x x x x
notations x x x x x x
assign processor x x x x x x
edit metadata x x x x x x x x x x x x x
set character encoding x x x x x x x x x x
register plugin x x x x
table equivalence x x x
macros/scripts x x x x x x x x x x x x x x
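
One way a function-by-product matrix of this kind could be reduced to a list of candidate core functions is sketched below. The matrix values are invented for illustration and are not read from Table 6, and the 80% threshold is an assumption: the study does not state a cut-off.

```python
# Hypothetical matrix: function name -> set of product numbers offering it.
# The values are invented for illustration and are not read from Table 6.
N_PRODUCTS = 5
MATRIX = {
    "open file": {1, 2, 3, 4, 5},
    "insert element": {1, 2, 3, 4},
    "insert processing instruction": {1, 3},
}


def support_share(matrix, n_products):
    """Fraction of the examined products that offer each function."""
    return {
        function: len(products) / n_products
        for function, products in matrix.items()
    }


def core_functions(matrix, n_products, threshold=0.8):
    """Functions offered by at least `threshold` of products (assumed cut-off)."""
    shares = support_share(matrix, n_products)
    return sorted(f for f, share in shares.items() if share >= threshold)


if __name__ == "__main__":
    print(core_functions(MATRIX, N_PRODUCTS))
```

Functions falling below the chosen threshold would then be the non-core remainder, to be examined by document type, user type, or mode of editing.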

5  Conclusions

All the XML editors examined possessed the same core editing features, with a small number of exceptions (for example Emacs/psgml has a “split element” command, but no “join element” command). To some extent the presence of these features is implicit in SGML and XML, if not exactly mandated: to be able to insert an element, you must have a control which allows you to do it.

The differences lie therefore in the placement and naming of the keys and menus in the interface. In the absence of further information, it must be assumed that the designers and marketers of the editors came to certain conclusions about what the user needed or wanted, and that their products reflect this perception.

All the editors had fairly comprehensive table-editing controls, either for the HTML table model or the CALS table model. The more advanced systems and those with a strong SGML document heritage (eg Epic, XMetaL) can do both, and more if programmed (eg the SASOUT table model). Emacs has a good plaintext table editor in the table.el module, which can produce LATEX or HTML table markup.

The widest variations were in the “ergonomic” and “editor management” features. While some of these are “comfort features” added to smooth the author's ride, some of them are critical to the operation of an editor for structured text (eg entity management), and their omission can only be seen as an admission by the manufacturer that their product is not suitable for authorial use.

No single editor examined can be said to be suitable for the non-expert in XML or LATEX. A significant understanding of markup theory, and of the specific markup for the user's application, would be needed before an author outside the XML/LATEX field could even begin using these programs. The extent of this training, and steps which might be taken to remedy the position, are the subject of further work. In the meantime, what might be termed the “semi-structured” interfaces of wordprocessors do almost as good a job for the author (although clearly not for the publisher), despite their obvious shortcomings.

Some facilities (including some of those mentioned by the experts surveyed earlier) are entirely missing in most editors unless programmed in with scripts or macros. Unlike wordprocessors and DTP systems, which generally work straight out of the box, XML editors usually require extensive customisation before they can be used for a specific application. This is slightly less true of LATEX, as the current default installations of popular distributions (eg TEX Live, MikTEX) include either a large selection of packages (plug-ins) or a transparent method for adding them from the network as and when needed. For XML editing, the deficiencies noted were:

A better understanding is still required of what the users want, expect, and need from interfaces to structured documents. There has been a lot of work on interaction design at a lower level (eg operating systems interfaces), and this needs to be extended to the field of structured documents. While there can be little substitute at a professional level for careful training in the use of structured information, the increasing demand for systems which can be used by the untrained operator cannot commercially be ignored. An analysis of the expectations may reveal whether or not the use of structured text systems can be made easier or more effective without sacrificing accuracy and timeliness.


A  Remaining work

The research for this project is ongoing at the time of writing (early 2006). The workplan currently includes:

  1. a second (user) survey (below);

  2. the mapping of the results to the feature matrix;

  3. the derivation of any changes to the prevailing interface paradigms;

  4. the testing of possible prototype interface changes.

Among the candidates for evaluation are a number of methods of intuiting the user's requirements. Although some methods have been in use for many years, full use does not appear to have been made of user-driven changes to the visual interface in detecting activity which could support an interpretation in markup.

A.1  User survey

The target population for this survey consists of users who have some prior experience of work with structured documents; that is, they have used XML, LATEX, or another system of structured markup, with one or more of the selected products or a suitable equivalent.

The objective is to gather their reactions to the software they used, why it was or was not suited to specific tasks, and what features or deficiencies they found. The structure and wording were informed by the work on the expert survey (Section 2.1).

B  Mailing list and Usenet sample

To identify users' requests for editors and editing features, a technique was developed to retrieve the original posts which started any thread containing certain keywords, and then check that the original post also contained the same words. This rather roundabout technique was necessary because of the way in which the primary accessible archive for Usenet newsgroups (Google Groups) is accessed.

An initial search was carried out for the words ‘wysiwyg editor’, ‘structure editor’, ‘easy to use editor’, and many others.

This resulted in a large number of posts in common, in most cases over 50%.

To avoid the duplication of effort, the search was repeated for the key words separately, ‘editor’, ‘wysiwyg’, ‘structure’, etc.

For each post retrieved, the whole thread was accessed and the first post (the original post which started the thread) was isolated and the message text extracted.

These original messages were tested for the presence of at least one keyword, and the matches built into a mailbox file.

When all retrievals were finished, the duplicate posts were eliminated (using the Message-ID header value).

This resulted in 101 posts to comp.text.tex, 67 to comp.text.sgml, and 273 to comp.text.xml and XML-L together. These were plotted as a histogram by year of posting, as shown in Figure 1, and then read individually to identify the requested features as shown in Table 2.3.

The extraction and identification was performed by a simple shell script performing repeated calls to Google Groups using the wget utility. The resulting HTML page was regularised to XHTML by Tidy, and filtered by an XSLT script to identify the post matching the keyword (ignoring followups). This post was then retrieved separately by the same mechanism using another XSLT script, which resulted in the first ten posts to the thread. Finally, the top post was retrieved using the ‘Source’ switch, and a third XSLT script extracted the original message text.
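
The last two stages of the procedure (testing each original post for at least one keyword, then eliminating duplicates by Message-ID) can be sketched in a few lines. This is not the shell and XSLT pipeline actually used: it is a Python illustration using the standard library's email parser, the function names are hypothetical, and the keyword tuple is an assumed subset of the search terms.

```python
from email import message_from_string

# Assumed subset of the search terms; the full list is not reproduced here.
KEYWORDS = ("editor", "wysiwyg", "structure")


def body_text(msg):
    """Return the message body as text (empty for multipart messages)."""
    payload = msg.get_payload()
    return payload if isinstance(payload, str) else ""


def select_posts(raw_posts, keywords=KEYWORDS):
    """Keep posts containing at least one keyword, dropping duplicate IDs."""
    seen_ids = set()
    kept = []
    for raw in raw_posts:
        msg = message_from_string(raw)
        text = body_text(msg).lower()
        if not any(keyword in text for keyword in keywords):
            continue  # original post does not mention any keyword
        msg_id = msg.get("Message-ID")
        if msg_id is not None:
            if msg_id in seen_ids:
                continue  # same post already retrieved by another search
            seen_ids.add(msg_id)
        kept.append(msg)
    return kept


if __name__ == "__main__":
    sample = [
        "Message-ID: <a@example.org>\n\nIs there a WYSIWYG editor for XML?\n",
        "Message-ID: <a@example.org>\n\nIs there a WYSIWYG editor for XML?\n",
        "Message-ID: <b@example.org>\n\nHow do I install a font?\n",
    ]
    print(len(select_posts(sample)))
```

Posts without a Message-ID header are kept rather than merged, since there is no reliable way to tell them apart in this sketch.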

I am indebted to Andy Arnt at Google for arranging permission to run scripted retrievals against their database.

This work forms part of research towards a PhD in software usability techniques at the Human Factors Research Group in the Department of Applied Psychology, University College Cork, Ireland.