Text markup systems from their earliest origins in the 1950s and 60s were largely based on inline markup using plaintext characters. RUNOFF is a canonical historical example [Saltzer1964]: the most common modern equivalent is LATEX [Knuth1984]. SGML and particularly XML [Bray2000] are more recent, and provide greater structural rigour and generic adaptability.
The features of plaintext inline markup systems are typically:
markup is intermingled with (directly adjacent to) the text it refers to (as opposed to ‘‘standoff’’ or ‘‘out-of-line’’ markup, which uses pointers);
plaintext uses displayable, printable, human-readable characters (as opposed to binary, which uses unprintable codes which are only readable by machine);
to distinguish between markup and text, one or more characters are designated as ‘special’, and act as signals or delimiters to show that what follows (or is contained) is markup, not text (or vice versa);
editors typically display both text and markup unformatted and undistinguished (although colour is sometimes used), and both can be edited character by character;
the markup is a mix of descriptive (‘‘this is a section heading’’) and procedural (‘‘display this in italics’’).
The SGML and XML systems in use today are based on similar principles, but with the following changes:
markup rigorously encloses the text it refers to (with the one exception of the EMPTY element for items incapable of containing text);
formal hierarchical nesting of markup (no overlap);
markup is intended to be generic, and user-definable;
a formal parse can be conducted to identify the markup components and to test their arrangement against a defined pattern;
markup can be used programmatically to affect formatting in the editor: it can be hidden or revealed, and protected against trespass.
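The requirement for strict hierarchical nesting can be illustrated with a minimal sketch in Python, using the standard library's `xml.etree.ElementTree` parser. The element names `para` and `emph` are invented for the illustration. Note that this checks well-formedness only: testing the arrangement of elements against a defined pattern (validation against a DTD or schema) needs a validating parser, which this module does not provide.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Properly nested: <emph> opens and closes inside <para>.
nested = "<para>Some <emph>important</emph> text</para>"

# Overlapping: <emph> is still open when <para> is closed.
overlapping = "<para>Some <emph>important</para> text</emph>"

print(is_well_formed(nested))       # True
print(is_well_formed(overlapping))  # False
```

An inline system without formal nesting rules would have no basis for rejecting the second fragment; an XML parser is obliged to.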
The inline plaintext model of markup has conditioned file formats for text systems as well as the products using them, from RUNOFF (1963) through SCRIPT (1967), PUB (1971), troff/nroff (1973), GML (1974), TEX (1978), LATEX (1984), and SGML (1986), to XML (1996). Typically, the more generic and programmable the markup becomes, the wider the choice of editors for a given format.
Similar models affected the design both of file formats and interfaces for WordPerfect (1980), Word (1983) and a large number of other wordprocessors such as PC-Write (1984) and XYWrite (1985) and their derivatives (eg Nota Bene). Originally, in almost all cases, markup was displayed on the screen along with the text, in the system (monospaced) typeface, but is now conventionally hidden to provide a ‘near-WYSIWYG’ view.
Before the widespread use of bitmapped graphics screens, bright and dim text variants were used to simulate bold and italics, and printers could overprint for bold and underline for italics. Graphical screens and multi-font printers replaced the older technology during the 1970s and 1980s, and colour became commonplace in the 1990s.
The two key early developments were the use of ‘real-time’ or ‘direct intervention’ editing (also called ‘visual’, or ‘synchronous’ editing), and the bitmapping of fonts to graphical displays. These were dependent on advances in technology. The third development, the hiding of markup, was a conceptual move, and facilitated the real-time typographic formatting which gives us the current WYSIWYG paradigm. The conventional definition of What You See Is What You Get (WYSIWYG) in fact combines all three developments to produce ‘‘a system in which content during editing appears very similar to the final product’’ (Wikipedia).
The application of synchronous editing to a typographically-formatted display came with Bravo (Xerox, 1974: the technique was nicknamed WYSIWYG later). In synchronous editing, the characters typed, erased, or replaced were visible during the operation, rather than after the event. Visual plaintext editors such as Emacs (Stallman, 1975) and vi (Joy, 1976) implemented the synchronous method for monospaced text.
Later commercial systems implemented a form of monospaced WYSIWYG but bound a proprietary binary file format tightly to the product: the Wang wordprocessor (1975), Microsoft Word (1983), and IBM's DisplayWriter (1980) and Displaywrite (1984); and many others following in later years. In these cases the markup remained proprietary and hidden, and only the formatting could be seen.
Hybrid systems were also developed, notably the versions of WordPerfect and its successors which allowed markup to be displayed in synchrony with the cursor movement through the text, a feature much praised by professional users. Typesetting systems also followed a similar trend, and many still kept a similar interface available until recently, for example 3B2 (now Arbortext's Advanced Print Publisher), Pandora (Elsevier's now-obsolete private compositors' version of Unix WordPerfect), and the original Miles 33 system editor (now OASYS).
The synchronous typographic model led to some other important changes which affected the development of most subsequent wordprocessor systems: out-of-line (standoff) markup, binary (and proprietary) file formats, and tokenised markup. Along with some technical issues relating to the disparity between screen and print resolution, the most obvious omission at the time was the absence of consistency control and the resulting restriction on markup to visual effects only. Stylesheets for providing consistency control have been available in all major wordprocessors for many years, but most users remain unaware of their existence [Campbell] or of how to use them [Calderwood1996].
The widespread use of synchronous typographic wordprocessors has obscured what users mean when they ask for WYSIWYG in an XML or LATEX editor. If we take for granted the three touchstones of visual editing, typographic formatting, and hidden markup, this still leaves some questions unanswered when applied to structural or descriptive markup systems, for example:
how can contiguous markup boundaries be made unambiguous if the cursor position will not change when they are traversed?
how should markup which bears no typographic distinction be made accessible for editing?
how should the author indicate a quality or identity which depends on correct markup?
how should an editing program react to an author's keystrokes?
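The first of these questions can be made concrete with a small sketch. With markup hidden, the cursor position is a count of visible characters, but several structurally different insertion points can share the same count. The function below is an illustration only: it assumes well-formed input with no empty-element tags, comments, or entity references, and the element names are invented.

```python
import re
from collections import defaultdict

# A start-tag or end-tag (groups 1 and 2), or a run of text (group 3).
TOKEN = re.compile(r"<(/?)([^<>/\s]+)[^<>]*>|([^<]+)")


def cursor_contexts(source):
    """Map each visible cursor offset to the element contexts it could denote."""
    contexts = defaultdict(list)
    stack = []     # names of the currently open elements
    visible = 0    # characters of text seen so far, with markup hidden

    def record():
        path = "/".join(stack) or "(outside root)"
        if not contexts[visible] or contexts[visible][-1] != path:
            contexts[visible].append(path)

    record()
    for closing, name, text in TOKEN.findall(source):
        if text:
            for _ in text:
                visible += 1
                record()
        elif closing:
            stack.pop()
            record()
        else:
            stack.append(name)
            record()
    return dict(contexts)


source = "<sec><para>One <emph>two</emph></para><para>Three</para></sec>"
for offset, paths in sorted(cursor_contexts(source).items()):
    if len(paths) > 1:
        print(offset, paths)
```

In this sample, visible offset 7 (immediately after ‘two’) corresponds to four distinct insertion points: inside `emph`, inside the first `para` after the emphasis, between the two paragraphs, and at the start of the second `para`. An editor which hides the markup has to offer the author some other way to choose between them.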
Professional use of markup requires a great deal more control and intervention than could be achieved in early graphical interfaces. As a result, many XML and LATEX users retain their preference for a plaintext interface with clearly visible markup, similar to the editing view used by Wikis. The much wider reach of XML systems (to users accustomed to wordprocessors) has brought a much larger demand for WYSIWYG systems, as we will see in Section 2.3. While there were significant early systems in XML and LATEX editors which implemented the synchronous typographic interface, notably the Arbortext Editor (1989) and Blue Sky's Textures (1989), their relatively greater price and platform-dependence (Sun and Mac respectively) meant a much smaller adoption rate than is now seen. The ‘box-model’ editors InContext and STiLO also implemented some typographic formatting within the nested boxes, but this model highlights the difficulty of using it to edit deeply nested mixed content while preserving the readability (flow) of the text.
The concept of a ‘document model’ has been used in document engineering and related areas for many years. It has become more recently popular as a result of the growing use of XML, where there has been a more widespread need for a formal framework on which to build document architectures. The canonical definition is implied (although the term ‘document model’ is not used) in the W3C's ‘Infoset’, an abstract set of definitions of textual and other objects which forms, among other things, the Document Object Model (DOM) for XML.
Concrete definitions vary across different fields: [Close2003] writes of ‘‘the parse tree that results from parsing an encoded representation of a document’’; [Sosnoski2001] refers explicitly to its use in a specific language (in this case Java): ‘‘a library and API that supports working with a document representation’’; and [Brugger], writing rather earlier, speak of a ‘‘description of the structuring rules of a document class’’.
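The ‘parse tree’ sense of the term can be shown with a short sketch using `xml.dom.minidom`, the minimal DOM implementation in the Python standard library. The sample document and its element names are invented for the illustration.

```python
from xml.dom import minidom


def outline(node, depth=0):
    """Return an indented outline of the element and text nodes under node."""
    lines = []
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            lines.append("  " * depth + child.tagName)
            lines.extend(outline(child, depth + 1))
        elif child.nodeType == child.TEXT_NODE and child.data.strip():
            lines.append("  " * depth + repr(child.data.strip()))
    return lines


doc = minidom.parseString(
    "<article><title>Editing</title>"
    "<para>Markup is <emph>structural</emph>.</para></article>"
)
print("\n".join(outline(doc)))
```

The outline produced is the explicit structure that the software works with; the author of the sentence inside `para` is unlikely to think of it as three separate nodes.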
However, the formal document model is an artifact of computing science, an explicit structure essential for explaining and modelling the document, but opaque to the author of a document unless she is also a markup expert. Authors do have a model of their document, but it is internal, and can be anything from the conceptual structure of a novel to the outline of a business report to a complete chapter/section/subsection structure agreed with the publisher before signing the contract (an external example). While the two models may essentially carry the same information, they are used for different purposes. The external (CS) model is used for developing computing applications for structured document systems; the internal (authorial) model is used for developing the thought processes which guide the writing of the document. It is thus a canonical error to assume that the general author is aware of a document model in the same way that a document engineer is. This leads to a discrepancy between the view taken by the document-aware software expert about the software available or suitable for a task, and the view (and expectations) of the author.
In order to investigate this further, it was necessary to gather baseline data on the professional recommendation of editing systems. A survey was administered to a group of XML and LATEX experts who had extensive experience of editing software, both as software users and as systems designers or consultants.
The objective was to determine their views on the adequacy or otherwise of the editing software they had used, with specific reference to deficiencies they had encountered, special features they found important, and their expectations of the software. A secondary objective was to refine the questions for use in a later survey of ordinary users (section F.1).
A pilot survey of 12 experts was carried out in July 2003, and the results used to make minor adjustments to the phrasing of the questions in an attempt to eliminate bias. The main survey was conducted in July 2004 (an additional 20 subjects).
The questions asked were (in summary):
Background (occupation, computing experience);
Software selection criteria for a sample list of client tasks;
Preference for other products (if any);
Specially useful features of preferred software;
Particularly poor features of software used;
Features of software considered to be lacking;
Features of software which proved difficult or impossible to use.
No personal details were recorded, in order to encourage explicit responses. A copy of the fully-worded questionnaire is available from the author and will be published in later research.
The principal findings showed that the experts tended to recommend software which was familiar to the users, rather than necessarily that which was best suited to the tasks, as this approach reduced the training requirements. A different set of editors was preferred by experts for their own use.
None of the experts interviewed identified any specific major feature of any editor which made it preferable to its competitors (there were a few minor preferences, but nothing relating to the core activities of editing XML), but interviewees identified a number of major deficiencies, including interface clutter, crashing or hanging on large files, lack of support for external entities and for catalogs, general instability, and poor typographic control.
These findings are listed in more detail in Section 4.1.
The editors used or recommended were generally regarded as ‘‘the best of a bad bunch’’. This dissatisfied view must be read in the light of the subjects' status as extensively experienced in markup systems, and thus liable to be critical far beyond the scope of the average user.
As mentioned earlier, the concept of a document model is already familiar to experienced users of systems like LATEX and SGML who have been accustomed to dealing with the features of document structures. In some cases the origins of the model go as far back as systems like GML or Scribe and beyond [Furuta1998]. In these cases, however, emphasis is often placed on the flexibility of the model or models [Raman1994], allowing individual users to extend or diminish the features where they feel the features are lacking or excessive [Flynn1999].
As has been described in Section 1.2, the much larger number of word-processing and desktop publishing users which now form our target ‘‘author’’ group became more accustomed to an unstructured document model [Salminen] because the WYSIWYG model convincingly substitutes a visual model for a structural one. Outside the publishing field, it is probably fair to say that most authors are not only unaware of software solutions to their requirements, but are unaware that they may in fact be in any kind of difficulty. Some of these beliefs can be explained by existing theories of cognition, and future work (see section F.1) will use these results to collect correlation data from ‘user’ and ‘beginner’ groups.
This ‘naïve’ view of the document has become so prevalent that wordprocessor users will now often conflate style and structure, believing that the appearance of the document is the structure. At least one conference organizer now refers to a document style sheet as a ‘‘document model’’ [Pillot1999], and several companies provide software to assist in extraction of information from unstructured sources and endowing it with an inferred structure.
The attractiveness of the WYSIWYG model which has led to this position (‘user-seductiveness’, to use a marketing phrase) is based on a perception of ease of use, intuitiveness, obedience, and graphical appeal (colour, images, typefaces). With a large number of potential users already accustomed to this model, it is generally perceived now to be in demand for the editing of SGML/XML and LATEX by non-experts.
To obtain an initial picture of what users were actually asking for, an analysis of messages to the XML-L mailing list and the Usenet newsgroups comp.text.xml, comp.text.sgml and comp.text.tex was carried out. The original post in all threads mentioning editors was isolated (sampling details are in Appendix ) and the requirements categorised. The number of posts analysed in this way is shown in Figure 1.
The steep rise in demand in 1997–1998 follows the first public draft of XML (1996) and the publication of the XML 1.0 Recommendation (February 1998); however, the troughs and peaks in 2000–2002 and 2003–2005 are less easily explained.
The frequency of messages posted appears to indicate that the target population for editing software may be changing: the number of posts requesting information peaked between 1999 and 2002. However, at the time of writing, data for 2006 was less than half available, and if we were to make a straight-line estimate of the total, the 2006 figure would reach 2000/2001 levels again. There are many possible reasons for this multimodal pattern: the gradual move of the population up the learning curve; the more widespread accessibility of information (on the Web) about available software; or even a certain resignedness that what the poster wanted simply is not achievable so whatever is available will have to do. More information about these factors will be collected in a later stage of this research.
For each original post, the key request parameters were isolated and categorised. As is conventional in investigations of this sort, the results showed the familiar negative exponential curve, with a small number of categories with a high frequency, and a very long tail of categories with very low frequencies (see Table 2.3).
Among the other features requested were (in alphabetical order):
- Arbitrary DTDs
- Attribute control
- Automated formatting
- Conditional text (effectivities)
- Customisable interface (scripting)
- Cut'n'paste from other applications without damaging the markup
- Element-level locking
- Font control
- Hypertext links
- Linebreaking / word-wrap
- Non-WYSIWYG ‘‘unacceptable’’
- PostScript output
- Removal of need to understand markup
- Spacing control (formatting)
- Spell-checker / thesaurus
- Style files
- Table editing
- Typeset quality printing from within the editor
|Cost-free / Open Source||141|
|Ease of use||78|
|Ability to include images||45|
|Tree-view display available||32|
|Context-sensitive pop-up markup||10|
While it is clear that WYSIWYG is a major component of requests, the overall data is inconclusive as to what users expect of ‘WYSIWYG’. Perhaps unsurprisingly, of the three features of WYSIWYG identified in Section 1.2 (real-time display, typographic formatting, and hidden markup) only hidden markup features in the list, as the others may well be taken for granted.
However, if we restrict the analysis to just those posts referring specifically to WYSIWYG, a different pattern emerges (Table 2.3). If we temporarily exclude equation editing (a domain-specific concern of many LATEX users), and the cost or Open Source factor (which is outside the domain of enquiry), the list is now headed by ease of use, the ability to handle images, and simplicity. An ‘intuitive’ interface and the ability to keep markup hidden still rank lower than the need for structural control, user-friendliness, validation, and a tree-view of the document.
|Ease of use||37|
|Cost-free / Open Source||30|
|Ability to include images||25|
|Tree-view display available||7|
In analysing the requests, there was a noticeable mismatch between what the user was asking for (eg ‘‘a WYSIWYG editor’’) and what is known to be available. The extreme limit of this is the demand for an editor which will ‘‘just let users type a document’’, without any knowledge of XML or structure or markup, and the editor which will ‘‘automatically add all the relevant markup by itself’’.
This is an interesting, if degenerate, example of the assertion by [Tognazzini1996] that ‘‘intuitive’’ does not mean ‘‘able[…]to perceive the patterns of the user's behaviour and draw inferences’’, when in this case it quite clearly does mean precisely that. We will discuss the possibilities of intuiting markup in Appendix .
We must assume, therefore, that the demand for a WYSIWYG interface to structured editing not only has to satisfy the primary requirement whereby ‘‘content during editing appears very similar to the final product’’, but that it must also satisfy additional criteria as instanced in Table 2.3.
In some cases this is technologically challenging: we cited earlier the problem of how to position the cursor for element insertion between contiguous nested end-tags when the markup is not visible. At the other extreme, some of the features requested have existed as standard in all SGML and XML editors since the earliest days (eg structure control, which is a sine qua non) and novice ignorance of this may be excused and tackled by better training and dissemination of information.
There is a summary of the principal requirements deduced from this analysis in Section 4.2.
In order to measure the facilities provided by editors capable of handling structured documents, we originally selected twenty-five applications for analysis. Because the speed of development in the field remains very high, a number of the systems selected ceased to be available and had to be replaced by more recent ones. This process is ongoing in the research and a later version of this paper will include some more recent changes.
Because of the very large amount of software available in the field, we restricted ourselves to a sample of programs which represents three categories of software: these exhibit the principal features listed in Section 1.1 and cover the types of markup we are examining.
SGML and XML editors, excluding those designed for HTML only (or restricted to specific DTDs);
Editors used for typesetting structured material, including both synchronously and asynchronously rendered typographical systems;
Word-processors and desktop publishing (DTP) systems with significant structural features.
The emphasis on these specific categories has been based on two requirements: a) software which is demonstrably designed specifically for handling structured documents; b) software which had its origins in handling an unstructured model but which now has strong evidence of the ability to handle structure. For this reason, there is a clear emphasis on XML software, as this is the prevalent model of a structured-document system. Some SGML capability is included, as this is still in widespread use. HTML systems, despite their SGML roots, are excluded, as W3C HTML does not readily provide an identifiably robust structure to the document (unlike ISO-HTML), and because properly conformant HTML systems (not XHTML systems) are virtually non-existent on the web.
LATEX systems are included because the language implements the structural features discussed in Section 1.1, and these are generally adhered to in the software, although the syntax of the language allows the deliberate breaking of some parts of the model in order to achieve the primary objective, which is to set type. Other comparable products are admitted to the category because of similar features or because they support the editing of XML.
Most wordprocessing, editing, and Desktop Publishing (DTP) systems were excluded because they use an unstructured, often dimensionless, model of the document and have no facilities for adapting to a planned document structure. As a consequence they also tend to lack suitable hierarchical, navigational, and manipulative features, as well as the consistency required to automate rendering and styling. Those which are examined here have specific features which may be compared with the more traditional structured solutions outlined above.
As we have seen, editing software is often simply classified as ‘plaintext’ or ‘WYSIWYG’. These terms in fact conflate at least three separate axes: display, markup, and control. Arguably, a fourth axis, output, should be included: although it is by definition presupposed to be 100% congruent with display in the WYSIWYG model (a target rarely achieved in practice), it is a variable feast in non-WYSIWYG editors. The canonical features of these axes are shown in Table 3.1.1. Note that some of them are not infinitely adjustable variables but dichotomous or polychotomous (step-valued) because different editors implement different features on each axis.
It is important to note, however:
Plaintext regularisation (also called ‘wrapping’, ‘folding’, or ‘flowing’) performs a simple character-count optimisation for the line-length of the viewport, without hyphenation or justification. This is normally done for ease of editing, because it is presumed that any final-form typesetting will handle redundant white-space.
Strictly speaking, a plaintext editor with regularisation and syntactic colourisation could still be termed WYSIWYG, as defined in Section 1.2, because the printout is identical to the display viewport. Moreover, when equipped with a suitable API or IDE, such an editor can produce typographically-formatted printout via a stylesheet and processor, producing asynchronously what WYSIWYG editors do synchronously. Given sufficient speed, a semi-continuous redisplay of the formatted output can act as a WYSIWYG monitor.
The distinction is usually that markup in WYSIWYG and hybrid modes is displayed as graphical tokens which are inviolate to direct editing, whereas plaintext markup is shown in the text font and its characters can be edited directly.
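The plaintext regularisation described in the first of these notes can be sketched with Python's standard `textwrap` module. This is an illustration rather than a description of any particular editor: `textwrap` fills lines greedily, the function name `regularise` and the default width of 40 characters are arbitrary choices, and runs of white-space are first collapsed on the assumption that the final-form typesetter treats them as redundant.

```python
import textwrap


def regularise(text, width=40):
    """Re-flow text to at most `width` characters per line.

    Lines are broken only at white-space: there is no hyphenation and no
    justification, and markup is treated as ordinary characters.
    """
    collapsed = " ".join(text.split())   # redundant white-space is discarded
    return textwrap.fill(
        collapsed,
        width=width,
        break_long_words=False,   # never split a word to fit the line
        break_on_hyphens=False,   # never break at an existing hyphen
    )


source = (
    "<para>Plaintext   regularisation re-flows\n"
    "the text and its <emph>inline markup</emph> alike.</para>"
)
print(regularise(source, width=40))
```

Because the markup is part of the character count, a line can break between a start-tag and the text it encloses, which is harmless in the file but is one reason the plaintext view does not resemble the formatted output.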
|Axis||Plaintext||Hybrid||WYSIWYG|
|Display||Text and markup in monospace font, optionally regularised and syntactically coloured||Monospace or proportional text, regularised, with tokenised markup||Text formatted typographically to stylesheet|
|Markup||Shows all markup, possibly an option to hide it||Shows all markup, optionally hiding attributes and tags separately||Hides the markup, with an option to show attributes and tags separately|
|Control||Markup is edited directly or by menu or keystroke||Markup is edited in panes, pop-ups, or menus||Markup is edited in panes, pop-ups, or menus|
|Output||Text and markup printed in monospace font||Text is printed monospace, markup as graphical tokens||Document is printed in typeset format, optionally with graphical markup|
The selection of software for analysis was based on several criteria:
the program had to be well-known (widely-advertised, widely-used, or widely-discussed): obscure or experimental software was not considered;
it had to be easily accessible (available for purchase or download): this eliminated numerous vertical markets and specialist systems such as military software;
the program had to be generally applicable in an authoring environment (business, research, academia, literature, etc): this eliminated further specialist systems;
it had to run in one or more supported operating environments: Java, a Unix-based operating system (including Linux, Solaris, and Apple OS X), or Microsoft Windows;
the program had to be oriented toward the creation and maintenance of text documents (i.e., an editor; an exception was made in one case for a spreadsheet because of its widespread [ab]use even though it was not designed for creating XML);
it had to pass an ‘entry test’ of basic XML functionality independent of any considerations of the interface itself (see Table 3.1.2).
Software was required to fulfill these conditions for the cases checked.
|Test||XML / SGML||Typographic||WP / DTP|
|Use an external template or stylesheet||✓||✓||✓|
|Recognize native file types or formats||✓||✓||✓|
|Parse for syntax violations||✓||✓|| |
|Store documents in an open file format||✓||✓|| |
The objective of the entry test was to exclude any software which did not show itself prima facie as being capable of handling the file formats in question.
XML and SGML editors were expected and required to use the public text format. In the case of XML this is constrained by the XML Specification [Bray2000]: in the case of SGML we restricted the format to the Reference Concrete Syntax for compatibility with XML. Several systems additionally use internal (sometimes undocumented) binary formats for speed but these were not examined.
Similarly, typesetting editors which work with structured text may use public or proprietary formats. Five programs selected use LATEX syntax or a close variant; the others use proprietary but documented formats common in the industry, or can export to such formats.
Word-processors and DTP systems traditionally use proprietary binary formats to protect their markets, and conversion to other formats is not always reliable. One system (Nota Bene) uses a proprietary but accessible plaintext format; another (OpenOffice) saves as XML natively, using Open Document files zipped with a stylesheet; a third (Microsoft Word) will shortly also save in XML by default, using its own (WordML) schema.
The programs selected were:
Emacs with psgml-mode (GNU)
epcEdit (EPC GbR)
Epic (ArborText), now Arbortext Editor
Office 11 (2003) [Word, Excel, InfoPath] (Microsoft)
WordPerfect 12 XML (Corel)
XML Spy and Authentic (Altova)
3B2 (Advent, now Arbortext)
WinEDT (Aleksander Simonics)
Scientific Word (Mackichan)
Textures (Blue Sky)
TEXnicCenter (Sven Wiegand)
Nota Bene (Nota Bene Associates)
Office 11 [Word: non-XML] (Microsoft)
WordPerfect non-XML (Corel)
In the case of programs which have a different primary function (eg XPress is a typesetter; Excel is a spreadsheet), only the built-in editor functions relating to structured document editing were exercised.
The functions exhibited by the program interfaces were categorized as shown below. In some cases this was only possible after careful disambiguation, as some manufacturers' use of terminology conflicted with the established usage. The initial categorization was obtained by inspection, and resulted in the division of functions into four classes:
actions which operate on the file as a unit, including opening, closing/saving, parsing and validating, printing, and managing ancillary files such as templates or stylesheets;
organizing and arranging document structure, including markup insertion, change, and removal; and other edit operations which operate on parts of the file identified by markup, such as context-sensitive cut-and-paste;
making things easier for the user, including editor stylesheeting, sizing, coloring, and the customization of dialogs, menus, toolbars, and context-sensitive searching;
metadocument functions, including entity management, character sets and encoding, the management of macros, plugins, and other utilities.
It is important to note that we excluded any conventional features operating on unmarked text only, or without respect to the markup. Many of these are common to all text systems everywhere, such as cut, copy, and paste, and they cannot be used to distinguish between structured-text systems unless they exhibit some sensitivity to markup. Similarly, some file-handling operations are also excluded (those which are not associated with markup activity, such as directory listing or display, recent file lists, etc).
Testing was conducted using functions found in the menus, keystrokes, or toolbars of the software (hidden features and those requiring specialist access were not included). For each program, the specified functionality was identified and exercised, where relevant using a simple test file designed to provide the conditions necessary for the test but without containing anything which might require special facilities (in fact an early draft of the introductory section of this paper).
In the case of XML systems the default format used for testing was the DocBook DTD; for LATEX systems it was the article document class; for others an empty document template was used. The functions tested were:
creating a new document from scratch according to a selected template;
opening an existing document;
closing a new document or modified old document;
saving a document;
printing a document to a printer;
‘printing’ a document to a file.
This is by definition the most extensive section. XML terminology has been used to describe the functions as it is the most widely understood.
For application to typesetting and word-processing systems, most of the functions have to be condensed to the level of the lowest common denominator, which is the simple distinction between character markup and paragraph markup, where ‘elements’ may be distinguished by name, but no hierarchy or content model exists (for example, Word's Named Styles). Style variants may be considered as broadly equivalent to attributes.
In the case of LATEX, where a clear hierarchy or content model exists or can be inferred, the equivalences in Table 3.2.2 were applied.
|environment||element with element content|
|control sequence (single argument)||element with PCDATA content|
|second or optional arguments||attributes|
|text-replacement macro||general entity reference|
|verbatim environment||CDATA marked section|
|special control sequence||processing instruction|
These are unquestionably simplistic and sometimes inaccurate in technical detail, but we are concerned at this stage with classifying the nature of the effect provided by a function (eg ‘create a list’, ‘start a new section’) rather than with the details of the operation of the interface (eg which menu or what type of widget is used to obtain the effect).
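The equivalences can be expressed as a simple lookup. The sketch below copies the table into a Python dictionary and adds a deliberately crude classifier, `classify`, which recognises only three of the six constructs from their surface syntax. The classifier and its regular expressions are assumptions made for the illustration: real LATEX cannot be classified reliably without executing it.

```python
import re

# LaTeX construct -> XML construct, as given in the table above.
EQUIVALENCE = {
    "environment": "element with element content",
    "control sequence (single argument)": "element with PCDATA content",
    "second or optional arguments": "attributes",
    "text-replacement macro": "general entity reference",
    "verbatim environment": "CDATA marked section",
    "special control sequence": "processing instruction",
}


def classify(latex):
    """Rough surface classification of a single LaTeX construct (illustrative)."""
    if re.match(r"\\begin\{verbatim\}", latex):
        return "verbatim environment"
    if re.match(r"\\begin\{[A-Za-z*]+\}", latex):
        return "environment"
    if re.match(r"\\[A-Za-z]+\*?\{[^{}]*\}$", latex):
        return "control sequence (single argument)"
    return "unclassified"


def xml_equivalent(latex):
    """Return the XML construct treated as equivalent, if one is recorded."""
    return EQUIVALENCE.get(classify(latex), "no equivalence recorded")


for sample in (r"\begin{itemize}", r"\emph{word}", r"\begin{verbatim}"):
    print(sample, "->", xml_equivalent(sample))
```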
surround highlighted text with element markup
rename existing element
split existing element
combine element with following or preceding element of the same type
remove element markup, leave text
delete element and its content
check integrity of cross-references
insert entity reference
insert marked section
insert processing instruction
create new table
delete whole table
(12 functions) conventional table operations: insert, delete, merge, and split rows and columns, and edit table, row, column, and cell properties
edit in plain-text mode (allow trespass on the markup)
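As an illustration of what two of the element operations listed above involve, the following is a minimal sketch of ‘‘split’’ and ‘‘join’’ using Python's standard ElementTree module. It is not taken from any of the editors examined; the function names are hypothetical, and it assumes elements with text-only (PCDATA) content and an ID attribute named id.

```python
import xml.etree.ElementTree as ET

def split_element(parent, child, offset):
    """Split a text-only child element at a character offset into two siblings."""
    text = child.text or ""
    # Copy attributes but not the ID, which must stay unique in the document.
    attrs = {k: v for k, v in child.attrib.items() if k != "id"}
    second = ET.Element(child.tag, attrs)
    second.text = text[offset:]
    second.tail = child.tail
    child.text = text[:offset]
    child.tail = None
    parent.insert(list(parent).index(child) + 1, second)
    return second

def join_with_following(parent, child):
    """Merge a text-only child with its following sibling of the same type."""
    siblings = list(parent)
    position = siblings.index(child)
    if position + 1 >= len(siblings) or siblings[position + 1].tag != child.tag:
        raise ValueError("no following sibling of the same element type")
    following = siblings[position + 1]
    child.text = (child.text or "") + (following.text or "")
    child.tail = following.tail
    parent.remove(following)
    return child
```

Neither function checks the result against a DTD or Schema; a real editor would need to confirm that the split or joined structure is still valid.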
These make editing easier or more accurate. Additional ‘comfort factors’ such as edit colour preferences and tag font sizes (often subsumed under ‘preferences’) are excluded here as the number of individual settings possible is too large to make them all separate functions.
search within specified markup
spell-check by element
validate by element
create new stylesheet
merge with existing stylesheet
white-space handling functions (eg suppression or retention of significant and insignificant white-space; normalization and re-flow of the paragraph)
white-space display functions (eg show explicit spaces by displaying a symbol)
reveal/hide element markup
display document tree pane
display element selection pane
display attribute selection pane
switch to browser or print-formatted editing or preview
Some of these refer to specific markup features, but are included here because their effects are document-wide.
create new entity declaration
edit entity declaration
add/edit/delete notation declaration
assign and deassign external processor for notations
edit system metadata (not in markup)
establish or change character encoding
register or de-register plugin
add and delete table-editor equivalence entry (allows previously unrecognized structures to be edited as tabular data)
add/edit/delete script or macro
The presence or absence of each function was identified and recorded in a spreadsheet, and the results are tabulated in Section 4.3. Each function was exercised using the minimally valid sample document referred to above.
Given the very large number of data points generated by this procedure, it was decided to use exception reporting rather than conformance reporting. Thus where a specific feature is not commented on for a product, it may be assumed that the feature is present, and operated in the expected manner. The nature of this expectation is the subject of further study in this research.
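The difference between conformance reporting and exception reporting can be sketched as follows. This is an illustration only: the product names and presence values are placeholders rather than survey results, and the function name is hypothetical.

```python
# Placeholder data: product names and values are illustrative, not survey results.
FUNCTIONS = ["split element", "join element", "print to file"]

matrix = {
    "Editor A": {"split element": True, "join element": True, "print to file": True},
    "Editor B": {"split element": True, "join element": False, "print to file": True},
}

def exceptions(matrix, functions):
    """Return {product: [absent functions]}, omitting fully conformant products."""
    report = {}
    for product, present in matrix.items():
        # A function with no recorded entry is treated as absent.
        missing = [f for f in functions if not present.get(f, False)]
        if missing:
            report[product] = missing
    return report
```

A product that does not appear in the report is assumed to provide every function, which is the convention adopted in the text.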
In general, this survey showed a depressing lack of enthusiasm for editing software. The group is by its nature highly critical and well-informed, and in some cases evidently more expert in handling structured text than the manufacturers of the software: a common criticism was that the vendors appeared to be unaware of the requirements of a structured text editor.
67% of subjects tended to recommend editing software with which the users were already familiar, where possible, even if this conflicted with software which might be more appropriate for the tasks. The reason given was that this approach minimised the need for [re-]training.
For their own use in working on XML projects, subjects preferred XMetaL (47%), Emacs (27%), and Epic (13%).
There was no significant mention of any editors having specific features regarded as specially useful above the conventional ones.
By contrast, there were many features specifically disliked: interface clutter, crashing or hanging (especially on large files), lack of support for catalogs and external entities, incomplete styling, instability, and poor typography. Script and macro support was particularly regarded as hard to use.
Requests from users included both WYSIWYG and plain-text editing:
‘‘Free’’ (OpenSource), ‘‘easy to use’’, and ‘‘simple’’ were the most frequently-requested
Images, validity, structure (tree-view), and equations also highly-placed
Long tail of other features
Requests when restricted to WYSIWYG enquiries only:
‘‘Ease of use’’ was most-requested
Equations, ‘‘free’’ (OpenSource), and images ranked high
Simplicity, structural control, and user-friendliness also important
‘‘Intuitive’’ is ambiguous (does the interface guess the user's requirements; or does the user guess what the interface means?)
The classification of functions and features was designed to identify which of them — if any — were sufficiently common across a range of interface implementations to be taken as forming the core functionality of structured-document editing. Remaining (ie non-core) functions could be further analyzed to see if they were specific to certain types of document, certain types of user, certain modes of editing (eg tables), or to the requirements of certain areas of use (specific industries or applications).
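One way to operationalise ‘‘sufficiently common’’ is to treat a function as core when it is present in at least a given share of the products. The sketch below assumes this reading; the threshold parameter, the function name, and the sample data are all hypothetical and are not values used in the study.

```python
def core_functions(matrix, threshold=1.0):
    """List the functions present in at least `threshold` (0-1) of the products."""
    products = list(matrix)
    functions = sorted({f for present in matrix.values() for f in present})
    core = []
    for function in functions:
        count = sum(1 for p in products if matrix[p].get(function, False))
        if count / len(products) >= threshold:
            core.append(function)
    return core

# Placeholder data: names and values are illustrative, not survey results.
sample = {
    "Editor A": {"insert element": True, "search in markup": True, "edit script": False},
    "Editor B": {"insert element": True, "search in markup": False, "edit script": False},
    "Editor C": {"insert element": True, "search in markup": True, "edit script": True},
}
```

With the default threshold only functions found in every product qualify; lowering it admits functions that most, but not all, products provide, and whatever is left over forms the non-core set for further analysis.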
Key to Products
|print to file||x||x||x||x||x||x||x||x||x||x||x||x||x||x||x||x|
|edit table props||x||x||x||x||x||x||x||x||x||x||x||x||x||x||x||x||x|
|edit row props||x||x||x||x||x||x||x||x||x||x||x||x||x||x||x||x||x|
|edit col props||x||x||x||x||x||x||x||x||x||x||x||x||x||x||x||x||x|
|edit cell props||x||x||x||x||x||x||x||x||x||x||x||x||x||x||x||x|
|search in markup||x||x||x||x||x||x||x||x|
|set character encoding||x||x||x||x||x||x||x||x||x||x|
All the XML editors examined possessed the same core editing features, with a small number of exceptions (for example Emacs/psgml has a ‘‘split element’’ command, but no ‘‘join element’’ command). To some extent the presence of these features is implicit in SGML and XML, if not exactly mandated: to be able to insert an element, you must have a control which allows you to do it.
The differences lie therefore in the placement and naming of the keys and menus in the interface. In the absence of further information, it must be assumed that the designers and marketers of the editors came to certain conclusions about what the user needed or wanted, and that their products reflect this perception.
All the editors had fairly comprehensive table-editing controls, either for the HTML table model or the CALS table model. The more advanced systems and those with a strong SGML document heritage (eg Epic, XMetaL) can do both, and more if programmed (eg the SASOUT table model). Emacs has a good plaintext table editor in the table.el module, which can produce LATEX or HTML table markup.
The widest variations were in the ‘‘ergonomic’’ and ‘‘editor management’’ features. While some of these are ‘‘comfort features’’ added to smooth the author's ride, some of them are critical to the operation of an editor for structured text (eg entity management), and their omission can only be seen as an admission by the manufacturer that their product is not suitable for authorial use.
No single editor examined can be said to be suitable for the non-expert in XML or LATEX. A significant understanding of markup theory, and of the specific markup for the user's application, would be needed before an author outside the XML/LATEX field could even begin using these programs. The extent of this training, and steps which might be taken to remedy the position, are the subject of further work. In the meantime, what might be termed the ‘‘semi-structured’’ interfaces of wordprocessors do almost as good a job for the author (although clearly not for the publisher), despite their obvious shortcomings.
Some facilities (including some of those mentioned by the experts surveyed earlier) are entirely missing in most editors unless programmed in with scripts or macros. Unlike wordprocessors and DTP systems, which generally work straight out of the box, XML editors usually require extensive customisation before they can be used for a specific application. This is slightly less true of LATEX, as the current default installations of popular distributions (eg TEX Live, MiKTEX) include either a large selection of packages (plug-ins) or a transparent method for adding them from the network as and when needed. For XML editing, the deficiencies noted were:
a realistic working set of up-to-date DTDs and Schemas with stylesheets for popular applications;
real-time resolution of ID/IDREF crossreferencing, so that the process of creating a reference uses prompted interaction with the user to identify the target, assign an ID if none is given, and add the IDREF at the point of insertion, along with stylesheet-driven instantiation of the reference point as (eg) a number, letter, symbol, etc;
promotable and demotable block moves of elements in element content, so that a subsubsection moved to a section location becomes (at user option) a section in the process, and is not barred by beeping from the move on grounds of invalidity;
prompted visible cues for the compulsory elements (plural) of any newly-inserted structural element;
all menu items configurable so that those irrelevant for a given application can be tidied out of the way;
use of deductive logic for the control of keystroke-handling, especially for next-element insertion;
full control of the real-time declaration and use of external (file) entities;
usable writing tools (spellcheckers, thesauruses, grammar-checkers) relevant to the user's field of work.
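The ID/IDREF item in the list above presupposes that the editor can at least detect references with no target. A minimal sketch of such an after-the-fact check is given below, using Python's standard ElementTree module. Because ElementTree does not read attribute types from a DTD, the attribute names are assumptions following DocBook usage (id for the target, linkend for the reference); the function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def dangling_references(root, id_attr="id", ref_attr="linkend"):
    """Return reference values that do not match any ID in the document."""
    ids = {el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None}
    return [
        el.get(ref_attr)
        for el in root.iter()
        if el.get(ref_attr) is not None and el.get(ref_attr) not in ids
    ]

# Hypothetical sample: the second xref points at an ID that does not exist.
SAMPLE = """<article>
  <section id="intro"><title>Introduction</title>
    <para>See <xref linkend="method"/> and <xref linkend="results"/>.</para>
  </section>
  <section id="method"><title>Method</title><para>Text.</para></section>
</article>"""
```

This only reports broken references once they exist; the prompted, real-time interaction described in the list (choosing the target, assigning an ID if none is given) would have to be built on top of it.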
A better understanding is still required of what the users want, expect, and need from interfaces to structured documents. There has been a lot of work on interaction design at a lower level (eg operating systems interfaces), and this needs to be extended to the field of structured documents. While there can be little substitute at a professional level for careful training in the use of structured information, the increasing demand for systems which can be used by the untrained operator cannot commercially be ignored. An analysis of the expectations may reveal whether or not the use of structured text systems can be made easier or more effective without sacrificing accuracy and timeliness.
The research for this project is ongoing at the time of writing (early 2006). The workplan currently includes:
a second (user) survey (below);
the mapping of the results to the feature matrix;
the derivation of any changes to the prevailing interface paradigms;
the testing of possible prototype interface changes.
Among the candidates for evaluation are a number of methods of intuiting the user's requirements. Although some methods have been in use for many years, full use does not appear to have been made of user-driven changes to the visual interface in detecting activity which could support an interpretation in markup.
The target population for this survey consists of users who have some prior experience of work with structured documents; that is, they have used XML, LATEX, or another system of structured markup, with one or more of the selected products or a suitable equivalent.
The objective is to gather their reactions to the software they used, why it was or was not suited to specific tasks, and what features or deficiencies they found. The structure and wording were informed by the work on the expert survey (Section 2.1).
To try and identify users' requests for editors and editing features, a technique was developed to retrieve the original posts which started any thread containing certain keywords, and then check that the original post also contained the same words. This rather roundabout technique was necessary because of the way in which the primary accessible archive for Usenet newsgroups (Google Groups) is accessed.
An initial search was carried out for the words ‘wysiwyg editor’, ‘structure editor’, ‘easy to use editor’, and many others.
This resulted in a large number of posts in common, in most cases over 50%.
To avoid the duplication of effort, the search was repeated for the key words separately, ‘editor’, ‘wysiwyg’, ‘structure’, etc.
For each post retrieved, the whole thread was accessed and the first post (the original post which started the thread) was isolated and the message text extracted.
These original messages were tested for the presence of at least one keyword, and the matches built into a mailbox file.
When all retrievals were finished, the duplicate posts were eliminated (using the Message-ID header value).
This resulted in 101 posts to comp.text.tex, 67 to comp.text.sgml, and 273 to comp.text.xml and XML-L together. These were plotted as a histogram by year of posting, as shown in Figure 1, and then read individually to identify the requested features as shown in Table 2.3.
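The keyword test, the elimination of duplicates by Message-ID, and the count by year of posting can be sketched with Python's standard email modules. This is a reconstruction under stated assumptions, not the script used in the study: the function names are hypothetical, the keyword list is only the three examples quoted above, and each message is assumed to be a single-part plain-text post with Message-ID and Date headers.

```python
from collections import Counter
from email import message_from_string
from email.utils import parsedate_to_datetime

# Only the three keywords quoted in the text; the full list is not given there.
KEYWORDS = ("editor", "wysiwyg", "structure")

def select_posts(raw_messages, keywords=KEYWORDS):
    """Keep posts whose body contains a keyword, dropping duplicate Message-IDs."""
    seen = set()
    kept = []
    for raw in raw_messages:
        message = message_from_string(raw)
        body = message.get_payload()  # a string for single-part messages
        if not any(keyword in body.lower() for keyword in keywords):
            continue
        message_id = message["Message-ID"]
        if message_id in seen:
            continue
        seen.add(message_id)
        kept.append(message)
    return kept

def posts_per_year(messages):
    """Count messages by the year in their Date header."""
    return Counter(parsedate_to_datetime(m["Date"]).year for m in messages)
```

The counts returned by posts_per_year are the values that a histogram by year would plot; writing the kept messages to a mailbox file is omitted here.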
The extraction and identification was performed by a simple shell script performing repeated calls to Google Groups using the wget utility. The resulting HTML page was regularised to XHTML by Tidy, and filtered by an XSLT script to identify the post matching the keyword (ignoring followups). This post was then retrieved separately by the same mechanism using another XSLT script, which resulted in the first ten posts to the thread. Finally, the top post was retrieved using the ‘Source’ switch, and a third XSLT script extracted the original message text.
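The retrieve, regularise, and filter sequence for a single page can be sketched as a chain of external commands. The original was a shell script whose details are not reproduced here, so this is an approximation in Python: the URL and stylesheet are supplied by the caller, the function names are hypothetical, and the use of xsltproc as the XSLT processor is an assumption (the text does not say which processor was used).

```python
import subprocess

def build_pipeline(url, stylesheet):
    """Return the command lines for one retrieve -> Tidy -> XSLT pass."""
    return [
        ["wget", "-q", "-O", "-", url],              # fetch the page to stdout
        ["tidy", "-quiet", "-asxhtml", "-numeric"],  # regularise HTML to XHTML
        ["xsltproc", stylesheet, "-"],               # assumed XSLT processor; reads stdin
    ]

def run_pipeline(commands):
    """Run each command in turn, feeding its output to the next one."""
    data = b""
    for argv in commands:
        # check=False because Tidy exits with status 1 when it only has warnings.
        result = subprocess.run(argv, input=data, stdout=subprocess.PIPE, check=False)
        data = result.stdout
    return data
```

Calling run_pipeline(build_pipeline(url, stylesheet)) would perform one retrieval; the three passes described above would each use a different stylesheet, and scripted access of this kind requires the permission of the archive's operator.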
I am indebted to Andy Arnt at Google for arranging permission to run scripted retrievals against their database.