Paper paradigms on the Internet
The materiality of electronic texts

Peter Flynn

Peter Flynn is the UCC Webmaster and runs the Computer
Centre's Electronic Publishing Unit, which provides advice and
facilities for staff and postgraduate publishing. He trained
at the London College of Printing and worked in business and
academia before joining UCC. Peter was on the development
committees for HTML and XML, is the author of two books in
the field, and is technical consultant to several text-based
research projects. In his copious spare time he is doing a PhD
in Applied Psychology on the human-computer interfaces to
text.

Materiality and Text, UCC, 3 March 2006
Revision 1, March 2006

This presentation explains the development of traditional
publishing into electronic publishing, and how attempts to mimic
the paper paradigm are being superseded by single-source
methods, allowing the author to write once but have the work
made available in many different formats for different
purposes.

Publication as an act of faith

When the only medium of publication was paper, authors were
accustomed to settling down for a long wait between having an
article or a book accepted, and seeing it appear. The lengthy
lead times were a result of complex manual processes and slow
communications, both within the publishing and printing
businesses and within the authors' disciplines and
institutions. The physical materials—paper, typemetal,
ink—take a measurable time to manufacture and bring to the
point of use, and 500 very odd years of black artistry,
craftsmaster cabbala, and trade union mysticism contributed to
an industry more concerned with perpetuating its own mythology
than producing the goods. It required an act of faith, if not
supererogation, by the author to believe that the book or
article would actually appear in print.

History

All this was swept away by an ongoing technological wave
from the 1950s onwards. Prior to the Second World War,
technological advance had been in steps: stereotyping (1790);
the metal press, lithography, and mechanically-produced paper
(all around 1800); the cylinder press (1815); steam power
(1830s); photoengraving and electric power (1890s); and
finally the invention of the Monotype and Linotype
machines just before the turn of the century, the pinnacle of
pre-computer electro-mechanical engineering. The remaining
major innovation, the web-fed press, did not arrive until the
1960s [dates from Steinberg, 1974].

The substitution of photographic negatives for brass
matrices in 1950s typesetters brought the immediate
realisation that electrical control of the positioning and
exposure could be supplanted by digital control of the type
image. Control via computer came later: the end of the 1960s
and start of the 1970s saw the introduction of digital
composition on dedicated systems, and in 1978 came the first
desktop typesetting system that could run on a commodity
business computer, TeX (Knuth, 1984).

In 1982, this author demonstrated TeX to a group of
UK print technologists and trade union experts. It caused
immediate outcry because it would place the creation of
typesetting in the hands of the [untrained] author or
customer, not the compositor. The business computer itself
had been invented in 1949 (Ferry, 2003), but
portable typesetting required a portable language
(Wirth, 1971).

During the 80s and 90s the surge turned into a tidal wave,
sweeping away London's Fleet Street and placing the facilities
for composition in the hands of anyone with a computer. A
reprographic shop on every Main Street provided the mass
reproduction capacity for almost anything except books and
journals, which needed global distribution channels still
controlled by the traditional publishers. The Internet has now
started to replace this as well, by providing for
self-publishing and community publishing, both with and
without peer review or editorial control.

Notoriously in the field of Physics, authors demanded
and got—or rather, took—the ability to make preprints
available on the web, in defiance of the publishers'
unfounded misgivings about loss of journal sales. A more
conservative approach can be seen in the public access
policy of the US National Institutes of Health, which now
encourages authors to make their articles available on the
web immediately they have been published on paper
(National Institutes of Health, 2005).

Survival and persistence

Remarkably, paper publication has survived. Even metal
typesetting and hand-made paper have survived, albeit in a
specialist form. Publication on paper has also provided a form
of distributed repository, because it goes to an
institutionally diverse and geographically widespread
audience. It is relatively inaccessible—you have to travel to
see a document or wait for an inter-library loan—and the
locations are recorded only in the mailing lists of the
publisher and in library catalogues. But despite the apparent
fragility of paper, it has been remarkably resistant to
destruction and deterioration over the centuries. (The low quality
of much contemporary stock is not an
encouraging sign for the future.)

[Figure: Paper-based publication processes. The dashed line represents the route taken by the
finished document when the author generates final-format
output (camera-ready copy), bypassing the conventional
composition process.]

Most of the traditional publishing processes shown in
the figure "Paper-based publication processes" are still in use. The increased use
of ICT has so far largely been restricted to an increase in
speed, not a change in the fundamental nature of the
publishing cycle itself. The canonical example of this is the
use of email, web, and FTP between publisher and author,
publisher and reviewers, and publisher and printer. An
exception is the use of electronic publishing for the author
to produce camera-ready copy, which can have its own
advantages and disadvantages.

Although such technologies used to be cited as the
barriers defeating greater use of electronic publishing—see,
for example,
—human and organisational factors can
still emerge as the bottlenecks: arranging the peer review
process; synchronising submissions for multiple-author
publications; and handling negotiations with printers and
other subcontractors.

The increase in communications speed has also revealed a
number of other underlying factors whose delays were
previously masked by slow communication. Foremost amongst
these is the task of formatting or reformatting the book or
article, even—or perhaps especially—in the case already
mentioned, when the author has undertaken to provide
camera-ready copy. Another is the increased administrative and
manual workload in the publishers' production processes caused
by additional production tasks like the insertion of CDs or
DVDs in a book, and the re-checking for validity of URIs
quoted in the text.

Changes

The advance of information technology not only requires
new skills, but places an entirely new set of tools in the
hands of the author as well as the publisher. Both groups can
make use of web-based services (web sites, blogs, wikis, and
mediation systems) as well as web-based access to new as well
as traditional network services (email, newsgroups,
bulletin-boards, FTP servers, SMS, podcasts, and more). These
tools can be used for communication between the parties to a
publication, but they can also be used for the act of
publication itself.

This is not lost on authors, who are demanding better
services from publishers before, during, and after
publication, and particularly in respect of their readership
as in the example of Physics already mentioned.

Translating this to an institutional framework, it has
been the case for many years that works can be made available
within the institution as well as on the
Internet—before, during, after, and in some cases
instead of, formal publication by a recognised (paper)
publisher. In the case of self-publication, the workflow can
be reduced to that shown in the figure "Web-based
self-publication process". In some cases the institution
itself, possibly in the form of an established project or
service, has taken on the role of publisher.

[Figure: Web-based self-publication process]

The psychology of human preference for paper
over screens
is outside the scope of this article, but there are several
technological threads which have contributed to its
persistence.

Failure of e-books: E-book hardware is still cumbersome, expensive, and
unsatisfactory. E-book software was crippled at an early
stage by the failure to use XML.

Control of rights: While there is no adequate substitute for current
copyright legislation, control over proprietary digital
formats has prevented, rather than expanded, the use of
electronic texts.

Format disparity: Instead of a single universal format there are at
least six mutually incompatible formats in current use,
none of which offers a usable solution.

Cost model: Total cost of ownership is still too high. The
quality of most free resources is very low, and the
widespread failure to provide electronic versions to
purchasers of the paper versions means the paper one
takes precedence.

Immateriality and the electronic text

The materiality of a physical text is self-evident, however
much dispute there may be about its nature. Curiously, in the
field of IT, people also speak of the "physical
text" in reference to electronic texts, in this
case meaning the master copy, the recognised source, or the
actual file on disk, as distinct from some representation of it
filtered through an interface. Those with sufficient experience
or technical knowledge can appreciate the quality of an
electronic text, and the skill (or lack of it) with which it has
been put together: a well-made electronic text is a pleasure to
work with because everything says what it means, and it requires
no undue additional work to perform whatever analytical or
transformational task is required.

Given the caveats expressed above,
it can be seen that the principal distinction between the
physical (paper) text (or, indeed, text on stone, metal, canvas, wood, clay, or
other substrate) and the electronic text is still that the primary
use of the physical is reading and of the electronic is
transformation or analysis. In many cases, electronic texts of
any significant length still end up being printed out one way or
another, simply because the act of extended reading is best
performed with paper.

The electronic text nevertheless has a number of attributes
significantly different from the paper text:

- on a network it can be obtained almost
  instantaneously (leaving aside the small minority of CD/DVD-based
  texts which are only available through postal purchase,
  or under institutional library contract);
- the use of hypertext techniques means that
  cross-references and other epexegetical addenda can be seen
  equally swiftly;
- personal and institutional publication allows text to
  reach a large audience more easily;
- the text can be manipulated in many ways for analysis,
  extraction, or even republication (given the relevant permissions
  and a suitable format: there are still many electronic texts available
  which prohibit these uses, either explicitly or through
  ignorance);
- assuming it is on the network, it can be accessed from
  anywhere at any time.

An electronic text is in fact just a collection of
electromagnetic impulses, so in one sense it has a very physical
existence at the level of particle physics and electronic
engineering. But its appearance on your screen is impermanent:
if you turn your computer off, or close your browser, the text
you were looking at vanishes; but the original source remains
and can be accessed again on another occasion. This
immateriality is a key to the use of electronic texts provided
that the master source remains inviolate and accessible. It is a
cardinal error to assume that an electronic text possesses all
the attributes of a paper publication: they are two different
concepts, although they share a number of common
features.

Hypertext originated with the theories of Nelson (1970) and relates to the
instantiation of cross-references (hyperlinks) between and within
text documents. Its currency on the Internet is due to the
invention of the World Wide Web (Berners-Lee and Cailliau, 1990).
While there is some disputed evidence that, at a technical
level, hypertext is now being subsumed into the more
general concept of
cybertext (Montfort, 2000), it remains one of the principal reasons
for the growth in the availability of transcriptions of extant
paper texts, as well as the growth in texts which are only
available in electronic form.

Effect on publishing

As explained above, authors are now
expecting a more flexible service from publishers, with their
works being accessible via the network in one way or another.
Those who read or analyse texts are also coming to expect that
primary sources will be available online in secondary
(transcribed) forms, if not as whole texts then at least in
searchable form for reference and citation.

In the case of authors, publishers do now tend to make
journals available online, although frequently in a restricted
form. There is evidence from the experiences of the Physics
community that online publication in fact increases journal
sales, rather than decreases them—perhaps in itself
evidence of a preference for the print copy. While books
remain a source of revenue under existing copyright
legislation, it is unlikely that they can be published online
in a similar way, but there is a strong case that out-of-print
books which are in the publishers' archives in electronic form
should simply be made available at nominal cost online,
perhaps even generating a small revenue instead of
none.

Electronic editions of primary sources have slowly become
available in some fields over the last two to three decades.
Where a document is out of copyright, there is clearly no
impediment to its being transcribed and made available,
especially if it is marked up in a suitable format such as TEI
(Burnard and Sperberg-McQueen, 1994). These techniques have been steadily
improved in the decade and a half since they became available,
and there are numerous repositories of texts in a variety of
disciplines. In some institutions, scholars creating
transcriptions for their own researches are now being required
to make the transcription available for the use of others,
which in the long term will benefit both the discipline and
the institution as well as the individual: a refusal is now
widely regarded as dog-in-the-manger behaviour. A
properly-constructed electronic version can be used for
reference, and for virtually any kind of textual or linguistic
analysis—however, there are many kinds of scholarship
for which an electronic version is irrelevant, and its
availability cannot be seen as a panacea.

Self-publication and institutional publication

The final aspect of the electronic
text is publication in
the absence of a traditional publishing house. In its simplest
form, as shown in the figure "Web-based self-publication process", the author puts a
file on a web server. While this is easy to do, and requires
minimal effort, it simply makes the file available, without
any information about format or origin. Many institutional
repositories make a similar error, assuming that a document's
presence on the web is sufficient.

For a text to be usable, however, some additional factors
have to be taken into account:

File format: Proprietary binary formats such as
Word's .doc are too dependent on specific
software and versions for them to be in any way durable.
PDF is acceptable for short-term use, but for long-term
accessibility only a robust format such as XML can be
considered. HTML is a poor substitute, but better than
nothing.

Metadata: The file must carry within itself sufficient
information for its owner, provenance, date, and
authenticity to be determined. Most formats provide at
least for the level of Dublin Core metadata.

Structure: Unless the file is a stretch of unformatted,
unbroken prose narrative, there must be sufficient
markup for the intended structure to be discerned by
analysis and formatting engines. Markup must be
explicit: it is inadequate to rely on visual formatting
alone.

Content: In the case of transcriptions, explicit markup can
identify important aspects of the text, both physical
(such as names, dates, and significant words or phrases)
and metalogical (such as arguments of sociological
transactions).

Texts made available in this way will go as far as is
reasonably possible to preserve those aspects of the paper
publication that can be preserved. Those that
cannot—look, smell, feel, a sense of history and
ownership, and the act of sole possession—are not
relevant in an electronic medium. By the same token, the
electronic text is endowed with features impossible to obtain
on paper without a large investment of time: the ability to be
searched, sorted, summarised, subsetted, analysed, and
reformatted.

Given time and the relevant resources, a paper publication
can certainly be used to provide all the above, but these
resources are rarely at the command of the researcher, and it
is probably an abuse of their time—and that of their
institution—to undertake these tasks manually. The 17th
to 19th century literary gentleman scholar of independent
means could perhaps afford this luxury: the 21st century
researcher working to a tight funding schedule and
externally-imposed timescales almost certainly cannot.

Conclusion

The paper publication and the electronic copy (or the
electronic original) are possessed of several common and
several independent attributes which contribute to their
materiality.

In common they are composed both of physical attributes
(letters, words, sentences, sections, chapters) and logical
attributes (arguments, references, explanations, examples),
which can be identified explicitly (by markup) or implicitly
(by reading and understanding). Both can point to other texts,
and both can have other texts point to them.

The physical text can easily be carried about with no
additional apparatus, and can be discerned with all the
senses. It can have a degree of ownership and history, social
importance and political weight, even to the point of a sense
of awe.

An electronic text can almost instantaneously be moved,
copied, broken into its component parts and analysed or
reassembled (in or out of order); it can be used in
part or whole as a component of another such text, and can
incorporate other texts in the same way.

While an electronic text copy certainly can be formatted
to resemble its paper original, and even printed to resemble
it, this is only useful by way of example or to avoid the
contamination of a rare original. Given our current level of
technology, it cannot rationally substitute for the paper
text, any more than the paper text can substitute for
it.

References

Wilson, Tom. "Electronic publishing and the future of the book". Information Research 3(2), September 1997. ISSN 1368-1613.

Bostock, William. "The Function Of The Electronic Journal (EJ) In The Academic Process: An Appraisal". The Craft 3(2), 2001. ISSN 1029-6980.

National Institutes of Health. Policy on Enhancing Public Access to Archived Publications Resulting from NIH-Funded Research. NIH, Bethesda, MD, May 2005. http://publicaccess.nih.gov/

Montfort, Nick. "Cybertext Killed the Hypertext Star". A review of Cybertext: Perspectives on Ergodic Literature (Espen Aarseth: Johns Hopkins University Press, August 1997, ISBN 0801855799). Electronic Book Review, ed. Joe Tabbi and Mark Amerika, Chicago, IL, December 2000. http://www.electronicbookreview.com/thread/electropoetics/cyberdebates

Nelson, Theodor. "No More Teachers' Dirty Looks". Computer Decisions, September 1970.

Berners-Lee, Tim and Cailliau, Robert. WorldWideWeb: Proposal for a HyperText Project. Conseil Européen pour la Recherche Nucléaire (CERN: European Centre for Particle Physics Research), November 1990. http://www.w3.org/Proposal

Ferry, Georgina. A Computer called LEO: Lyons Teashops and the world's first office computer. Fourth Estate, London, 2003. ISBN 1841151858.

Steinberg, S. H. Five Hundred Years of Printing. 3rd edition. Pelican (Penguin), London, 1974. ISBN 0140203435.

Knuth, D. E. The TeXbook. Addison-Wesley, Reading, MA, 1984.

Wirth, Niklaus. "The Programming Language Pascal". Acta Informatica 1, June 1971, pp. 35–63.

Burnard, Louis and Sperberg-McQueen, C. M. Guidelines for the Encoding and Interchange of Machine-Readable Texts. Text Encoding Initiative, Oxford and Chicago, 1994. http://www.tei-c.edu
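
As a closing illustration of the points made under self-publication (a durable file format, embedded metadata, explicit structure, and content markup), the following Python fragment is a minimal sketch, not a recommended encoding. The element names (document, metadata, section, heading, para, name, place) and the sample text are invented for this example and are not TEI; only the Dublin Core namespace URI is standard. It shows how a file that carries its own metadata and explicit markup can be interrogated without any knowledge of its visual formatting.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core element namespace; everything else here is hypothetical.
DC = "http://purl.org/dc/elements/1.1/"

# An invented transcription: metadata travels inside the file, and names
# and places are marked up explicitly rather than by appearance.
SAMPLE = f"""<document xmlns:dc="{DC}">
  <metadata>
    <dc:title>Sample transcription</dc:title>
    <dc:creator>A. N. Author</dc:creator>
    <dc:date>2006-03-03</dc:date>
  </metadata>
  <section>
    <heading>Arrival</heading>
    <para><name>Mary</name> reached <place>Cork</place> in 1848.</para>
  </section>
  <section>
    <heading>Departure</heading>
    <para><name>Mary</name> and <name>John</name> left <place>Cork</place>.</para>
  </section>
</document>"""


def read_metadata(root):
    """Collect every Dublin Core element into a dict keyed by its local name."""
    prefix = "{" + DC + "}"
    return {
        el.tag[len(prefix):]: el.text
        for el in root.iter()
        if el.tag.startswith(prefix)
    }


def names_by_section(root):
    """Map each section heading to the sorted set of personal names marked up in it."""
    return {
        sec.findtext("heading"): sorted({n.text for n in sec.iter("name")})
        for sec in root.findall("section")
    }


root = ET.fromstring(SAMPLE)
metadata = read_metadata(root)
names = names_by_section(root)
print(metadata)
print(names)
```

Because the structure is explicit, the same file could equally be subsetted or reformatted for another output; the two functions above merely stand in for the searching and extraction tasks described in the text.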