Paper paradigms on the Internet: The materiality of electronic texts

Peter Flynn

Peter Flynn is the UCC Webmaster and runs the Computer Centre's Electronic Publishing Unit, which provides advice and facilities for staff and postgraduate publishing. He trained at the London College of Printing and worked in business and academia before joining UCC. Peter was on the development committees for HTML and XML, is the author of two books in the field, and is technical consultant to several text-based research projects. In his copious spare time he is doing a PhD in Applied Psychology on the human-computer interfaces to text.

Materiality and Text, UCC, 3 March 2006

This presentation explains the development of traditional publishing into electronic publishing, and how attempts to mimic the paper paradigm are being superseded by single-source methods, which allow the author to write once but have the work made available in many different formats for different purposes.

Publication as an act of faith

When the only medium of publication was paper, authors were accustomed to settling down for a long wait between having an article or a book accepted and seeing it appear. The lengthy lead times were a result of complex manual processes and slow communications, both within the publishing and printing businesses and within the authors' disciplines and institutions. The physical materials (paper, typemetal, ink) take a measurable time to manufacture and bring to the point of use, and 500 very odd years of black artistry, craftsmaster cabbala, and trade union mysticism contributed to an industry more concerned with perpetuating its own mythology than with producing the goods. It required an act of faith, if not supererogation, by the author to believe that the book or article would actually appear in print.

History

All this was swept away by an ongoing technological wave from the 1950s onwards.
Prior to the Second World War, technological advance had come in steps: stereotyping (1790); the metal press, lithography, and mechanically-produced paper (all around 1800); the cylinder press (1815); steam power (1830s); photoengraving and electric power (1890s); and, culminating just before the turn of the century, the invention of the Monotype and Linotype machines, the pinnacle of pre-computer electro-mechanical engineering. The remaining major innovation, the web-fed press, did not arrive until the 1960s [dates from ].

The substitution of photographic negatives for brass matrices in 1950s typesetters brought the immediate realisation that electrical control of positioning and exposure could be supplanted by digital control of the type image. Control via computer came later: the end of the 1960s and start of the 1970s saw the introduction of digital composition on dedicated systems, and 1978 brought the first desktop typesetting system that could run on a commodity business computer.

In 1982, this author demonstrated such a system to a group of UK print technologists and trade union experts. It caused an immediate outcry, because it would place typesetting in the hands of the [untrained] author or customer rather than the compositor. The business computer itself had been invented in 1949, but portable typesetting required a portable programming language.

During the 1980s and 1990s the surge turned into a tidal wave, sweeping away London's Fleet Street and placing the facilities for composition in the hands of anyone with a computer. A reprographic shop on every Main Street provided the mass reproduction capacity for almost anything except books and journals, which needed global distribution channels still controlled by the traditional publishers.
The Internet has now started to replace this as well, by providing for self-publishing and community publishing, both with and without peer review or editorial control.

Notoriously, in the field of Physics, authors demanded and got (or rather, took) the ability to make preprints available on the web, in defiance of the publishers' unfounded misgivings about loss of journal sales. A more conservative approach can be seen in the public access policy of the US National Institutes of Health, which now encourages authors to make their articles available on the web as soon as they have been published on paper.

Survival and persistence

Remarkably, paper publication has survived. Even metal typesetting and hand-made paper have survived, albeit in specialist forms. Publication on paper has also provided a form of distributed repository, because it goes to an institutionally diverse and geographically widespread audience. It is relatively inaccessible (you have to travel to see a document or wait for an inter-library loan), and the locations are recorded only in the publisher's mailing lists and in library catalogues. But despite its apparent fragility, paper has been remarkably resistant to destruction and deterioration over the centuries. (The low quality of much contemporary stock, however, is not an encouraging sign for the future.)

Figure: Paper-based publication processes. The dashed line represents the route taken by the finished document when the author generates final-format output (camera-ready copy), bypassing the conventional composition process.

Most of the traditional publishing processes shown in the figure are still in use. The increased use of ICT has so far largely been restricted to an increase in speed, not a change in the fundamental nature of the publishing cycle itself. The canonical example is the use of email, web, and FTP between publisher and author, publisher and reviewers, and publisher and printer.
An exception is the author's use of electronic publishing tools to produce camera-ready copy, which has its own advantages and disadvantages.

Although such technologies used to be cited as the barriers to greater use of electronic publishing, human and organisational factors can still emerge as the bottlenecks: arranging the peer review process; synchronising submissions for multiple-author publications; and handling negotiations with printers and other subcontractors.

The increase in communications speed has also revealed a number of other underlying factors whose delays were previously masked by slow communication. Foremost amongst these is the task of formatting or reformatting the book or article, even (or perhaps especially) in the case already mentioned, where the author has undertaken to provide camera-ready copy. Another is the increased administrative and manual workload in the publishers' production processes caused by additional tasks such as inserting CDs or DVDs in a book and re-checking the validity of URIs quoted in the text.

Changes

The advance of information technology not only requires new skills but also places an entirely new set of tools in the hands of the author as well as the publisher. Both groups can make use of web-based services (web sites, blogs, wikis, and mediation systems) as well as web-based access to both new and traditional network services (email, newsgroups, bulletin-boards, FTP servers, SMS, podcasts, and more).
These tools can be used for communication between the parties to a publication, but they can also be used for the act of publication itself.

This is not lost on authors, who are demanding better services from publishers before, during, and after publication, particularly with respect to their readership, as in the Physics example already mentioned.

In an institutional framework, works have for many years been made available within the institution as well as on the Internet: before, during, after, and in some cases instead of, formal publication by a recognised (paper) publisher. In the case of self-publication, the workflow can be reduced to that shown in the figure below. In some cases the institution itself, possibly in the form of an established project or service, has taken on the role of publisher.

Figure: Web-based self-publication process.

The psychology of human preference for paper over screens is outside the scope of this article, but several technological threads have contributed to the persistence of paper:

Failure of e-books: E-book hardware is still cumbersome, expensive, and unsatisfactory. E-book software was crippled at an early stage by the failure to use XML.

Control of rights: While there is no adequate substitute for current copyright legislation, control over proprietary digital formats has prevented, rather than expanded, the use of electronic texts.

Format disparity: Instead of a single universal format there are at least six mutually incompatible formats in current use, none of which offers a usable solution.

Cost model: Total cost of ownership is still too high. The quality of most free resources is very low, and the widespread failure to provide electronic versions to purchasers of the paper versions means that the paper version takes precedence.

Immateriality and the electronic text

The materiality of a physical text is self-evident, however much dispute there may be about its nature.
Curiously, in the field of IT, people also speak of the physical text in reference to electronic texts, meaning in this case the master copy, the recognised source, or the actual file on disk, as distinct from some representation of it filtered through an interface. Those with sufficient experience or technical knowledge can appreciate the quality of an electronic text and the skill (or lack of it) with which it has been put together: a well-made electronic text is a pleasure to work with because everything says what it means, and it requires no undue additional work to perform whatever analytical or transformational task is required.

Given the caveats expressed above, the principal distinction between the physical (paper) text (or, indeed, text on stone, metal, canvas, wood, clay, or other substrate) and the electronic text is still that the physical text is used primarily for reading and the electronic text for transformation or analysis. In many cases, electronic texts of any significant length still end up being printed out one way or another, simply because extended reading is best performed on paper.

The electronic text nevertheless has a number of attributes significantly different from those of the paper text:
- on a network it can be obtained almost instantaneously (leaving aside the small minority of CD/DVD-based texts which are available only through postal purchase or under an institutional library contract);
- hypertext techniques mean that cross-references and other epexegetical addenda can be seen equally swiftly;
- personal and institutional publication allows text to reach a large audience more easily;
- the text can be manipulated in many ways for analysis, extraction, or even republication (given the relevant permissions and a suitable format: there are still many electronic texts available which prohibit these uses, either explicitly or through ignorance);
- assuming it is on the network, it can be accessed from anywhere at any time.

An electronic text is in fact just a collection of electromagnetic impulses, so in one sense it has a very physical existence at the level of particle physics and electronic engineering. But its appearance on your screen is impermanent: if you turn your computer off or close your browser, the text you were looking at vanishes, yet the original source remains and can be accessed again on another occasion. This immateriality is key to the use of electronic texts, provided that the master source remains inviolate and accessible. It is a cardinal error to assume that an electronic text possesses all the attributes of a paper publication: they are two different concepts, although they share a number of common features.

Hypertext originated with the theories of Theodor Nelson and relates to the instantiation of cross-references (hyperlinks) between and within text documents. Its currency on the Internet is due to the invention of the World Wide Web. While there is some disputed evidence that, at a technical level, hypertext is now being subsumed into the more general concept of cybertext, it remains one of the principal reasons for the growth in the availability of transcriptions of extant paper texts, as well as the growth in texts which are available only in electronic form.

Effect on publishing

As explained earlier, authors now expect a more flexible service from publishers, with their works accessible via the network in one way or another. Those who read or analyse texts are also coming to expect primary sources to be available online in secondary (transcribed) forms, if not as whole texts then at least in searchable form for reference and citation.

As far as authors are concerned, publishers do now tend to make journals available online, although frequently in a restricted form.
There is evidence from the experience of the Physics community that online publication in fact increases journal sales rather than decreasing them, which is perhaps in itself evidence of a preference for the print copy. While books remain a source of revenue under existing copyright legislation, they are unlikely to be published online in a similar way. There is, however, a strong case for simply making out-of-print books that already exist in the publishers' archives in electronic form available online at nominal cost, perhaps even generating a small revenue instead of none.

Electronic editions of primary sources have slowly become available in some fields over the last two to three decades. Where a document is out of copyright, there is clearly no impediment to its being transcribed and made available, especially if it is marked up in a suitable format such as TEI. These techniques have been steadily improved in the decade and a half since they became available, and there are numerous repositories of texts in a variety of disciplines. In some institutions, scholars creating transcriptions for their own research are now being required to make them available for the use of others, which in the long term will benefit the discipline and the institution as well as the individual: a refusal is now widely regarded as dog-in-the-manger behaviour. A properly-constructed electronic version can be used for reference and for virtually any kind of textual or linguistic analysis. However, there are many kinds of scholarship for which an electronic version is irrelevant, and its availability cannot be seen as a panacea.

Self-publication and institutional publication

The final aspect of the electronic text is publication in the absence of a traditional publishing house. In its simplest form, as in the web-based self-publication process shown earlier, the author puts a file on a web server.
While this is easy and requires minimal effort, it simply makes the file available, without any information about its format or origin. Many institutional repositories make a similar error, assuming that a document's presence on the web is sufficient.

For a text to be usable, however, some additional factors have to be taken into account:

File format: Proprietary binary formats such as Word's .doc are too dependent on specific software and versions to be in any way durable. PDF is acceptable for short-term use, but for long-term accessibility only a robust format such as XML can be considered. HTML is a poor substitute, but better than nothing.

Metadata: The file must carry within itself sufficient information for its owner, provenance, date, and authenticity to be established. Most formats provide for metadata at least at the level of Dublin Core.

Structure: Unless the file is a stretch of unformatted, unbroken prose narrative, there must be sufficient markup for the intended structure to be discerned by analysis and formatting engines. Markup must be explicit: it is inadequate to rely on visual formatting alone.

Content: In the case of transcriptions, explicit markup can identify important aspects of the text, both physical (such as names, dates, and significant words or phrases) and metalogical (such as the arguments of sociological transactions).

Texts made available in this way will go as far as is reasonably possible to preserve those aspects of the paper publication that can be preserved. Those that cannot (look, smell, feel, a sense of history and ownership, and the act of sole possession) are not relevant in an electronic medium.
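To make the Metadata, Structure and Content requirements above concrete, here is a minimal Python sketch, assuming a small hypothetical XML file. Only the Dublin Core element-set namespace (`http://purl.org/dc/elements/1.1/`) is real; the `doc`, `head`, `body`, `p`, `name` and `date` elements and the `when` attribute are invented for illustration and are not TEI, and the list of required Dublin Core fields is an assumption rather than any repository's rule. The sketch checks that the metadata is present, pulls out the explicitly tagged names, and renders the one source as both plain text and HTML.

```python
import xml.etree.ElementTree as ET
from html import escape

# Dublin Core element-set namespace (real); every other element name is hypothetical.
DC = "http://purl.org/dc/elements/1.1/"

SAMPLE = """<doc xmlns:dc="http://purl.org/dc/elements/1.1/">
  <head>
    <dc:title>Letter to the Provost</dc:title>
    <dc:creator>A. N. Author</dc:creator>
    <dc:date>1849-03-03</dc:date>
    <dc:rights>Public domain</dc:rights>
  </head>
  <body>
    <p>Written by <name>John Smith</name> on <date when="1849-03-03">3 March 1849</date>.</p>
    <p>Delivered to <name>Mary Jones</name>.</p>
  </body>
</doc>"""

# Assumed minimum set of Dublin Core fields; a real repository would define its own.
REQUIRED_DC = ("title", "creator", "date", "rights")


def missing_metadata(root):
    """Return the required Dublin Core fields that are absent or empty in <head>."""
    missing = []
    for field in REQUIRED_DC:
        el = root.find(f"head/{{{DC}}}{field}")
        if el is None or not (el.text or "").strip():
            missing.append(field)
    return missing


def marked_up(root, tag):
    """Collect the text of every element with the given (un-namespaced) tag."""
    return ["".join(el.itertext()) for el in root.iter(tag)]


def to_text(root):
    """Plain-text rendering: one line per paragraph, all markup discarded."""
    return "\n".join("".join(p.itertext()) for p in root.iter("p"))


def to_html(root):
    """HTML rendering of the same source, driven by the explicit markup."""
    out = []
    for p in root.iter("p"):
        parts = [escape(p.text or "")]
        for child in p:
            inner = escape("".join(child.itertext()))
            if child.tag == "name":
                parts.append(f'<span class="name">{inner}</span>')
            elif child.tag == "date":
                parts.append(f'<time datetime="{escape(child.get("when", ""))}">{inner}</time>')
            else:
                parts.append(inner)
            parts.append(escape(child.tail or ""))
        out.append("<p>" + "".join(parts) + "</p>")
    return "\n".join(out)


root = ET.fromstring(SAMPLE)
print(missing_metadata(root))   # -> []
print(marked_up(root, "name"))  # -> ['John Smith', 'Mary Jones']
print(to_text(root))
print(to_html(root))
```

Because the names and dates are tagged for what they are rather than merely set in italics, the same file can be searched, subsetted and reformatted without re-keying; a file that relies on visual formatting alone offers no such hook. A real project would more probably use the TEI Guidelines and a standard transformation language than hand-written rendering code like this.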
By the same token, the electronic text is endowed with features impossible to obtain on paper without a large investment of time: the ability to be searched, sorted, summarised, subsetted, analysed, and reformatted.

Given time and the relevant resources, a paper publication can certainly be used to provide all the above, but these resources are rarely at the researcher's command, and it is probably an abuse of the researcher's time (and that of their institution) to undertake these tasks manually. The 17th- to 19th-century literary gentleman scholar of independent means could perhaps afford this luxury: the 21st-century researcher working to a tight funding schedule and externally-imposed timescales almost certainly cannot.

Conclusion

The paper publication and the electronic copy (or the electronic original) possess several shared and several independent attributes which contribute to their materiality.

Both are composed of physical attributes (letters, words, sentences, sections, chapters) and logical attributes (arguments, references, explanations, examples), which can be identified explicitly (by markup) or implicitly (by reading and understanding). Both can point to other texts, and both can have other texts point to them.

The physical text can easily be carried about with no additional apparatus, and can be discerned with all the senses. It can have a degree of ownership and history, social importance and political weight, even to the point of inspiring a sense of awe.

An electronic text can almost instantaneously be moved, copied, broken into its component parts, and analysed or reassembled, in or out of order. It can be used in part or whole as a component of another such text, and can in turn incorporate other texts in the same way.

While an electronic copy can certainly be formatted to resemble its paper original, and even printed to resemble it, this is only useful by way of example or to avoid the contamination of a rare original.
Given our current level of technology, it cannot rationally substitute for the paper text, any more than the paper text can substitute for it.

References

Wilson, Tom. "Electronic publishing and the future of the book." Information Research 3(2), September 1997. ISSN 1368-1613.
Bostock, William. "The Function of the Electronic Journal (EJ) in the Academic Process: An Appraisal." The Craft 3(2), 2001. ISSN 1029-6980.
National Institutes of Health. Policy on Enhancing Public Access to Archived Publications Resulting from NIH-Funded Research. Bethesda, MD: NIH, May 2005.
Montfort, Nick. "Cybertext Killed the Hypertext Star." A review of Cybertext: Perspectives on Ergodic Literature (Espen Aarseth; Johns Hopkins University Press, August 1997; ISBN 0801855799). Electronic Book Review, ed. Joe Tabbi and Mark Amerika. Chicago, IL, December 2000.
Nelson, Theodor. "No More Teachers' Dirty Looks." Computer Decisions, September 1970.
Berners-Lee, Tim, and Robert Cailliau. "WorldWideWeb: Proposal for a HyperText Project." CERN (Conseil Européen pour la Recherche Nucléaire: European Centre for Particle Physics Research), November 1990.
Ferry, Georgina. A Computer Called LEO: Lyons Teashops and the World's First Office Computer. London: Fourth Estate, 2003. ISBN 1841151858.
Steinberg, S. H. Five Hundred Years of Printing. 3rd edn. London: Pelican (Penguin), 1974. ISBN 0140203435.
Knuth, D. E. The TeXbook. Reading, MA: Addison-Wesley, 1984.
Wirth, Niklaus. "The Programming Language Pascal." Acta Informatica 1, June 1971, pp. 35–63.
Burnard, Louis, and C. M. Sperberg-McQueen. Guidelines for the Encoding and Interchange of Machine-Readable Texts. Oxford and Chicago: Text Encoding Initiative, 1994.