Encoding models for scholarly literature
Martin Holmes (HCMC), Laurent Romary (INRIA Saclay - Ile de France, IDSL)

TL;DR
This paper advocates for using XML, particularly a customized TEI schema, as the optimal digital encoding format for scholarly journal articles, emphasizing its advantages over binary formats like PDF and Word.
Contribution
It proposes developing a specialized TEI-based XML schema for encoding scholarly articles, highlighting its benefits over existing formats and schemas.
Findings
The authors argue that XML is a far superior encoding choice to binary and proprietary formats such as PDF and MS Word for journal articles.
A customized TEI schema can effectively encode scholarly articles.
TEI's detailed tagging supports precise scholarly document encoding.
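To make the findings concrete, here is a minimal sketch of what structured encoding buys over a binary format: the title, authors, and section headings can be read straight from the markup with a standard XML parser. The sample document uses only generic TEI elements in the standard TEI namespace; it is an illustrative assumption, not the customized schema the paper proposes, and the `summarize` helper and placeholder text are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace; the "tei" prefix is just a local alias for queries.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, hand-written sample using generic TEI elements only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Encoding models for scholarly literature</title>
        <author>Martin Holmes</author>
        <author>Laurent Romary</author>
      </titleStmt>
      <publicationStmt><p>Illustrative sample only.</p></publicationStmt>
      <sourceDesc><p>Born-digital document.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div type="section">
        <head>Introduction</head>
        <p>Placeholder paragraph.</p>
      </div>
    </body>
  </text>
</TEI>"""


def summarize(tei_xml):
    """Extract title, authors, and top-level section headings from a TEI string."""
    root = ET.fromstring(tei_xml)
    title = root.findtext(".//tei:titleStmt/tei:title", namespaces=TEI_NS)
    authors = [a.text for a in root.findall(".//tei:titleStmt/tei:author", TEI_NS)]
    sections = [h.text for h in root.findall(".//tei:body/tei:div/tei:head", TEI_NS)]
    return {"title": title, "authors": authors, "sections": sections}


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

The same metadata in a PDF or Word file would typically have to be recovered by layout heuristics or format-specific tooling, which is the kind of contrast the paper's argument rests on.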
Abstract
We examine the issue of digital formats for document encoding, archiving and publishing, through the specific example of "born-digital" scholarly journal articles. We will begin by looking at the traditional workflow of journal editing and publication, and how these practices have made the transition into the online domain. We will examine the range of different file formats in which electronic articles are currently stored and published. We will argue strongly that, despite the prevalence of binary and proprietary formats such as PDF and MS Word, XML is a far superior encoding choice for journal articles. Next, we look at the range of XML document structures (DTDs, Schemas) which are in common use for encoding journal articles, and consider some of their strengths and weaknesses. We will suggest that, despite the existence of specialized schemas intended specifically for journal…
