Deep encoding of etymological information in TEI
Jack Bowers (OEAW), Laurent Romary (CMB, ALPAGE)

TL;DR
This paper develops a systematic approach to modeling and representing etymological data in digital dictionaries using TEI guidelines, aiming to unify legacy and digital lexical resources for easier historical word analysis.
Contribution
It introduces a coherent set of modeling principles for etymological data within TEI, enabling seamless integration of various lexical resources.
Findings
Proposes a comprehensive TEI-based framework for etymology
Facilitates integration of legacy and digital lexical data
Supports seamless querying of historical word information
Abstract
This paper aims to provide a comprehensive model and representation of etymological data in digital dictionaries. The purpose is to integrate in one coherent framework both digital representations of legacy dictionaries and born-digital lexical databases that are constructed manually or semi-automatically. We propose a systematic and coherent set of modeling principles for a variety of etymological phenomena that may contribute to the creation of a continuum between existing and future lexical constructs, where anyone interested in tracing the history of words and their meanings will be able to seamlessly query lexical resources. Instead of designing an ad hoc model and representation language for digital etymological data, we focus on identifying all the possibilities offered by the TEI guidelines for the representation of lexical information.
Taxonomy
Topics: Lexicography and Language Studies · Natural Language Processing Techniques · Linguistics and language evolution
