ATLAS: A flexible and extensible architecture for linguistic annotation
Steven Bird, David Day, John Garofolo, John Henderson, Christophe Laprun, Mark Liberman

TL;DR
This paper introduces ATLAS, a flexible architecture for linguistic annotation that unifies various annotation formats through a formal model and API, supporting diverse signals and promoting tool reuse.
Contribution
It presents a formal annotation model and API that generalizes existing approaches, enabling flexible, extensible linguistic annotations across multiple signal types.
Findings
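Annotation Graphs model annotations on linear signals (such as text and speech) indexed by intervals, a form to which efficient database storage and querying techniques apply.
A wide range of existing annotated corpus formats can be mapped to the annotation graph model.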
Implementation efforts are underway for key architecture components.
Abstract
We describe a formal model for annotating linguistic artifacts, from which we derive an application programming interface (API) to a suite of tools for manipulating these annotations. The abstract logical model provides for a range of storage formats and promotes the reuse of tools that interact through this API. We focus first on "Annotation Graphs," a graph model for annotations on linear signals (such as text and speech) indexed by intervals, for which efficient database storage and querying techniques are applicable. We note how a wide range of existing annotated corpora can be mapped to this annotation graph model. This model is then generalized to encompass a wider variety of linguistic "signals," including both naturally occurring phenomena (as recorded in images, video, multi-modal interactions, etc.), as well as the derived resources that are increasingly important to the…
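To make the idea of a graph model over intervals of a linear signal more concrete, here is a minimal Python sketch. It is an illustration only, not the ATLAS API: the names `Node`, `Arc`, `AnnotationGraph`, `add`, and `overlapping` are all hypothetical, and the linear scan used for querying stands in for the more efficient database techniques the abstract alludes to.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical anchor into a linear signal (seconds for speech, characters for text)."""
    id: str
    offset: Optional[float] = None  # assumption: an anchor may be left unspecified


@dataclass(frozen=True)
class Arc:
    """Hypothetical labeled annotation spanning the interval between two nodes."""
    start: Node
    end: Node
    type: str   # e.g. "word" or "speaker"
    label: str


@dataclass
class AnnotationGraph:
    """Illustrative container: a set of labeled arcs over shared nodes."""
    arcs: List[Arc] = field(default_factory=list)

    def add(self, start: Node, end: Node, type_: str, label: str) -> Arc:
        arc = Arc(start, end, type_, label)
        self.arcs.append(arc)
        return arc

    def overlapping(self, lo: float, hi: float, type_: Optional[str] = None) -> List[Arc]:
        """Return arcs whose interval overlaps [lo, hi), optionally filtered by type.

        A plain linear scan; arcs with an unanchored endpoint are skipped.
        """
        hits = []
        for arc in self.arcs:
            if arc.start.offset is None or arc.end.offset is None:
                continue
            if arc.start.offset < hi and arc.end.offset > lo:
                if type_ is None or arc.type == type_:
                    hits.append(arc)
        return hits


# Two word annotations and one speaker annotation over the same stretch of signal.
n0, n1, n2 = Node("n0", 0.0), Node("n1", 0.42), Node("n2", 0.90)
graph = AnnotationGraph()
graph.add(n0, n1, "word", "hello")
graph.add(n1, n2, "word", "world")
graph.add(n0, n2, "speaker", "A")

words = [arc.label for arc in graph.overlapping(0.3, 0.5, type_="word")]
later = [arc.label for arc in graph.overlapping(0.5, 0.6)]
```

Because the word and speaker arcs share nodes, several layers of annotation can coexist over one signal without a fixed hierarchy, which is one reason a graph is a natural fit for mapping differently structured corpora into a common form.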
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Topic Modeling
