Annotation graphs as a framework for multidimensional linguistic data analysis
Steven Bird, Mark Liberman

TL;DR
This paper motivates and illustrates annotation graphs, a formal framework based on labeled acyclic digraphs, for representing complex, multi-level linguistic annotations drawn from different formats, so that diverse annotation schemes can be compared and integrated.
Contribution
The paper applies the authors' previously presented annotation graph framework to discourse-level annotations of text and speech, showing that it can represent complex, multi-level, and overlapping annotations from diverse formats.
Findings
Modeled multi-level annotations drawn from multiple schemes, including a hybrid eight-level annotation of a fragment of the Boston University Radio Speech Corpus
Facilitated comparison of different annotation models
Enabled integration of diverse linguistic tools and corpora
Abstract
In recent work we have presented a formal framework for linguistic annotation based on labeled acyclic digraphs. These 'annotation graphs' offer a simple yet powerful method for representing complex annotation structures incorporating hierarchy and overlap. Here, we motivate and illustrate our approach using discourse-level annotations of text and speech data drawn from the CALLHOME, COCONUT, MUC-7, DAMSL and TRAINS annotation schemes. With the help of domain specialists, we have constructed a hybrid multi-level annotation for a fragment of the Boston University Radio Speech Corpus which includes the following levels: segment, word, breath, ToBI, Tilt, Treebank, coreference and named entity. We show how annotation graphs can represent hybrid multi-level structures which derive from a diverse set of file formats. We also show how the approach facilitates substantive comparison of…
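The idea of a labeled acyclic digraph that holds hierarchy and overlap together can be made concrete with a small sketch. This is an illustration only, not the authors' implementation: the class names, the `level` and `label` fields, the optional per-node time offsets, and the toy words and times are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Arc:
    src: int    # start node id
    dst: int    # end node id
    level: str  # annotation level, e.g. "word" (hypothetical field name)
    label: str  # annotation content for that level


@dataclass
class AnnotationGraph:
    # node id -> time offset in seconds, or None when the node is unanchored
    times: Dict[int, Optional[float]] = field(default_factory=dict)
    arcs: List[Arc] = field(default_factory=list)

    def add_node(self, node: int, time: Optional[float] = None) -> None:
        self.times[node] = time

    def add_arc(self, src: int, dst: int, level: str, label: str) -> None:
        # Nodes mentioned only by an arc are created without a time offset.
        for node in (src, dst):
            self.times.setdefault(node, None)
        self.arcs.append(Arc(src, dst, level, label))

    def level_arcs(self, level: str) -> List[Arc]:
        """Return the arcs of one annotation level, in insertion order."""
        return [a for a in self.arcs if a.level == level]

    def is_acyclic(self) -> bool:
        """Check acyclicity with Kahn's topological-sort algorithm."""
        indegree = {node: 0 for node in self.times}
        for arc in self.arcs:
            indegree[arc.dst] += 1
        ready = [node for node, deg in indegree.items() if deg == 0]
        visited = 0
        while ready:
            node = ready.pop()
            visited += 1
            for arc in self.arcs:
                if arc.src == node:
                    indegree[arc.dst] -= 1
                    if indegree[arc.dst] == 0:
                        ready.append(arc.dst)
        return visited == len(indegree)


# Toy data, invented for illustration (not taken from any corpus).
g = AnnotationGraph()
g.add_node(0, 0.0)
g.add_node(3, 1.2)
g.add_arc(0, 1, "word", "the")
g.add_arc(1, 2, "word", "cat")
g.add_arc(2, 3, "word", "sat")
g.add_arc(0, 2, "phrase", "NP")     # hierarchy: spans the first two word arcs
g.add_arc(1, 3, "breath", "group")  # overlap: crosses the NP boundary
```

In this sketch every level shares the same nodes, so a phrase arc that spans two word arcs expresses hierarchy, while an arc from another level that crosses the phrase boundary expresses overlap without forcing either level into a single tree.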
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
