Databases for comparative syntactic research
Jessica K. Ivani, Balthasar Bickel

TL;DR
This paper surveys 21 publicly available linguistic databases of syntactic variation, categorizes them by unit of description and design principle, and discusses their features, advantages, limitations, and needs for future development.
Contribution
It provides a comprehensive classification and analysis of existing syntactic databases, highlighting their design principles as a guide for future database development.
Findings
Databases are categorized by units of description (languages, constructions, or expressions) and by design principles.
Three primary design principles are identified: monocategorization, multicategorization, and structural decomposition.
All surveyed databases can be effectively classified along these dimensions.
Abstract
Recent years have witnessed a steep increase in linguistic databases capturing syntactic variation. We survey and describe 21 publicly available morpho-syntactic databases, focusing on such properties as data structure, user interface, documentation, formats, and overall user friendliness. We demonstrate that all the surveyed databases can be fruitfully categorized along two dimensions: units of description and the design principle. Units of description refer to the type of the data the database represents (languages, constructions, or expressions). The design principles capture the internal logic of the database. We identify three primary design principles, which vary in their descriptive power, granularity, and complexity: monocategorization, multicategorization, and structural decomposition. We describe how these design principles are implemented in concrete databases and discuss…
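The abstract describes a two-dimensional classification: each database is placed by its unit of description and by its design principle. The Python sketch below models that grid as a simple data structure. The unit and principle names come from the abstract; everything else is an illustrative assumption. In particular, the `guess_principle` heuristic (a single category read as monocategorization, a set of categories as multicategorization, a mapping of finer-grained variables as structural decomposition) is a hypothetical reading, not the authors' definition, because the abstract does not define the principles. The database names and word-order values are toy placeholders, not drawn from any surveyed database.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Unit(Enum):
    """Units of description named in the abstract."""
    LANGUAGE = "languages"
    CONSTRUCTION = "constructions"
    EXPRESSION = "expressions"


class Principle(Enum):
    """Design principles named in the abstract."""
    MONOCATEGORIZATION = "monocategorization"
    MULTICATEGORIZATION = "multicategorization"
    STRUCTURAL_DECOMPOSITION = "structural decomposition"


@dataclass(frozen=True)
class DatabaseProfile:
    name: str  # hypothetical placeholder name, not one of the surveyed databases
    unit: Unit
    principle: Principle


def guess_principle(value) -> Principle:
    """Hypothetical heuristic (an assumption, not the authors' criterion):
    one category -> monocategorization,
    a set of categories -> multicategorization,
    a mapping of finer-grained variables -> structural decomposition."""
    if isinstance(value, dict):
        return Principle.STRUCTURAL_DECOMPOSITION
    if isinstance(value, (set, frozenset)):
        return Principle.MULTICATEGORIZATION
    return Principle.MONOCATEGORIZATION


def classify(profiles):
    """Group database profiles into cells of the unit x principle grid."""
    grid = defaultdict(list)
    for p in profiles:
        grid[(p.unit, p.principle)].append(p.name)
    return dict(grid)


if __name__ == "__main__":
    # Toy word-order codings of one hypothetical language entry in three
    # hypothetical databases (placeholder values, not real data).
    codings = {
        "db_mono": "SOV",
        "db_multi": {"SOV", "OSV"},
        "db_decomp": {"verb_position": "final", "object_precedes_verb": True},
    }
    profiles = [
        DatabaseProfile(name, Unit.LANGUAGE, guess_principle(value))
        for name, value in codings.items()
    ]
    for (unit, principle), names in classify(profiles).items():
        print(unit.value, "|", principle.value, "|", names)
```

Run as a script, this prints one line per occupied cell, for example `languages | monocategorization | ['db_mono']`. Keeping unit and principle as separate enums mirrors the abstract's claim that the two dimensions are independent: any unit can in principle combine with any design principle.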
Taxonomy
Topics: Natural Language Processing Techniques · Syntax, Semantics, Linguistic Variation · Language and cultural evolution
