The OLAC Metadata Set and Controlled Vocabularies
Steven Bird, Gary Simons

TL;DR
This paper introduces OLAC, the Open Language Archives Community: a digital infrastructure that uses a metadata set and controlled vocabularies to make language resources easier to discover and reuse as language data proliferates.
Contribution
It presents the OLAC Metadata Set and its associated controlled vocabularies, built on the Open Archives Initiative, which support consistent description and focused searching of language resources.
Findings
Development of the OLAC metadata set
Implementation of controlled vocabularies for linguistic data
Progress and community input on metadata standards
Abstract
As language data and associated technologies proliferate and as the language resources community rapidly expands, it has become difficult to locate and reuse existing resources. Are there any lexical resources for such-and-such a language? What tool can work with transcripts in this particular format? What is a good format to use for linguistic data of this type? Questions like these dominate many mailing lists, since web search engines are an unreliable way to find language resources. This paper describes a new digital infrastructure for language resource discovery, based on the Open Archives Initiative, and called OLAC -- the Open Language Archives Community. The OLAC Metadata Set and the associated controlled vocabularies facilitate consistent description and focussed searching. We report progress on the metadata set and controlled vocabularies, describing current issues and…
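To illustrate why controlled vocabularies help with consistent description and focused searching, here is a minimal Python sketch. It is not the OLAC Metadata Set itself: the field names (`type`, `language`), the vocabulary terms, and the helper functions `validate` and `search` are all hypothetical, chosen only to show the idea.

```python
# Toy controlled vocabularies. The fields and terms are illustrative
# placeholders (assumptions), not the actual OLAC vocabularies.
VOCABULARIES = {
    "type": {"lexicon", "text", "description"},
    "language": {"en", "fr", "sw"},
}


def validate(record):
    """Return (field, value) pairs whose value is outside the vocabulary."""
    errors = []
    for field, allowed in VOCABULARIES.items():
        for value in record.get(field, []):
            if value not in allowed:
                errors.append((field, value))
    return errors


def search(records, **criteria):
    """Return records whose controlled fields contain every requested term."""
    return [
        r for r in records
        if all(term in r.get(field, []) for field, term in criteria.items())
    ]


# Hypothetical resource descriptions; the third uses an uncontrolled term.
records = [
    {"title": "A Swahili wordlist", "type": ["lexicon"], "language": ["sw"]},
    {"title": "French field notes", "type": ["text"], "language": ["fr"]},
    {"title": "Mislabelled item", "type": ["dictionary"], "language": ["sw"]},
]

valid = [r for r in records if not validate(r)]
hits = search(valid, type="lexicon", language="sw")
```

Because every description draws its terms from the same fixed lists, a query such as "lexical resources for language X" becomes an exact match on controlled fields rather than a free-text guess, which is the kind of question the abstract says web search engines answer unreliably.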
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
