The OLAC Metadata Set and Controlled Vocabularies

Steven Bird; Gary Simons

arXiv:cs/0105030 · cs.CL · May 23, 2007

Open Access

TL;DR

This paper introduces OLAC, the Open Language Archives Community: a digital infrastructure, based on the Open Archives Initiative, that uses a metadata set and controlled vocabularies to make language resources easier to find and reuse as language data proliferates.

Contribution

The paper presents the OLAC Metadata Set and its associated controlled vocabularies, which build on the Open Archives Initiative to support consistent description and focused searching of language resources.

Findings

1. Development of the OLAC metadata set
2. Implementation of controlled vocabularies for linguistic data
3. Progress and community input on metadata standards

Abstract

As language data and associated technologies proliferate and as the language resources community rapidly expands, it has become difficult to locate and reuse existing resources. Are there any lexical resources for such-and-such a language? What tool can work with transcripts in this particular format? What is a good format to use for linguistic data of this type? Questions like these dominate many mailing lists, since web search engines are an unreliable way to find language resources. This paper describes a new digital infrastructure for language resource discovery, based on the Open Archives Initiative, and called OLAC -- the Open Language Archives Community. The OLAC Metadata Set and the associated controlled vocabularies facilitate consistent description and focussed searching. We report progress on the metadata set and controlled vocabularies, describing current issues and…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification