The Open Language Archives Community: An infrastructure for distributed archiving of language resources
Gary Simons, Steven Bird

TL;DR
The paper presents the infrastructure built by the Open Language Archives Community (OLAC) for distributed archiving and discovery of language resources on the web. Its technical and usage infrastructures address problems of resource discovery, and its governance infrastructure addresses problems of resource creation.
Contribution
It introduces an infrastructure that combines distributed archiving and resource discovery with a governance mechanism through which the language-resource community can express consensus on recommended best practices.
Findings
Built a virtual library of distributed language resources
Established mechanisms for community consensus on best practices
Enhanced resource discovery across multiple repositories
Abstract
New ways of documenting and describing language via electronic media coupled with new ways of distributing the results via the World-Wide Web offer a degree of access to language resources that is unparalleled in history. At the same time, the proliferation of approaches to using these new technologies is causing serious problems relating to resource discovery and resource creation. This article describes the infrastructure that the Open Language Archives Community (OLAC) has built in order to address these problems. Its technical and usage infrastructures address problems of resource discovery by constructing a single virtual library of distributed resources. Its governance infrastructure addresses problems of resource creation by providing a mechanism through which the language-resource community can express its consensus on recommended best practices.
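The "single virtual library of distributed resources" idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Record` fields, the `build_catalog` and `search` helpers, and the sample data are hypothetical and are not OLAC's actual metadata format or protocol. It only shows how records held by separate archives might be merged into one catalog that a user can query in a single step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One metadata record describing a language resource (hypothetical schema)."""
    archive: str      # which participating archive holds the resource
    identifier: str   # archive-local identifier
    title: str
    language: str     # language the resource documents


def build_catalog(archives):
    """Merge per-archive record lists into one catalog keyed by (archive, identifier)."""
    catalog = {}
    for records in archives.values():
        for rec in records:
            catalog[(rec.archive, rec.identifier)] = rec
    return catalog


def search(catalog, language):
    """Return every record for a language, regardless of which archive holds it."""
    hits = [rec for rec in catalog.values() if rec.language == language]
    return sorted(hits, key=lambda rec: (rec.archive, rec.identifier))


# Made-up sample data standing in for metadata gathered from two archives.
archives = {
    "archive-a": [
        Record("archive-a", "001", "Field recordings", "Nahuatl"),
        Record("archive-a", "002", "Word list", "Quechua"),
    ],
    "archive-b": [
        Record("archive-b", "17", "Grammar sketch", "Nahuatl"),
    ],
}

catalog = build_catalog(archives)
results = search(catalog, "Nahuatl")
```

In this sketch the resources themselves stay with their home archives; only the descriptive records are pooled, which is what lets one query span several repositories.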
Taxonomy
Topics: Natural Language Processing Techniques
