Developing Open Data Models for Linguistic Field Data

Baden Hughes

arXiv:cs/0305053·cs.DL·May 23, 2007

TL;DR

This paper discusses the development of open data models for digitizing and preserving the UQ Flint Archive's linguistic field data on endangered and extinct languages, addressing challenges in data encoding, presentation, and long-term accessibility.

Contribution

It introduces a new open data model for linguistic field data, including tools and practices for digitization, annotation, and presentation of endangered language resources.

Findings

01 Successful digitization of endangered language data

02 Development of open standards for linguistic data encoding

03 Enhanced accessibility and preservation of linguistic field data

Abstract

The UQ Flint Archive houses the field notes and elicitation recordings made by Elwyn Flint in the 1950s and 1960s during extensive linguistic survey work across Queensland, Australia. The process of digitizing the contents of the UQ Flint Archive presents a number of interesting challenges in the context of EMELD. Firstly, all of the linguistic data is for languages that are either endangered or extinct, and as such forms a valuable ethnographic repository. Secondly, the physical format of the data is itself in danger of decline, and as such digitization is an important preservation task in the short to medium term. Thirdly, the adoption of open standards for the encoding and presentation of text and audio linguistic field data, whilst enabling preservation, represents a new field of research in itself where best practice has yet to be formalised. Fourthly, the provision…

Taxonomy

Topics: Natural Language Processing Techniques · Diverse Musicological Studies · Language and cultural evolution