Developing Open Data Models for Linguistic Field Data
Baden Hughes

TL;DR
This paper discusses developing open data models for digitizing and preserving endangered linguistic field data from the UQ Flint Archive, addressing challenges in data encoding, presentation, and long-term accessibility.
Contribution
It introduces a new open data model for linguistic field data, including tools and practices for digitization, annotation, and presentation of endangered language resources.
Findings
Successful digitization of endangered language data
Development of open standards for linguistic data encoding
Enhanced accessibility and preservation of linguistic field data
Abstract
The UQ Flint Archive houses the field notes and elicitation recordings made by Elwyn Flint in the 1950s and 1960s during extensive linguistic survey work across Queensland, Australia. The process of digitizing the contents of the UQ Flint Archive presents a number of interesting challenges in the context of EMELD. Firstly, all of the linguistic data is for languages that are either endangered or extinct, and as such forms a valuable ethnographic repository. Secondly, the physical format of the data is itself in danger of decline, and as such digitization is an important preservation task in the short to medium term. Thirdly, the adoption of open standards for the encoding and presentation of text and audio data for linguistic field data, whilst enabling preservation, represents a new field of research in itself where best practice has yet to be formalised. Fourthly, the provision…
Taxonomy
Topics: Natural Language Processing Techniques · Diverse Musicological Studies · Language and cultural evolution
