Prompt and circumstance: A word-by-word LLM prompting approach to interlinear glossing for low-resource languages

Micha Elsner; David Liu

arXiv:2502.09778 · cs.CL · March 25, 2025

TL;DR

This paper explores a retrieval-based LLM prompting method for partly automating interlinear glossing in low-resource languages. It beats the BERT-based SIGMORPHON 2023 shared task baseline on morpheme-level scores and could assist linguistic documentation.

Contribution

It introduces a retrieval-based, word-by-word prompting approach for LLMs, evaluates it on the seven languages of the SIGMORPHON 2023 shared task, and shows that LLMs can follow natural-language linguistic instructions, which makes the approach promising for interactive annotation.

Findings

01. Beats the BERT-based shared task baseline on morpheme-level scores for all seven tested languages.

02. A simple 3-best oracle achieves higher word-level scores than the challenge winner (a tuned sequence model) in five languages.

03. In a case study on Tsez, the LLM creates and follows linguistic instructions, reducing errors on a confusing grammatical feature.
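The 3-best oracle in the second finding can be illustrated with a small sketch. It assumes the system returns a ranked list of candidate glosses per word and counts a word as correct if the gold gloss appears anywhere in the top three. The function names and the toy glosses are hypothetical and are not taken from the paper or its data.

```python
from typing import List


def top1_accuracy(candidates: List[List[str]], gold: List[str]) -> float:
    """Share of words whose first-ranked candidate equals the gold gloss."""
    hits = sum(1 for cands, g in zip(candidates, gold) if cands[:1] == [g])
    return hits / len(gold)


def k_best_oracle_accuracy(candidates: List[List[str]], gold: List[str], k: int = 3) -> float:
    """Share of words whose gold gloss appears among the top-k candidates.

    This is an oracle score: it assumes something (for example a human
    annotator) always picks the right candidate when it is offered.
    """
    hits = sum(1 for cands, g in zip(candidates, gold) if g in cands[:k])
    return hits / len(gold)


# Toy ranked candidates for four words (invented for illustration only).
candidates = [
    ["dog-PL", "dog", "dog-ERG"],
    ["run-PRS", "run-PST", "run-INF"],
    ["house", "house-LOC", "house-PL"],
    ["see-PST", "see-PRS", "see-INF", "see-IMP"],
]
gold = ["dog-PL", "run-PST", "house-GEN", "see-IMP"]

print(top1_accuracy(candidates, gold))
print(k_best_oracle_accuracy(candidates, gold, k=3))
```

The gap between the two numbers is what makes a k-best oracle interesting for interactive annotation: it bounds how much a linguist choosing among a few suggestions could gain over accepting the first one.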

Abstract

Partly automated creation of interlinear glossed text (IGT) has the potential to assist in linguistic documentation. We argue that LLMs can make this process more accessible to linguists because of their capacity to follow natural-language instructions. We investigate the effectiveness of a retrieval-based LLM prompting approach to glossing, applied to the seven languages from the SIGMORPHON 2023 shared task. Our system beats the BERT-based shared task baseline for every language in the morpheme-level score category, and we show that a simple 3-best oracle has higher word-level scores than the challenge winner (a tuned sequence model) in five languages. In a case study on Tsez, we ask the LLM to automatically create and follow linguistic instructions, reducing errors on a confusing grammatical feature. Our results thus demonstrate the potential contributions which LLMs can make in…
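As a rough illustration of what a retrieval-based, word-by-word prompt could look like, the sketch below retrieves the training words most similar to a target word and assembles them into a prompt for that single word. The abstract does not say how retrieval is done, so the character-level similarity from Python's `difflib` is an assumption, as are the prompt wording, the function names, and the toy word forms. No LLM call is made.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

GlossPair = Tuple[str, str]  # (surface word, gloss)


def retrieve_examples(word: str, train_pairs: List[GlossPair], k: int = 3) -> List[GlossPair]:
    """Return the k training pairs whose surface form is most similar to `word`.

    Similarity here is difflib's character-level ratio (an assumption;
    the paper's actual retrieval criterion is not given in this summary).
    """
    def similarity(pair: GlossPair) -> float:
        return SequenceMatcher(None, word, pair[0]).ratio()

    return sorted(train_pairs, key=similarity, reverse=True)[:k]


def build_prompt(word: str, sentence: str, translation: str, examples: List[GlossPair]) -> str:
    """Assemble a prompt that asks for the gloss of one target word."""
    lines = ["Gloss the target word.", "", "Similar glossed words:"]
    lines += [f"- {w} -> {g}" for w, g in examples]
    lines += [
        "",
        f"Sentence: {sentence}",
        f"Translation: {translation}",
        f"Target word: {word}",
        "Gloss:",
    ]
    return "\n".join(lines)


# Toy training pairs (illustrative only, not from the shared task data).
train_pairs = [
    ("kitab", "book"),
    ("kitablar", "book-PL"),
    ("ev", "house"),
    ("evler", "house-PL"),
]

examples = retrieve_examples("kitablar", train_pairs, k=2)
prompt = build_prompt("kitablar", "kitablar evde", "the books are at home", examples)
print(prompt)
```

Prompting one word at a time keeps each request small and lets the retrieved examples be tailored to the word being glossed, at the cost of one request per word.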

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling