Prompt and circumstance: A word-by-word LLM prompting approach to interlinear glossing for low-resource languages
Micha Elsner, David Liu

TL;DR
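This paper explores a retrieval-based prompting method that uses large language models to partly automate interlinear glossing for low-resource languages. The method beats the BERT-based SIGMORPHON 2023 shared task baseline on morpheme-level scores for every language tested and could assist linguistic documentation.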
Contribution
It introduces a retrieval-based LLM prompting approach to glossing, evaluated on the seven languages of the SIGMORPHON 2023 shared task, and argues that the ability of LLMs to follow natural-language instructions can make partly automated annotation more accessible to linguists.
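As a rough illustration of what word-by-word, retrieval-based prompting could look like, the sketch below retrieves the training words most similar to a target word and places them in a few-shot prompt. Everything here is an assumption made for illustration: the function names `retrieve_similar` and `build_prompt`, the use of `difflib` string similarity as the retrieval measure, the prompt wording, and the toy word forms (which are invented and belong to no real language). The paper's actual retrieval method and prompt are not described in this summary.

```python
from difflib import SequenceMatcher


def retrieve_similar(word, train_pairs, n=3):
    """Return the n (word, gloss) pairs whose word is most similar to `word`.

    Similarity is plain character-level matching from difflib; this is an
    assumed stand-in, not the retrieval measure used in the paper.
    """
    ranked = sorted(
        train_pairs,
        key=lambda pair: SequenceMatcher(None, word, pair[0]).ratio(),
        reverse=True,
    )
    return ranked[:n]


def build_prompt(word, train_pairs, n=3):
    """Assemble a few-shot prompt for a single target word (hypothetical wording)."""
    parts = ["Gloss the target word in the same style as the examples."]
    for example_word, example_gloss in retrieve_similar(word, train_pairs, n):
        parts.append(f"Word: {example_word}\nGloss: {example_gloss}")
    # The final entry is left open for the model to complete.
    parts.append(f"Word: {word}\nGloss:")
    return "\n\n".join(parts)


# Invented toy data, used only to exercise the functions.
train = [
    ("talami", "dog-PL"),
    ("tala", "dog"),
    ("buno", "house"),
    ("bunomi", "house-PL"),
]
prompt = build_prompt("talamo", train, n=2)
```

The resulting `prompt` string would then be sent to an LLM of choice; that call is omitted because it depends on the provider.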
Findings
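The system outperforms the BERT-based shared task baseline on morpheme-level scores for all seven tested languages.
A simple 3-best oracle achieves higher word-level scores than the challenge winner (a tuned sequence model) in five languages.
In a case study on Tsez, the LLM automatically creates and follows linguistic instructions, reducing errors on a confusing grammatical feature.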
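The 3-best oracle finding can be made concrete with a small sketch: a word counts as correct if the gold gloss appears anywhere among the system's top three candidates. The function name `k_best_oracle_accuracy` and the toy glosses are assumptions for illustration, and the paper's exact scoring script may differ in details such as tokenization.

```python
from typing import Sequence


def k_best_oracle_accuracy(
    candidates: Sequence[Sequence[str]],
    gold: Sequence[str],
    k: int = 3,
) -> float:
    """Word-level accuracy when an oracle picks the best of the top-k candidates.

    candidates[i] is the ranked list of glosses proposed for word i;
    gold[i] is the reference gloss for word i.
    """
    if len(candidates) != len(gold):
        raise ValueError("candidates and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(1 for cands, ref in zip(candidates, gold) if ref in cands[:k])
    return hits / len(gold)


# Invented toy example: three words, each with a ranked candidate list.
candidates = [
    ["house-PL", "house-SG", "home-PL"],  # gold is ranked 2nd: counts for k=3
    ["see-PST", "look-PST"],              # gold is absent: never counts
    ["1SG", "2SG", "3SG", "1PL"],         # gold is ranked 4th: misses for k=3
]
gold = ["house-SG", "see-PRS", "1PL"]

top1 = k_best_oracle_accuracy(candidates, gold, k=1)
top3 = k_best_oracle_accuracy(candidates, gold, k=3)
```

An oracle score like this is an upper bound on what a human or a reranker could reach by choosing among the candidates, which is why it is relevant to interactive annotation.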
Abstract
Partly automated creation of interlinear glossed text (IGT) has the potential to assist in linguistic documentation. We argue that LLMs can make this process more accessible to linguists because of their capacity to follow natural-language instructions. We investigate the effectiveness of a retrieval-based LLM prompting approach to glossing, applied to the seven languages from the SIGMORPHON 2023 shared task. Our system beats the BERT-based shared task baseline for every language in the morpheme-level score category, and we show that a simple 3-best oracle has higher word-level scores than the challenge winner (a tuned sequence model) in five languages. In a case study on Tsez, we ask the LLM to automatically create and follow linguistic instructions, reducing errors on a confusing grammatical feature. Our results thus demonstrate the potential contributions which LLMs can make in…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
