Advancing Uto-Aztecan Language Technologies: A Case Study on the Endangered Comanche Language

Jesus Alvarez C; Daua D. Karajeanes; Ashley Celeste Prado; John Ruttan; Ivory Yang; Sean O'Brien; Vasu Sharma; Kevin Zhu

arXiv:2505.18159 · cs.CL · May 27, 2025

TL;DR

This paper explores how minimal-cost, community-informed NLP techniques can support the preservation of the endangered Comanche language, demonstrating promising results with large language models in low-resource settings.

Contribution

It introduces the first computational study of Comanche, including a manually curated dataset of 412 phrases, a synthetic data generation pipeline, and an evaluation of GPT-4o and GPT-4o-mini for language identification.
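
Language identification can be scored as an ordinary classification task. The sketch below assumes that each test phrase carries one gold language label and that the model answers in free text, so answers are lightly normalized before being compared. The names `normalize_label` and `accuracy` and the normalization rule are hypothetical illustrations, not code from the paper or its repository.

```python
# Hypothetical scoring helpers: the normalization rule and function names are
# illustrative assumptions, not taken from the paper or its repository.


def normalize_label(raw: str) -> str:
    """Reduce a free-text model answer such as ' Comanche.' to 'comanche'."""
    return raw.strip().strip(".!\"'").casefold()


def accuracy(gold: list[str], predicted: list[str]) -> float:
    """Fraction of phrases whose predicted language matches the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty test set")
    correct = sum(
        normalize_label(g) == normalize_label(p) for g, p in zip(gold, predicted)
    )
    return correct / len(gold)


if __name__ == "__main__":
    gold = ["Comanche", "Comanche", "Spanish"]
    predicted = [" comanche.", "Nahuatl", "Spanish"]  # made-up model outputs
    print(f"accuracy = {accuracy(gold, predicted):.3f}")  # prints accuracy = 0.667
```

Normalizing case and trailing punctuation keeps a correct answer such as "Comanche." from being counted as a miss purely because of formatting.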

Findings

01

Few-shot prompting greatly improves LLM performance on Comanche, reaching near-perfect accuracy with five examples

02

LLMs struggle with zero-shot language identification in low-resource settings

03

Targeted NLP approaches can aid endangered language preservation

Abstract

The digital exclusion of endangered languages remains a critical challenge in NLP, limiting both linguistic research and revitalization efforts. This study introduces the first computational investigation of Comanche, an Uto-Aztecan language on the verge of extinction, demonstrating how minimal-cost, community-informed NLP interventions can support language preservation. We present a manually curated dataset of 412 phrases, a synthetic data generation pipeline, and an empirical evaluation of GPT-4o and GPT-4o-mini for language identification. Our experiments reveal that while LLMs struggle with Comanche in zero-shot settings, few-shot prompting significantly improves performance, achieving near-perfect accuracy with just five examples. Our findings highlight the potential of targeted NLP methodologies in low-resource contexts and emphasize that visibility is the first step toward…
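
The abstract contrasts zero-shot prompting with few-shot prompting using five labelled examples. A minimal sketch of how a k-shot language-identification prompt could be assembled follows. The instruction wording, the `Text:`/`Language:` layout, and `build_prompt` are hypothetical, because this page does not reproduce the paper's prompts; the placeholder strings stand in for real dataset phrases, and the call to GPT-4o or GPT-4o-mini is omitted.

```python
from typing import Sequence

# Hypothetical prompt format: the instruction text and the "Text:/Language:"
# layout are illustrative assumptions, not the paper's published prompts.
INSTRUCTION = "Identify the language of the final text. Answer with the language name only."


def build_prompt(query: str, examples: Sequence[tuple[str, str]], k: int = 5) -> str:
    """Build a k-shot language-identification prompt; k=0 gives a zero-shot prompt."""
    if k < 0:
        raise ValueError("k must be non-negative")
    if k > len(examples):
        raise ValueError(f"requested {k} examples but only {len(examples)} are available")
    lines = [INSTRUCTION, ""]
    # Each demonstration pairs a phrase with its gold language label.
    for text, label in examples[:k]:
        lines += [f"Text: {text}", f"Language: {label}", ""]
    # The query is left unlabelled so the model completes the final "Language:" line.
    lines += [f"Text: {query}", "Language:"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholders stand in for real labelled phrases from a curated dataset.
    demos = [(f"<labelled phrase {i}>", "Comanche" if i % 2 else "English") for i in range(1, 6)]
    print(build_prompt("<phrase to classify>", demos, k=0))
    print("-" * 40)
    print(build_prompt("<phrase to classify>", demos, k=5))
```

Keeping `k` as a parameter lets one function produce both the zero-shot (`k=0`) and the five-shot condition, so the number of demonstrations is the only thing that changes between runs.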

Code & Models

Repositories

comanchegenerate/comanchesynthetic (official)