DAG: Dictionary-Augmented Generation for Disambiguation of Sentences in Endangered Uralic Languages using ChatGPT

Mika Hämäläinen

arXiv:2411.01531 · cs.CL · November 5, 2024

TL;DR

This paper shows that augmenting ChatGPT prompts with Finnish dictionary translations of the candidate lemmas enables lemma disambiguation in the endangered Uralic languages Erzya and Skolt Sami, even though ChatGPT has limited proficiency in these languages.

Contribution

It introduces a dictionary-augmented generation method to enhance disambiguation in low-resource languages using ChatGPT.

Findings

01. Achieved 50% accuracy for Skolt Sami

02. Achieved 41% accuracy for Erzya

03. Many errors were of a kind that even an untrained human annotator would make

Abstract

We show that ChatGPT can be used to disambiguate lemmas in two endangered languages that ChatGPT is not proficient in, namely Erzya and Skolt Sami. We augment our prompt with dictionary translations of the candidate lemmas into a majority language (Finnish, in our case). This dictionary-augmented generation approach yields 50% accuracy for Skolt Sami and 41% accuracy for Erzya. On closer inspection, many of the errors were of a kind that even an untrained human annotator would make.
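The abstract does not reproduce the prompt itself, so the following Python sketch only illustrates the general dictionary-augmented idea under stated assumptions. The prompt template, the helper names `build_dag_prompt`, `parse_answer` and `accuracy`, and the placeholder lemmas are all hypothetical and are not the author's implementation. The actual call to ChatGPT is deliberately omitted; the sketch covers building a prompt that lists each candidate lemma with its Finnish translations, mapping the model's reply back to a candidate, and scoring predictions against gold lemmas.

```python
from typing import Dict, List, Optional


def build_dag_prompt(sentence: str, word: str, candidates: Dict[str, List[str]]) -> str:
    """Build a (hypothetical) disambiguation prompt.

    Each candidate lemma is listed together with its Finnish dictionary
    translations; this is the "dictionary augmentation" step.
    """
    lines = [
        f"Sentence: {sentence}",
        f"Word to disambiguate: {word}",
        "Candidate lemmas with Finnish translations:",
    ]
    for i, (lemma, glosses) in enumerate(candidates.items(), start=1):
        lines.append(f"{i}. {lemma}: {', '.join(glosses)}")
    lines.append("Answer with the correct lemma only.")
    return "\n".join(lines)


def parse_answer(reply: str, candidates: Dict[str, List[str]]) -> Optional[str]:
    """Map the model's reply to a candidate lemma.

    Surrounding whitespace and periods are stripped; anything that is not
    exactly one of the candidates is treated as no answer (None).
    """
    answer = reply.strip().strip(".").strip()
    return answer if answer in candidates else None


def accuracy(predictions: List[Optional[str]], gold: List[str]) -> float:
    """Fraction of predictions that match the gold lemma (None counts as wrong)."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and equal in length")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


if __name__ == "__main__":
    # Placeholder lemmas; the Finnish glosses are ordinary Finnish words
    # used purely for illustration, not entries from the paper's dictionaries.
    cands = {"LEMMA_A": ["talo"], "LEMMA_B": ["koti", "asunto"]}
    print(build_dag_prompt("<sentence>", "<word>", cands))
    print(parse_answer(" LEMMA_B. ", cands))
```

Restricting the parsed answer to an exact candidate match is a conservative design choice for this sketch: free-form replies that do not name a listed lemma simply count as errors when computing accuracy.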

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification