DAG: Dictionary-Augmented Generation for Disambiguation of Sentences in Endangered Uralic Languages using ChatGPT
Mika Hämäläinen

TL;DR
This paper shows that augmenting ChatGPT prompts with dictionary translations of the candidate lemmas lets ChatGPT disambiguate lemmas in the endangered Uralic languages Skolt Sami (50% accuracy) and Erzya (41% accuracy), even though it is not proficient in either language.
Contribution
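It introduces dictionary-augmented generation (DAG): the prompt is augmented with dictionary translations of the candidate lemmas into a majority language (Finnish), so that ChatGPT can disambiguate lemmas in low-resource languages it does not know well.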
Findings
Achieved 50% accuracy for Skolt Sami
Achieved 41% accuracy for Erzya
Many error types were of the kind even an untrained human annotator would make
Abstract
We showcase that ChatGPT can be used to disambiguate lemmas in two endangered languages ChatGPT is not proficient in, namely Erzya and Skolt Sami. We augment our prompt by providing dictionary translations of the candidate lemmas to a majority language, Finnish in our case. This dictionary-augmented generation approach results in 50% accuracy for Skolt Sami and 41% accuracy for Erzya. On closer inspection, many of the error types were of the kind even an untrained human annotator would make.
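The prompt augmentation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's actual prompt or code: the function names `build_dag_prompt` and `accuracy`, the prompt wording, and the placeholder sentence, lemmas, and glosses are all assumptions made for the example. Only the general idea (list each candidate lemma with its dictionary translations into a majority language, then score the chosen lemmas against gold labels) comes from the abstract.

```python
from typing import Dict, List


def build_dag_prompt(
    sentence: str,
    target_word: str,
    candidates: Dict[str, List[str]],
    language: str = "Skolt Sami",
    gloss_language: str = "Finnish",
) -> str:
    """Assemble a disambiguation prompt (hypothetical wording).

    `candidates` maps each candidate lemma to its dictionary translations
    in the majority language.
    """
    lines = [
        f"The following sentence is in {language}: {sentence}",
        f"Which lemma does the word '{target_word}' have in this sentence?",
        f"Candidate lemmas with their {gloss_language} dictionary translations:",
    ]
    for i, (lemma, glosses) in enumerate(candidates.items(), start=1):
        # Fall back to an explicit marker when the dictionary has no entry.
        gloss_text = ", ".join(glosses) if glosses else "(no dictionary entry)"
        lines.append(f"{i}. {lemma}: {gloss_text}")
    lines.append("Answer with the number of the correct lemma only.")
    return "\n".join(lines)


def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of target words whose predicted lemma matches the gold lemma."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Placeholder data only: these are not real Skolt Sami words or Finnish glosses.
example_prompt = build_dag_prompt(
    sentence="w1 w2 w3",
    target_word="w2",
    candidates={"lemma_a": ["gloss_a1", "gloss_a2"], "lemma_b": ["gloss_b"]},
)
print(example_prompt)
```

The resulting string would then be sent to the chat model, and the returned number mapped back to a lemma before calling `accuracy`. How the model is queried and how its answer is parsed are left out here, since the abstract does not specify them.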
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
