Homonym Sense Disambiguation in the Georgian Language

Davit Melikidze; Alexander Gamkrelidze

arXiv:2405.00710·cs.CL·May 3, 2024

TL;DR

This paper applies supervised fine-tuning of a pre-trained Large Language Model to homonym sense disambiguation in Georgian, reports 95% accuracy on a hand-classified dataset, and also presents experimental results with LSTM-based methods.

Contribution

It presents the first application of LLM fine-tuning for Georgian homonym disambiguation and provides experimental results demonstrating high accuracy.

Findings

01. Achieved 95% accuracy in disambiguating Georgian homonyms.

02. Demonstrated effectiveness of LLM fine-tuning over traditional methods.

03. Provided a new dataset of over 7500 sentences for Georgian WSD.

Abstract

This research proposes a novel approach to the Word Sense Disambiguation (WSD) task in the Georgian language, based on supervised fine-tuning of a pre-trained Large Language Model (LLM) on a dataset formed by filtering the Georgian Common Crawls corpus. The dataset is used to train a classifier for words with multiple senses. Additionally, we present experimental results of using LSTM for WSD. Accurately disambiguating homonyms is crucial in natural language processing. Georgian, an agglutinative language belonging to the Kartvelian language family, presents unique challenges in this context. The aim of this paper is to highlight the specific problems concerning homonym disambiguation in the Georgian language and to present our approach to solving them. The techniques discussed in the article achieve 95% accuracy for predicting lexical meanings of homonyms using a hand-classified…
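
The abstract describes two mechanical steps: filtering a corpus for sentences that contain a homonym, and scoring sense predictions against hand-assigned labels. A minimal Python sketch of both follows. It is an illustration under stated assumptions, not the authors' pipeline: the function names `filter_sentences` and `accuracy`, the English toy corpus, and the exact-token matching rule are all hypothetical. Exact-token matching in particular would likely miss inflected forms in an agglutinative language such as Georgian, so a real filter would need some morphological handling.

```python
import re


def filter_sentences(sentences, target):
    """Keep sentences that contain `target` as a whole token.

    Assumption: exact, case-sensitive token match. Inflected or
    suffixed forms of the target are NOT matched.
    """
    # Lookarounds on \w work for any Unicode letters, not only ASCII.
    pattern = re.compile(rf"(?<!\w){re.escape(target)}(?!\w)")
    return [s for s in sentences if pattern.search(s)]


def accuracy(predicted, gold):
    """Fraction of predicted sense labels that equal the gold labels."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("label lists must be non-empty and equal in length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Toy English data, purely illustrative (not from the paper's dataset).
corpus = [
    "She sat on the river bank.",
    "The bank approved the loan.",
    "Banking fees rose again.",
    "He walked home.",
]
candidates = filter_sentences(corpus, "bank")

# Hypothetical hand-assigned senses and hypothetical classifier output.
gold_labels = ["river", "finance"]
predicted_labels = ["river", "river"]
score = accuracy(predicted_labels, gold_labels)
```

In this sketch `candidates` holds the sentences that would be passed on for hand classification, and `score` is the kind of accuracy figure the abstract reports for the fine-tuned classifier.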

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems