Word Sense Disambiguation in Persian: Can AI Finally Get It Right?

Seyed Moein Ayyoubzadeh; Kourosh Shahnazari

arXiv:2406.00028·cs.CL·March 25, 2025

TL;DR

This paper introduces a new Persian homograph disambiguation dataset, evaluates various embeddings and models, and provides insights to improve AI's ability to distinguish words with identical spellings but different meanings.

Contribution

It presents a novel Persian dataset, compares the effectiveness of different embeddings, and benchmarks multiple models for homograph disambiguation, advancing Persian NLP.

Findings

01 New Persian homograph dataset created

02 Embeddings vary in effectiveness across tasks

03 Model benchmarking offers guidance for future research

Abstract

Homograph disambiguation, the task of distinguishing words with identical spellings but different meanings, poses a substantial challenge in natural language processing. In this study, we introduce a novel dataset tailored for Persian homograph disambiguation. Our work encompasses a thorough exploration of various embeddings, evaluated through the cosine similarity method and their efficacy in downstream tasks like classification. Our investigation entails training a diverse array of lightweight machine learning and deep learning models for homograph disambiguation. We scrutinize the models' performance in terms of Accuracy, Recall, and F1 Score, thereby gaining insights into their respective strengths and limitations. The outcomes of our research underscore three key contributions. First, we present a newly curated Persian dataset, providing a solid foundation for future research in…
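The abstract mentions evaluating embeddings with cosine similarity. One way such an evaluation could look is sketched below: a context embedding is compared against one reference vector per sense, and the closest sense is chosen. This is only an illustrative sketch, not the authors' pipeline. The function names, the sense labels `sense_a` and `sense_b`, and the three-dimensional toy vectors are all hypothetical; real embeddings would come from a trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def disambiguate(context_vec, sense_prototypes):
    """Return the sense label whose prototype is most similar to the context.

    sense_prototypes maps a (hypothetical) sense label to a reference
    embedding, e.g. the mean embedding of labeled examples of that sense.
    """
    return max(
        sense_prototypes,
        key=lambda sense: cosine_similarity(context_vec, sense_prototypes[sense]),
    )


# Toy data, invented for illustration only.
sense_prototypes = {
    "sense_a": [0.9, 0.1, 0.0],
    "sense_b": [0.1, 0.8, 0.3],
}
context_vec = [0.2, 0.7, 0.4]

predicted = disambiguate(context_vec, sense_prototypes)
print(predicted)  # sense_b
```

With labeled examples, the predicted senses can then be compared against the gold labels to compute the Accuracy, Recall, and F1 Score figures the abstract refers to.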

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling