Word Sense Disambiguation in Persian: Can AI Finally Get It Right?
Seyed Moein Ayyoubzadeh, Kourosh Shahnazari

TL;DR
This paper introduces a new Persian homograph disambiguation dataset, evaluates various embeddings and models, and provides insights to improve AI's ability to distinguish words with identical spellings but different meanings.
Contribution
It presents a novel Persian dataset, compares embedding effectiveness, and benchmarks multiple models for homograph disambiguation, advancing NLP in Persian language processing.
Findings
New Persian homograph dataset created
Embeddings vary in effectiveness across tasks
Model benchmarking offers guidance for future research
Abstract
Homograph disambiguation, the task of distinguishing words with identical spellings but different meanings, poses a substantial challenge in natural language processing. In this study, we introduce a novel dataset tailored for Persian homograph disambiguation. Our work encompasses a thorough exploration of various embeddings, evaluated both through cosine similarity and through their efficacy in downstream tasks such as classification. Our investigation entails training a diverse array of lightweight machine learning and deep learning models for homograph disambiguation. We scrutinize the models' performance in terms of accuracy, recall, and F1 score, thereby gaining insights into their respective strengths and limitations. The outcomes of our research underscore three key contributions. First, we present a newly curated Persian dataset, providing a solid foundation for future research in…
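To make the cosine-similarity evaluation mentioned in the abstract concrete, here is a minimal Python sketch. It is not the authors' pipeline: the abstract does not describe their exact procedure, so the nearest-centroid rule, the function names, and the three-dimensional toy vectors are all assumptions chosen for illustration. The example homograph is شیر, which can mean "milk" or "lion"; in practice the vectors would come from a real embedding model applied to the surrounding sentence.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def predict_sense(context_vec, sense_centroids):
    """Pick the sense whose centroid is most similar to the context embedding."""
    return max(
        sense_centroids,
        key=lambda sense: cosine_similarity(context_vec, sense_centroids[sense]),
    )

def macro_scores(y_true, y_pred):
    """Return (accuracy, macro-averaged recall, macro-averaged F1)."""
    labels = sorted(set(y_true))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    recalls, f1s = [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        recalls.append(recall)
        f1s.append(f1)
    return accuracy, sum(recalls) / len(labels), sum(f1s) / len(labels)

# Hypothetical context embeddings for sentences containing the homograph;
# the numbers are made up and stand in for real model outputs.
train = {
    "milk": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "lion": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
centroids = {sense: centroid(vecs) for sense, vecs in train.items()}

test_vectors = [[0.7, 0.3, 0.0], [0.2, 0.7, 0.1]]
test_labels = ["milk", "lion"]
predictions = [predict_sense(vec, centroids) for vec in test_vectors]
accuracy, recall, f1 = macro_scores(test_labels, predictions)
print(predictions, accuracy, recall, f1)
```

A nearest-centroid rule keeps the comparison between embeddings simple: no classifier is trained, so differences in the scores reflect mainly the embedding space. A trained classifier on the same vectors would correspond to the downstream-task evaluation the abstract also mentions.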
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling
