Yor-Sarc: A gold-standard dataset for sarcasm detection in a low-resource African language
Toheeb Aduramomi Jimoh, Tabea De Wille, Nikola S. Nikolov

TL;DR
This paper introduces Yor-Sarc, the first high-quality sarcasm detection dataset for Yorùbá, a low-resource African language, with detailed annotation protocols and high inter-annotator agreement to support NLP research.
Contribution
The creation of Yor-Sarc, a culturally informed, gold-standard sarcasm dataset for Yorùbá, including annotation guidelines and analysis of annotation reliability.
Findings
Achieved substantial to almost perfect inter-annotator agreement.
Demonstrated the dataset's potential to improve sarcasm detection in low-resource languages.
Provided a benchmark for future NLP research in Yorùbá and similar languages.
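To make the agreement finding concrete, here is a minimal sketch of how agreement among three annotators on binary sarcasm labels could be measured. It assumes Fleiss' kappa as the statistic and the Landis and Koch bands for interpretation (0.61 to 0.80 "substantial", 0.81 to 1.00 "almost perfect"). This summary does not say which statistic the authors used, and the toy labels, `fleiss_kappa`, and `landis_koch` are hypothetical names and data introduced only for illustration.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of labels (one per rater).

    Assumes every item was labeled by the same number of raters (at least two).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for item in ratings for label in item})
    counts = [Counter(item) for item in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall proportion of each label.
    proportions = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(p ** 2 for p in proportions)

    if p_expected == 1.0:
        # Only one label was ever used, so kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1 - p_expected)


def landis_koch(kappa):
    """Map a kappa value to the Landis and Koch (1977) verbal band."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical toy labels (1 = sarcastic, 0 = not sarcastic), NOT from Yor-Sarc.
toy_labels = [
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 1],
]

kappa = fleiss_kappa(toy_labels)
print(f"Fleiss' kappa = {kappa:.3f} ({landis_koch(kappa)})")
```

Pairwise Cohen's kappa between each pair of annotators is a common complement to a single multi-rater score, since it shows whether one annotator diverges from the others.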
Abstract
Sarcasm detection poses a fundamental challenge in computational semantics, requiring models to resolve disparities between literal and intended meaning. The challenge is amplified in low-resource languages where annotated datasets are scarce or nonexistent. We present Yor-Sarc, the first gold-standard dataset for sarcasm detection in Yorùbá, a tonal Niger-Congo language spoken by over million people. The dataset comprises 436 instances annotated by three native speakers from diverse dialectal backgrounds using an annotation protocol specifically designed for Yorùbá sarcasm by taking culture into account. This protocol incorporates context-sensitive interpretation and community-informed guidelines and is accompanied by a comprehensive analysis of inter-annotator agreement to support replication in other African languages. Substantial to almost perfect…
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Natural Language Processing Techniques · Language, Metaphor, and Cognition
