INJONGO: A Multicultural Intent Detection and Slot-filling Dataset for 16 African Languages
Hao Yu, Jesujoba O. Alabi, Andiswa Bukula, Jian Yun Zhuang, En-Shiun Annie Lee, Tadesse Kebede Guge, Israel Abebe Azime, Happy Buzaaba, Blessing Kudzaishe Sibanda, Godson K. Kalipe, Jonathan Mukiibi, Salomon Kabongo Kabenamualu, Mmasibidi Setaka, Lolwethu Ndolela, Nkiruka Odu

TL;DR
This paper introduces Injongo, a new multicultural dataset for 16 African languages, to evaluate intent detection and slot-filling in low-resource languages, highlighting challenges and opportunities for improving multilingual conversational AI.
Contribution
The paper presents Injongo, a novel open-source dataset for African languages, and benchmarks multilingual models, emphasizing the importance of culturally relevant data for better cross-lingual transfer.
Findings
LLMs struggle with slot-filling: even GPT-4o achieves an F1-score of only 26.
Intent detection performs better, with an average accuracy of 70.6%.
Training on African-cultural utterances, rather than Western-centric ones, improves cross-lingual transfer from English.
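The two metrics in the findings can be made concrete with a small sketch. It assumes intent accuracy is exact-match accuracy over utterances and that slot-filling F1 is span-level micro F1 over BIO-tagged tokens, the convention used by tools such as conlleval and seqeval. This summary does not state the paper's exact evaluation script, and the slot labels and intent names used below are hypothetical.

```python
from typing import List, Set, Tuple

# A labeled span: (slot label, start token index, end token index exclusive).
Span = Tuple[str, int, int]


def bio_spans(tags: List[str]) -> Set[Span]:
    """Extract labeled spans from a BIO tag sequence.

    An I- tag that does not continue a span of the same label starts a new
    span (lenient handling, similar in spirit to conlleval).
    """
    spans: Set[Span] = set()
    label, start = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes the last span
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if tag == "O" or starts_new:
            if label is not None:
                spans.add((label, start, i))
            label, start = (tag[2:], i) if tag != "O" else (None, None)
        # Otherwise: an I- tag continuing the current span; nothing to do.
    return spans


def slot_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged span F1 (0-1 scale); a span counts only on exact match."""
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_spans(g), bio_spans(p)
        tp += len(gs & ps)
        n_pred += len(ps)
        n_gold += len(gs)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def intent_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of utterances whose predicted intent exactly matches the gold intent."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


if __name__ == "__main__":
    # Hypothetical 5-token utterance; the prediction truncates the date span.
    gold_tags = [["O", "B-date", "I-date", "O", "B-city"]]
    pred_tags = [["O", "B-date", "O", "O", "B-city"]]
    print(slot_f1(gold_tags, pred_tags))  # 0.5: only the city span matches exactly
    print(intent_accuracy(["book_flight", "check_balance"], ["book_flight", "transfer"]))  # 0.5
```

Because a partially correct span earns no credit, span-level F1 is a strict metric, which is one plausible reason slot-filling scores sit far below intent accuracy. Scores such as "26 F1" are conventionally reported on a 0-100 scale, i.e. this function's output multiplied by 100.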
Abstract
Slot-filling and intent detection are well-established tasks in Conversational AI. However, current large-scale benchmarks for these tasks often exclude low-resource languages and rely on translations of English benchmarks, so they predominantly reflect Western-centric concepts. In this paper, we introduce Injongo, a multicultural, open-source benchmark dataset for 16 African languages with utterances generated by native speakers across diverse domains, including banking, travel, home, and dining. Through extensive experiments, we benchmark fine-tuned multilingual transformer models and prompted large language models (LLMs), and show the advantage of leveraging African-cultural utterances over Western-centric utterances for improving cross-lingual transfer from English. Experimental results reveal that current LLMs struggle with the…
Taxonomy
Topics: Natural Language Processing Techniques
