SLURP-TN : Resource for Tunisian Dialect Spoken Language Understanding
Haroun Elleuch, Salima Mdhaffar, Yannick Estève, Fethi Bougares

TL;DR
This paper introduces SLURP-TN, a new Tunisian dialect spoken language understanding dataset with 4165 sentences, enabling development of SLU and ASR models for low-resource dialects, and provides baseline models for future research.
Contribution
The paper presents SLURP-TN, a novel Tunisian dialect SLU dataset, and baseline models, addressing resource scarcity in low-resource language dialects.
Findings
Created a 4165-sentence Tunisian dialect SLU dataset
Developed baseline SLU and ASR models using the dataset
Made dataset and models publicly available
Abstract
Spoken Language Understanding (SLU) aims to extract the semantic information from the speech utterance of user queries. It is a core component in a task-oriented dialogue system. With the spectacular progress of deep neural network models and the evolution of pre-trained language models, SLU has obtained significant breakthroughs. However, only a few high-resource languages have taken advantage of this progress due to the absence of SLU resources. In this paper, we seek to mitigate this obstacle by introducing SLURP-TN. This dataset was created by recording 55 native speakers uttering sentences in Tunisian dialect, manually translated from six SLURP domains. The result is an SLU Tunisian dialect dataset that comprises 4165 sentences recorded into around 5 hours of acoustic material. We also develop a number of Automatic Speech Recognition and SLU models exploiting SLURP-TN. The Dataset…
Taxonomy
Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Natural Language Processing Techniques
