Leveraging Data Collection and Unsupervised Learning for Code-switched Tunisian Arabic Automatic Speech Recognition

Ahmed Amine Ben Abdallah; Ata Kabboudi; Amir Kanoun; Salah Zaiem

arXiv:2309.11327 · eess.AS · September 26, 2023

Open Access · 3 Models · 5 Datasets

TL;DR

This paper develops an ASR system for Tunisian Arabic that combines data collection, self-supervised, semi-supervised, and few-shot code-switching approaches, and human evaluation of transcripts to address data scarcity and linguistic diversity. It reports state-of-the-art results on several Tunisian test sets.

Contribution

The paper combines data collection, self-supervised and semi-supervised learning, and human evaluation for code-switched Tunisian Arabic ASR, and publicly releases the resulting data and models.

Findings

01. Improved ASR performance on Tunisian dialects

02. Effective handling of code-switching among Tunisian Arabic, English, and French

03. Public release of data and models for community use

Abstract

Crafting an effective Automatic Speech Recognition (ASR) solution for dialects demands innovative approaches that not only address the data scarcity issue but also navigate the intricacies of linguistic diversity. In this paper, we address the aforementioned ASR challenge, focusing on the Tunisian dialect. First, textual and audio data are collected and, in some cases, annotated. Second, we explore self-supervision, semi-supervision and few-shot code-switching approaches to push the state of the art on different Tunisian test sets, covering different acoustic, linguistic and prosodic conditions. Finally, given the absence of conventional spelling, we produce a human evaluation of our transcripts to avoid the noise coming from spelling inadequacies in our testing references. Our models, which can transcribe audio samples in a linguistic mix involving Tunisian Arabic, English and…
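The abstract's concern about spelling noise in test references can be illustrated with a small sketch. It assumes word error rate (WER) as the automatic metric, which this page does not state explicitly, and the function name `word_error_rate` and the English example strings are hypothetical stand-ins, not data from the paper. The point it shows: a hypothesis that differs from the reference only by an acceptable spelling variant is still counted as an error, which is the kind of noise a human evaluation can sidestep.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: whitespace tokenization and exact string matching,
    so any spelling variant counts as a substitution.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the transcript is arguably correct, but one word
# uses a different accepted spelling than the reference.
print(word_error_rate("the colour is nice", "the color is nice"))  # 0.25
```

With no conventional spelling, several written forms of the same spoken word may all be valid, so an exact-match metric like this one can overstate the true error rate.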

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Speech and dialogue systems