Optimizing ASR for Catalan-Spanish Code-Switching: A Comparative Analysis of Methodologies

Carlos Mena; Pol Serra; Jacobo Romero; Abir Messaoudi; Jose Giraldo; Carme Armentano-Oller; Rodolfo Zevallos; Ivan Meza; Javier Hernando

arXiv:2507.13875·cs.CL·July 21, 2025


TL;DR

This paper enhances Catalan-Spanish code-switching speech recognition by comparing data augmentation and model fine-tuning strategies, demonstrating improved transcription accuracy with synthetic data and language tokens.

Contribution

It evaluates three strategies for improving ASR in code-switching scenarios: generating synthetic CS data, concatenating monolingual audio, and using real CS data with language tokens.

Findings

01 Combining synthetic CS data with language tokens improves ASR accuracy.

02 Fine-tuned Whisper models outperform baseline models on CS speech.

03 Open-source models are made available on Hugging Face.

Abstract

Code-switching (CS), the alternating use of two or more languages, challenges automatic speech recognition (ASR) due to scarce training data and linguistic similarities. The lack of dedicated CS datasets limits ASR performance, as most models rely on monolingual or mixed-language corpora that fail to reflect real-world CS patterns. This issue is critical in multilingual societies where CS occurs in informal and formal settings. A key example is Catalan-Spanish CS, widely used in media and parliamentary speeches. In this work, we improve ASR for Catalan-Spanish CS by exploring three strategies: (1) generating synthetic CS data, (2) concatenating monolingual audio, and (3) leveraging real CS data with language tokens. We extract CS data from Catalan speech corpora and fine-tune OpenAI's Whisper models, making them available on Hugging Face. Results show that combining a modest amount of…
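
Strategies (2) and (3) from the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the Utterance type, the concatenate_utterances helper, the silence gap, and the 16 kHz sampling rate are hypothetical choices made for illustration. The tag format follows Whisper's <|ca|> and <|es|> language tokens, but how the paper places tokens inside transcripts is an assumption here.

```python
from dataclasses import dataclass
from typing import List, Tuple

SAMPLE_RATE = 16_000  # assumed sampling rate in Hz (hypothetical default)


@dataclass
class Utterance:
    """A monolingual clip: mono waveform, transcript, and language code."""
    samples: List[float]
    text: str
    lang: str  # e.g. "ca" for Catalan, "es" for Spanish


def concatenate_utterances(
    utts: List[Utterance],
    gap_seconds: float = 0.1,
    sample_rate: int = SAMPLE_RATE,
) -> Tuple[List[float], str]:
    """Join monolingual clips into one pseudo code-switched example.

    Audio segments are separated by a short silence, and each transcript
    segment is prefixed with a language tag such as <|ca|> or <|es|>.
    """
    if not utts:
        raise ValueError("need at least one utterance")

    # Silence inserted between consecutive clips (illustrative choice).
    gap = [0.0] * int(round(gap_seconds * sample_rate))

    samples: List[float] = []
    parts: List[str] = []
    for i, utt in enumerate(utts):
        if i > 0:
            samples.extend(gap)
        samples.extend(utt.samples)
        parts.append(f"<|{utt.lang}|> {utt.text}")

    return samples, " ".join(parts)
```

For example, passing a Catalan clip followed by a Spanish clip yields a single waveform and a transcript in which each segment carries its own language tag. Real code-switching often happens within a sentence, so concatenated clips only approximate it.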


Code & Models

Datasets

BSC-LT/BSCs_Code_Switching_CA-ES_ASR_Test
dataset · 21 dl


Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Speech and Audio Processing