CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving
Bhavani Shankar, Preethi Jyothi, Pushpak Bhattacharyya

TL;DR
This paper introduces CoSTA, an end-to-end model for translating code-switched Indian speech into English text. It builds on pretrained ASR and MT modules joined by an aligned speech-text interleaving scheme, and it improves on baselines by up to 3.5 BLEU points.
Contribution
The paper proposes CoSTA, a novel architecture that combines pretrained ASR and MT modules with aligned speech-text interleaving for code-switched speech translation.
Findings
CoSTA outperforms baselines by up to 3.5 BLEU points.
Introduces a new benchmark for Indian code-switched speech translation.
Effectively leverages synthetic data for end-to-end training.
Abstract
Code-switching is a widely prevalent linguistic phenomenon in multilingual societies like India. Building speech-to-text models for code-switched speech is challenging due to limited availability of datasets. In this work, we focus on the problem of spoken translation (ST) of code-switched speech in Indian languages to English text. We present a new end-to-end model architecture COSTA that scaffolds on pretrained automatic speech recognition (ASR) and machine translation (MT) modules (that are more widely available for many languages). Speech and ASR text representations are fused using an aligned interleaving scheme and are fed further as input to a pretrained MT module; the whole pipeline is then trained end-to-end for spoken translation using synthetically created ST data. We also release a new evaluation benchmark for code-switched Bengali-English, Hindi-English, Marathi-English and…
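The abstract says that speech and ASR text representations are fused by an aligned interleaving scheme before they reach the MT module, but it does not spell the scheme out. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each ASR token comes with a known span of speech frames (for example, from a CTC or forced alignment), that both modalities are already projected to a shared dimension, and that each token's embedding is placed directly after its aligned frames. The function name `aligned_interleave` and this ordering are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def aligned_interleave(
    speech_frames: Sequence[Vector],
    token_embeddings: Sequence[Vector],
    alignments: Sequence[Tuple[int, int]],
) -> List[Vector]:
    """Hypothetical sketch of aligned speech-text interleaving.

    alignments[i] = (start, end) is the half-open range of speech frames
    ASSUMED to be aligned to ASR token i. The output has the form
    [frames of token 0, embedding of token 0, frames of token 1, ...].
    Frames that fall outside every span (e.g. pauses) are dropped.
    """
    if len(token_embeddings) != len(alignments):
        raise ValueError("need exactly one frame span per ASR token")
    # Assumption: both modalities were already projected to one shared size.
    dims = {len(v) for v in speech_frames} | {len(v) for v in token_embeddings}
    if len(dims) > 1:
        raise ValueError("speech and text vectors must share one dimension")

    fused: List[Vector] = []
    prev_end = 0
    for (start, end), emb in zip(alignments, token_embeddings):
        # Spans must be ordered, non-overlapping and inside the utterance.
        if not (prev_end <= start <= end <= len(speech_frames)):
            raise ValueError(f"span ({start}, {end}) is out of order or out of range")
        fused.extend(list(f) for f in speech_frames[start:end])  # speech first
        fused.append(list(emb))  # then the aligned ASR token embedding
        prev_end = end
    return fused


if __name__ == "__main__":
    frames = [[0.1, 0.0], [0.2, 0.0], [0.3, 0.0], [0.4, 0.0]]
    tokens = [[1.0, 1.0], [2.0, 2.0]]
    print(aligned_interleave(frames, tokens, [(0, 2), (2, 4)]))
```

In a real model, the inputs would be tensors from the ASR encoder and from the MT embedding layer, and the fused sequence would be fed to the pretrained MT encoder. Dropping unaligned frames, and placing speech before text within each span, are only one possible design choice.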
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
