That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages
Piotr Żelasko, Laureano Moro-Velázquez, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak

TL;DR
This study investigates how multilingual training influences phonetic representations in speech recognition, revealing that shared representations improve performance across languages, especially with minimal target language data, despite challenges in crosslingual transfer.
Contribution
The paper provides a comprehensive analysis of phonetic transfer in multilingual ASR, demonstrating the benefits of shared representations and the impact of limited target language data.
Findings
Multilingual training improves phonetic recognition across languages.
Adding even 10 hours of target language data significantly reduces error rates.
Unique language-specific phones benefit from multilingual training.
Abstract
Only a handful of the world's languages are abundant with the resources that enable practical applications of speech processing technologies. One of the methods to overcome this problem is to use the resources existing in other languages to train a multilingual automatic speech recognition (ASR) model, which, intuitively, should learn some universal phonetic representations. In this work, we focus on gaining a deeper understanding of how general these representations might be, and how individual phones are getting improved in a multilingual setting. To that end, we select a phonetically diverse set of languages, and perform a series of monolingual, multilingual and crosslingual (zero-shot) experiments. The ASR is trained to recognize the International Phonetic Alphabet (IPA) token sequences. We observe significant improvements across all languages in the multilingual setting, and stark…
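The error rates mentioned in this summary are not defined here. As a minimal sketch, assuming they are phone error rates computed over IPA token sequences (an assumption, not confirmed by the text above), such a metric could be calculated from the Levenshtein distance between reference and hypothesis phone sequences. The function names `edit_distance` and `phone_error_rate` and the toy sequences are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (unit costs)."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phone_error_rate(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> float:
    """Total edit errors divided by total reference phones (assumed metric)."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of utterances")
    total_ref = sum(len(r) for r in refs)
    if total_ref == 0:
        raise ValueError("reference is empty")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_errors / total_ref


# Toy example with made-up IPA token sequences (not data from the paper).
reference = [["k", "æ", "t"]]
hypothesis = [["k", "a"]]
per = phone_error_rate(reference, hypothesis)
```

In this toy case the hypothesis has one substitution and one deletion against three reference phones. Treating each IPA symbol as a single token is one possible design choice; a real system may tokenize diacritics or tones differently.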
