Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition
Piotr Żelasko, Siyuan Feng, Laureano Moro Velazquez, Ali Abavisani, Saurabhchand Bhati, Odette Scharenborg, Mark Hasegawa-Johnson, Najim Dehak

TL;DR
This paper explores methods for automatically discovering phonetic inventories of unseen languages using crosslingual ASR, analyzing transferability, and proposing unsupervised approaches to improve low-resource language speech recognition.
Contribution
It introduces new methods for unsupervised phonetic inventory creation for unseen languages and analyzes factors affecting phone recognition transferability.
Findings
Universal phone tokens are recognized across languages.
Unique sounds and tone languages pose challenges.
Crosslingual transfer improves phonetic inventory accuracy.
Abstract
The high cost of data acquisition makes Automatic Speech Recognition (ASR) model training problematic for most existing languages, including languages that do not even have a written script, or for which the phone inventories remain unknown. Past works explored multilingual training, transfer learning, as well as zero-shot learning in order to build ASR systems for these low-resource languages. While it has been shown that the pooling of resources from multiple languages is helpful, we have not yet seen a successful application of an ASR model to a language unseen during training. A crucial step in the adaptation of ASR from seen to unseen languages is the creation of the phone inventory of the unseen language. The ultimate goal of our work is to build the phone inventory of a language unseen during training in an unsupervised way without any knowledge about the language. In this paper,…
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Dialogue Systems
