Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition

Piotr Żelasko; Siyuan Feng; Laureano Moro Velazquez; Ali Abavisani; Saurabhchand Bhati; Odette Scharenborg; Mark Hasegawa-Johnson; Najim Dehak

arXiv:2201.11207·cs.SD·January 31, 2022

TL;DR

This paper explores methods for automatically discovering the phonetic inventories of unseen languages with crosslingual ASR, analyzes how well phone recognition transfers across languages, and proposes unsupervised approaches to improve speech recognition for low-resource languages.

Contribution

It introduces methods for the unsupervised creation of phonetic inventories for unseen languages and analyzes the factors that affect the crosslingual transferability of phone recognition.

Findings

01. Universal phone tokens are recognized across languages.
02. Unique sounds and tone languages pose challenges.
03. Crosslingual transfer improves phonetic inventory accuracy.
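
This page does not say how inventory accuracy is measured. One plausible scoring scheme, assumed here and not taken from the paper, treats the discovered inventory and a reference inventory as sets of IPA symbols and reports set precision, recall, and F1. The function name `inventory_scores` and the toy inventories below are hypothetical.

```python
from __future__ import annotations


def inventory_scores(predicted: set[str], reference: set[str]) -> dict[str, float]:
    """Hypothetical set-based scoring of a discovered phone inventory.

    Assumption: both inventories are plain sets of IPA symbols; this is an
    illustrative metric, not necessarily the one used in the paper.
    """
    true_pos = len(predicted & reference)  # phones found in both inventories
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy example: the system misses /e/ and /o/ and adds a spurious /ʔ/.
    discovered = {"a", "i", "u", "p", "t", "k", "ʔ"}
    gold = {"a", "i", "u", "e", "o", "p", "t", "k"}
    # precision 6/7 ≈ 0.857, recall 0.75, F1 ≈ 0.8
    print(inventory_scores(discovered, gold))
```

Set-level scoring ignores how often each phone occurs, which matches the idea of an inventory as a list of contrastive units rather than a transcription.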

Abstract

The high cost of data acquisition makes Automatic Speech Recognition (ASR) model training problematic for most existing languages, including languages that do not even have a written script, or for which the phone inventories remain unknown. Past works explored multilingual training, transfer learning, as well as zero-shot learning in order to build ASR systems for these low-resource languages. While it has been shown that the pooling of resources from multiple languages is helpful, we have not yet seen a successful application of an ASR model to a language unseen during training. A crucial step in the adaptation of ASR from seen to unseen languages is the creation of the phone inventory of the unseen language. The ultimate goal of our work is to build the phone inventory of a language unseen during training in an unsupervised way without any knowledge about the language. In this paper,…
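
The abstract frames inventory creation as the key step when moving from seen to unseen languages. A very simple baseline, sketched here as an assumption rather than as the authors' method, decodes untranscribed speech in the unseen language with a crosslingual phone recognizer and keeps every phone whose share of all decoded tokens passes a frequency threshold. The function `inventory_from_decodings` and the `min_share` threshold are hypothetical, and the recognizer itself is assumed to exist and is not shown.

```python
from __future__ import annotations

from collections import Counter


def inventory_from_decodings(decoded: list[list[str]], min_share: float = 0.01) -> set[str]:
    """Hypothetical frequency-threshold inventory estimate.

    Assumption: `decoded` holds one list of IPA phone tokens per utterance,
    produced by some crosslingual phone recognizer (not shown here).
    """
    counts = Counter(phone for utterance in decoded for phone in utterance)
    total = sum(counts.values())
    if total == 0:
        return set()
    # Keep phones that make up at least `min_share` of all decoded tokens;
    # rare outputs are treated as likely recognition noise.
    return {phone for phone, n in counts.items() if n / total >= min_share}


if __name__ == "__main__":
    hyps = [["k", "a", "t", "a"], ["t", "a", "k", "i"], ["a", "ʃ"]]
    # Keeps a (0.4), k (0.2), t (0.2); drops i and ʃ (0.1 each).
    print(sorted(inventory_from_decodings(hyps, min_share=0.15)))
```

A single global threshold is the simplest possible choice; it trades recall of genuinely rare phones for robustness against spurious tokens from the recognizer.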

Code & Models

Repositories

pzelasko/kaldi (official)

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and dialogue systems