Non-Linear Pairwise Language Mappings for Low-Resource Multilingual Acoustic Model Fusion

Muhammad Umar Farooq; Darshan Adiga Haniya Narayana; Thomas Hain

arXiv:2207.03391·cs.CL·July 8, 2022

Open Access

TL;DR

This paper introduces a fusion method for hybrid DNN-HMM acoustic models in low-resource multilingual speech recognition: neural networks map posteriors from source-language models into the target language before fusion, with reported gains of up to 14.65%.

Contribution

It proposes fusing monolingual hybrid DNN-HMM acoustic models in a multilingual setup. A separate regression neural network per source-target language pair transforms the source-model posteriors, and these mapping networks need only limited data.

Findings

01. Posterior fusion yields relative gains of 14.65% and 6.5% over the multilingual and monolingual baselines, respectively.

02. The posterior-mapping networks need far less data than ASR training.

03. Cross-lingual fusion achieves comparable results without posteriors from the target-language acoustic model.

Abstract

Multilingual speech recognition has drawn significant attention as an effective way to compensate for data scarcity in low-resource languages. End-to-end (e2e) modelling is preferred over conventional hybrid systems, mainly because it requires no lexicon. However, hybrid DNN-HMMs still outperform e2e models in limited-data scenarios. Furthermore, the problem of manual lexicon creation has been alleviated by publicly available trained models for grapheme-to-phoneme (G2P) conversion and text-to-IPA transliteration in many languages. In this paper, a novel approach to fusing hybrid DNN-HMM acoustic models is proposed in a multilingual setup for low-resource languages. Posterior distributions from different monolingual acoustic models, computed on a target-language speech signal, are fused together. A separate regression neural network is trained for each source-target language pair to transform…
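The pipeline the abstract describes (per-pair mapping of source posteriors, then fusion) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the class `PosteriorMapper`, its one-hidden-layer shape, the function `fuse`, and the weighted-average fusion rule are hypothetical, and the mapping networks are left untrained with random weights.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class PosteriorMapper:
    """Hypothetical one-hidden-layer network for one source-target pair.

    Maps frame-level posteriors over the source model's output classes
    to a distribution over the target model's output classes.
    Weights are random here; training is omitted from this sketch.
    """

    def __init__(self, src_dim, tgt_dim, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (src_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, tgt_dim))
        self.b2 = np.zeros(tgt_dim)

    def __call__(self, src_post):
        # src_post: (frames, src_dim) -> (frames, tgt_dim)
        h = np.tanh(src_post @ self.w1 + self.b1)
        return softmax(h @ self.w2 + self.b2)


def fuse(posteriors, weights=None):
    """Assumed fusion rule: weighted average of mapped posteriors."""
    stacked = np.stack(posteriors)  # (n_models, frames, tgt_dim)
    if weights is None:
        weights = np.ones(len(posteriors))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise so the result stays a distribution
    return np.tensordot(weights, stacked, axes=1)  # (frames, tgt_dim)


# Toy data: two source models with different output sizes, one target space.
rng = np.random.default_rng(1)
frames, tgt_dim = 5, 4
src_dims = {"lang_a": 6, "lang_b": 3}  # placeholder language names

mapped = []
for i, (name, dim) in enumerate(src_dims.items()):
    src_post = softmax(rng.normal(size=(frames, dim)))  # stand-in for acoustic model output
    mapped.append(PosteriorMapper(dim, tgt_dim, seed=i)(src_post))

fused = fuse(mapped)
```

Because each mapper ends in a softmax and the fusion weights are normalised, every row of `fused` is a valid distribution over target classes and could, in principle, be passed to a decoder in place of a single model's posteriors.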

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing