Investigating the Impact of Cross-lingual Acoustic-Phonetic Similarities on Multilingual Speech Recognition
Muhammad Umar Farooq, Thomas Hain

TL;DR
This paper introduces a data-driven method to analyze cross-lingual acoustic-phonetic similarities in multilingual speech recognition, revealing insights into language transferability and improving model fusion performance.
Contribution
It proposes a novel posterior transformation technique to measure language similarities and enhances multilingual ASR by effectively fusing monolingual models.
Findings
Language similarity is not solely determined by phoneme overlap.
Languages with less phoneme overlap can transfer better in multilingual ASR.
Fusion of monolingual models yields approximately 8% relative performance gain.
Abstract
Multilingual automatic speech recognition (ASR) systems mostly benefit low-resource languages but suffer degradation in performance across several languages relative to their monolingual counterparts. Few studies have focused on understanding the behaviour of languages in multilingual speech recognition setups. In this paper, a novel data-driven approach is proposed to investigate cross-lingual acoustic-phonetic similarities. This technique measures the similarities between posterior distributions from various monolingual acoustic models against a target speech signal. Deep neural networks are trained as mapping networks to transform the distributions from different acoustic models into a directly comparable form. The analysis shows that the closeness of languages cannot be truly estimated by the size of the overlapping phoneme set. Entropy analysis of the proposed mapping…
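The abstract mentions an entropy analysis of posterior distributions, but the text is cut off before the details. As a rough illustration only, the sketch below computes the average per-frame Shannon entropy of a set of posteriors, a common way to quantify how uncertain a model's output is. The function names and the toy 3-class posteriors are hypothetical and are not taken from the paper; the authors' actual procedure may differ.

```python
import math
from typing import Sequence


def frame_entropy(posterior: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one frame's posterior distribution."""
    total = sum(posterior)
    if total <= 0:
        raise ValueError("posterior must have positive mass")
    entropy = 0.0
    for p in posterior:
        q = p / total  # renormalise in case of rounding drift
        if q > 0:  # treat 0 * log(0) as 0
            entropy -= q * math.log(q)
    return entropy


def mean_entropy(frames: Sequence[Sequence[float]]) -> float:
    """Average per-frame entropy over an utterance."""
    if not frames:
        raise ValueError("need at least one frame")
    return sum(frame_entropy(f) for f in frames) / len(frames)


# Hypothetical 3-class posteriors for two frames (illustrative values only,
# not data from the paper).
confident = [[0.98, 0.01, 0.01], [0.01, 0.98, 0.01]]
uncertain = [[0.40, 0.30, 0.30], [0.34, 0.33, 0.33]]

print(mean_entropy(confident) < mean_entropy(uncertain))
```

Under this reading, sharply peaked posteriors give lower mean entropy than flat ones, so a lower value would suggest the model is more certain about the target speech.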
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
