Investigating the Impact of Cross-lingual Acoustic-Phonetic Similarities on Multilingual Speech Recognition

Muhammad Umar Farooq; Thomas Hain

arXiv:2207.03390 · cs.CL · July 8, 2022

TL;DR

This paper introduces a data-driven method to analyze cross-lingual acoustic-phonetic similarities in multilingual speech recognition, revealing insights into language transferability and improving model fusion performance.

Contribution

The paper proposes a novel posterior transformation technique for measuring language similarity, and it improves multilingual ASR by fusing monolingual models.

Findings

1. Language similarity is not determined solely by phoneme overlap.
2. Languages with less phoneme overlap can transfer better in multilingual ASR.
3. Fusing monolingual models yields an approximately 8% relative performance gain.
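The third finding is stated as a relative gain, a convention in which the change is expressed as a fraction of the baseline rather than as an absolute difference. The sketch assumes "performance" means word error rate (WER), which the summary does not state explicitly; the function name and the WER values are hypothetical and are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer

# Hypothetical WERs for illustration only (not taken from the paper):
# a drop from 25.0% to 23.0% WER is 2 points absolute but 8% relative.
print(relative_wer_reduction(25.0, 23.0))  # → 8.0
```

The same absolute drop of 2 points would be a larger relative gain on a lower baseline, which is why relative figures are easier to compare across languages with very different error rates.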

Abstract

Multilingual automatic speech recognition (ASR) systems mostly benefit low-resource languages but suffer performance degradation in several languages relative to their monolingual counterparts. Few studies have focused on understanding how languages behave in multilingual speech recognition setups. In this paper, a novel data-driven approach is proposed to investigate cross-lingual acoustic-phonetic similarities. The technique measures the similarity between the posterior distributions that various monolingual acoustic models produce for a target speech signal. Deep neural networks are trained as mapping networks to transform the distributions from different acoustic models into a directly comparable form. The analysis shows that the closeness of languages cannot be accurately estimated from the size of their overlapping phoneme sets. Entropy analysis of the proposed mapping…
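The abstract describes comparing the posteriors that several monolingual acoustic models produce for the same target speech, after mapping networks project them into a common form, together with an entropy analysis of those mappings. The sketch is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the mapped posteriors are already available as (frames × target phones) NumPy arrays, and it uses mean frame-level KL divergence as a stand-in similarity measure, which may differ from the measure used in the paper. The helper names (`frame_entropy`, `mean_kl`, `rank_languages`) and the toy arrays are hypothetical.

```python
import numpy as np

def frame_entropy(posteriors: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Per-frame entropy (in nats) of a (frames, phones) posterior matrix."""
    p = np.clip(posteriors, eps, 1.0)  # avoid log(0) for zero-probability phones
    return -(p * np.log(p)).sum(axis=1)

def mean_kl(target: np.ndarray, mapped: np.ndarray, eps: float = 1e-12) -> float:
    """Mean frame-level KL(target || mapped); lower means more similar (assumed measure)."""
    t = np.clip(target, eps, 1.0)
    m = np.clip(mapped, eps, 1.0)
    return float((t * (np.log(t) - np.log(m))).sum(axis=1).mean())

def rank_languages(target: np.ndarray, mapped_by_lang: dict) -> list:
    """Order source languages from most to least similar to the target posteriors."""
    scores = {lang: mean_kl(target, post) for lang, post in mapped_by_lang.items()}
    return sorted(scores, key=scores.get)

# Hypothetical toy data: 2 frames x 3 target phones (each row sums to 1).
target = np.array([[0.90, 0.05, 0.05],
                   [0.10, 0.80, 0.10]])
mapped = {
    "lang_a": np.array([[0.80, 0.10, 0.10],
                        [0.20, 0.70, 0.10]]),
    "lang_b": np.full((2, 3), 1.0 / 3.0),  # uninformative (uniform) mapping
}

ranking = rank_languages(target, mapped)
print(ranking)                          # → ['lang_a', 'lang_b']
print(frame_entropy(mapped["lang_b"]))  # both frames ≈ ln 3 ≈ 1.0986
```

In this toy setup, a source language whose mapped posteriors stay peaked on the same phones as the target model scores a lower divergence, while a mapping that spreads probability uniformly reaches the maximum entropy of ln 3 per frame, a sign that it carries little phonetic information about the target.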

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing