Boosting End-to-End Multilingual Phoneme Recognition through Exploiting Universal Speech Attributes Constraints

Hao Yen; Sabato Marco Siniscalchi; Chin-Hui Lee

arXiv:2309.08828·eess.AS·September 19, 2023

TL;DR

This paper introduces a multilingual end-to-end speech recognition model that incorporates universal speech attributes, improving phoneme recognition accuracy across multiple languages by constraining predictions with articulatory knowledge.

Contribution

It presents a novel approach that integrates universal speech attributes into multilingual ASR, enhancing phoneme recognition and consistency across languages.

Findings

01 Outperforms conventional multilingual models with a 6.85% relative improvement.

02 Achieves better performance than monolingual models.

03 Eliminates phoneme predictions that are inconsistent with articulatory attributes.

Abstract

We propose a first step toward multilingual end-to-end automatic speech recognition (ASR) by integrating knowledge about speech articulators. The key idea is to leverage a rich set of fundamental units that can be defined "universally" across all spoken languages, referred to as speech attributes, namely manner and place of articulation. Specifically, several deterministic attribute-to-phoneme mapping matrices are constructed from a predefined universal attribute inventory; these matrices project the knowledge-rich articulatory attribute logits into output phoneme logits. The mapping imposes knowledge-based constraints that limit inconsistency with acoustic-phonetic evidence in the integrated prediction. Combined with phoneme recognition, our phone recognizer is able to infer from both attribute and phoneme information. The proposed joint multilingual model is evaluated through phoneme…
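The attribute-to-phoneme projection described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy manner inventory, the five-phoneme set, the binary form of the mapping matrix, the additive fusion with the phoneme logits, and the helper names `build_mapping` and `project`.

```python
import numpy as np

# Hypothetical toy inventories (not the paper's actual attribute or phoneme sets).
MANNERS = ["stop", "nasal", "fricative", "vowel"]
PHONEMES = ["p", "b", "m", "s", "a"]
PHONEME_MANNER = {"p": "stop", "b": "stop", "m": "nasal", "s": "fricative", "a": "vowel"}


def build_mapping(attributes, phonemes, phoneme_to_attr):
    """Return a binary matrix M where M[i, j] = 1 if phoneme j has attribute i."""
    mapping = np.zeros((len(attributes), len(phonemes)))
    for j, phoneme in enumerate(phonemes):
        mapping[attributes.index(phoneme_to_attr[phoneme]), j] = 1.0
    return mapping


def project(attr_logits, mapping):
    """Project attribute logits (frames x attributes) onto phonemes (frames x phonemes)."""
    return attr_logits @ mapping


M = build_mapping(MANNERS, PHONEMES, PHONEME_MANNER)

# One frame: the attribute head strongly favors "nasal",
# while the phoneme head alone slightly prefers "b" over "m".
attr_logits = np.array([[0.1, 3.0, -1.0, 0.2]])
phoneme_logits = np.array([[0.5, 2.0, 1.8, -0.3, 0.0]])

projected = project(attr_logits, M)

# Assumed fusion rule: simple addition of the two logit vectors.
fused = phoneme_logits + projected

phoneme_only = PHONEMES[int(np.argmax(phoneme_logits, axis=-1)[0])]
best = PHONEMES[int(np.argmax(fused, axis=-1)[0])]
print(phoneme_only, best)
```

In this toy frame the phoneme head alone would pick "b", a stop, even though the attribute evidence points to a nasal. After adding the projected attribute logits, the fused prediction moves to "m", which is the kind of attribute-consistent outcome the mapping is meant to encourage.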

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing