Amélioration des Performances des Systèmes Automatiques de Reconnaissance de la Parole pour la Parole Non Native (Improving the Performance of Automatic Speech Recognition Systems for Non-Native Speech)

Ghazi Bouselmi (INRIA Lorraine - LORIA); Dominique Fohr (INRIA Lorraine - LORIA); Irina Illina (INRIA Lorraine - LORIA); Jean-Paul Haton (INRIA Lorraine - LORIA)

arXiv:0711.1038 · cs.CL · November 8, 2007

TL;DR

This paper introduces two novel methods to adapt automatic speech recognition systems for non-native speakers, significantly reducing error rates by modeling phoneme confusions and leveraging graphemic constraints.

Contribution

It presents new adaptation techniques combining phoneme confusion modeling and graphemic constraints to improve non-native speech recognition accuracy.

Findings

01. 22.5% relative reduction in sentence error rate

02. 34.5% relative reduction in word error rate

03. Effective adaptation for non-native accents
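
Both figures above are relative reductions, meaning the drop in error rate is divided by the baseline error rate. A minimal sketch of that arithmetic follows; the function name and the example rates are illustrative assumptions and are not taken from the paper.

```python
def relative_reduction(baseline: float, adapted: float) -> float:
    """Relative error-rate reduction: (baseline - adapted) / baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - adapted) / baseline


# Hypothetical rates chosen only to illustrate the formula (not from the paper):
# a baseline error rate of 20.0% lowered to 15.5% after adaptation.
reduction = relative_reduction(0.20, 0.155)
print(f"{reduction:.1%} relative reduction")
```

With these made-up rates the absolute reduction is 4.5 points, while the relative reduction is larger because it is measured against the baseline.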

Abstract

In this article, we present an approach to automatic speech recognition (ASR) for non-native speech. We propose two methods for adapting existing ASR systems to non-native accents. The first method modifies the acoustic models by integrating acoustic models from the speakers' mother tongue. Non-native speakers tend to pronounce the phonemes of the target language in a manner similar to those of their native language. We therefore propose to combine the models of confused phonemes so that the ASR system can recognize both competing pronunciations. The second method refines pronunciation error detection by introducing graphemic constraints. Indeed, non-native speakers may rely on the spelling of words when uttering them, so pronunciation errors may depend on the characters that make up the words. The average error rate reduction that we observed is 22.5% (relative)…
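
The abstract does not spell out how confusions are detected, so the following is only a rough sketch of the general idea, under stated assumptions. It assumes a hypothetical alignment of (target-language phoneme, grapheme that spells it, native-language phoneme actually recognized) and keeps the native phonemes that occur often enough for each context. The function name `extract_confusions`, the threshold `min_prob`, the phoneme labels, and the toy data are all illustrative and do not come from the paper.

```python
from collections import Counter, defaultdict


def extract_confusions(aligned, use_graphemes=True, min_prob=0.3):
    """Collect candidate phoneme confusions from aligned observations.

    aligned: iterable of (target_phoneme, grapheme, observed_native_phoneme).
    With use_graphemes=True the confusion is conditioned on the spelling too
    (the graphemic constraint); otherwise only on the target phoneme.
    """
    pair_counts = Counter()
    context_counts = Counter()
    for target, grapheme, observed in aligned:
        context = (target, grapheme) if use_graphemes else (target,)
        pair_counts[(context, observed)] += 1
        context_counts[context] += 1

    rules = defaultdict(list)
    for (context, observed), count in pair_counts.items():
        # Keep a confusion only if it is frequent enough within its context.
        if count / context_counts[context] >= min_prob:
            rules[context].append(observed)
    return {context: sorted(obs) for context, obs in rules.items()}


# Toy, made-up alignment with ASCII phoneme labels (assumption, not real data).
toy = [
    ("th", "th", "s"), ("th", "th", "s"), ("th", "th", "z"),
    ("ai", "i", "i"), ("ai", "i", "i"), ("ai", "y", "ai"),
]

with_graphemes = extract_confusions(toy)
without_graphemes = extract_confusions(toy, use_graphemes=False)
print(with_graphemes)
print(without_graphemes)
```

In this toy data, conditioning on the grapheme separates the cases where the target phoneme "ai" is spelled "i" (confused with "i") from those where it is spelled "y" (pronounced as expected), whereas the phoneme-only rule merges them. In a system along the lines the abstract describes, each retained confusion would presumably indicate which native-language acoustic model to combine with the target-language one.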

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Natural Language Processing Techniques