Investigating the role of L1 in automatic pronunciation evaluation of L2 speech
Ming Tu, Anna Grabek, Julie Liss, Visar Berisha

TL;DR
This paper explores how incorporating both L1 and L2 acoustic models improves automatic pronunciation evaluation accuracy for non-native speakers, enhancing correlation with human judgments.
Contribution
It introduces a novel approach that combines two native-speech acoustic models (one trained on L2 speech, one trained on L1 speech) with utterance-level features to better assess accentedness in second-language speech.
Findings
Improved correlation with human evaluators using combined L1 and L2 models.
Effective utterance-level feature extraction scheme.
Enhanced pronunciation assessment accuracy across multiple L1 backgrounds.
Abstract
Automatic pronunciation evaluation plays an important role in pronunciation training and second language education. This field draws heavily on concepts from automatic speech recognition (ASR) to quantify how close the pronunciation of non-native speech is to native-like pronunciation. However, it is known that the formation of accent is related to pronunciation patterns of both the target language (L2) and the speaker's first language (L1). In this paper, we propose to use two native speech acoustic models, one trained on L2 speech and the other trained on L1 speech. We develop two sets of measurements that can be extracted from the two acoustic models given accented speech. A new utterance-level feature extraction scheme is used to convert these measurements into a fixed-dimension vector, which is used as an input to a statistical model to predict the accentedness of a speaker. On a data…
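The abstract does not spell out the measurements, the feature extraction scheme, or the statistical model, so the following is only a minimal sketch of the general pipeline under stated assumptions. It assumes each acoustic model yields one score per phone segment (the hypothetical lists `l2_scores` and `l1_scores`), pools each variable-length list into fixed statistics (mean, standard deviation, min, max), concatenates the two pooled vectors into one fixed-dimension vector, and uses ridge regression as a stand-in for the unspecified statistical model. Pearson correlation is assumed as the measure of agreement with human accentedness ratings. None of these choices is taken from the paper.

```python
import math
import statistics
from typing import List, Sequence

# ASSUMPTION: each acoustic model produces one score per phone segment.
# The names l2_scores / l1_scores are hypothetical, not the paper's.


def pool(values: Sequence[float]) -> List[float]:
    """Map a variable-length score sequence to 4 fixed statistics."""
    return [statistics.fmean(values), statistics.pstdev(values), min(values), max(values)]


def utterance_vector(l2_scores: Sequence[float], l1_scores: Sequence[float]) -> List[float]:
    """Concatenate pooled L2-model and L1-model statistics into an 8-dim vector."""
    return pool(l2_scores) + pool(l1_scores)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge(xs: Sequence[Sequence[float]], ys: Sequence[float], lam: float = 1.0) -> List[float]:
    """Ridge regression, a stand-in for the paper's unspecified model.

    Returns [bias, w1, ..., wd]; the bias term is not penalized.
    """
    rows = [[1.0] + list(x) for x in xs]
    d = len(rows[0])
    # Normal equations: (X^T X + lam * I') w = X^T y, with I' skipping the bias.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    for i in range(1, d):
        a[i][i] += lam
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    return solve(a, b)


def predict(weights: Sequence[float], x: Sequence[float]) -> float:
    """Linear prediction: bias plus weighted sum of features."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], x))


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between predictions and human ratings."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


if __name__ == "__main__":
    # Invented toy data, NOT from the paper:
    # (L2-model phone scores, L1-model phone scores, human accentedness rating)
    toy = [
        ([-0.2, -0.4, -0.1], [-1.5, -1.2, -1.8, -1.4], 1.0),
        ([-0.9, -1.1], [-0.6, -0.5, -0.8], 3.5),
        ([-0.5, -0.7, -0.6, -0.4], [-1.0, -0.9], 2.0),
        ([-1.3, -1.0, -1.2], [-0.3, -0.4, -0.2], 4.5),
        ([-0.3, -0.2], [-1.3, -1.1, -1.6], 1.5),
    ]
    xs = [utterance_vector(l2, l1) for l2, l1, _ in toy]
    ys = [rating for _, _, rating in toy]
    w = fit_ridge(xs, ys, lam=0.1)
    preds = [predict(w, x) for x in xs]
    print("training-set Pearson r:", round(pearson(preds, ys), 3))
```

Pooling with summary statistics is one simple way to turn utterances with different numbers of phones into inputs of equal length; the paper's own scheme may differ, and a real evaluation would report correlation on held-out speakers rather than on the training set.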
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Phonetics and Phonology Research
