Acoustic Feature Mixup for Balanced Multi-aspect Pronunciation Assessment

Heejin Do; Wonjun Lee; Gary Geunbae Lee

arXiv:2406.15723·cs.CL·June 25, 2024

Open Access

TL;DR

This paper introduces Acoustic Feature Mixup strategies to improve multi-aspect pronunciation assessment by addressing data scarcity and score imbalance, leading to better scoring accuracy and error detection.

Contribution

The paper proposes mixup methods tailored to pronunciation assessment and integrates error-rate features to improve performance.

Findings

01. Improved scoring accuracy on speechocean762 dataset.

02. Enhanced ability to predict unseen pronunciation distortions.

03. Effective handling of data scarcity and score imbalance.

Abstract

In automated pronunciation assessment, recent work increasingly emphasizes evaluating multiple aspects to provide richer feedback. However, acquiring speech data from non-native language learners labeled with multi-aspect scores is challenging; moreover, such data often has score-imbalanced distributions. In this paper, we propose two Acoustic Feature Mixup strategies, which interpolate linearly and non-linearly with the in-batch averaged feature, to address data scarcity and score-label imbalance. Primarily using goodness-of-pronunciation as an acoustic feature, we tailor the mixup designs to suit pronunciation assessment. Further, we integrate fine-grained error-rate features obtained by comparing speech recognition results with the original answer phonemes, giving direct hints for mispronunciation. Effective mixing of the acoustic features notably enhances overall scoring performances on the…
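As a rough illustration of the two ideas named in the abstract, the sketch below shows (1) linear interpolation of each sample with the in-batch averaged feature and (2) a phoneme-level error rate computed by comparing recognized phonemes with the canonical ones. It is not the authors' implementation. The function names, the fixed mixing weight `lam`, the choice to mix score labels with the same weight (as in standard mixup), and the use of Levenshtein distance for the error rate are all assumptions made for illustration; the non-linear variant is not shown because the abstract does not describe it.

```python
def batch_mean(features):
    """Element-wise mean over a batch of equal-length feature vectors."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]


def linear_mixup_with_batch_mean(features, scores, lam):
    """Linearly interpolate every sample with the in-batch average.

    `lam` is the weight kept on the original sample. How the paper
    chooses this weight, and whether it mixes the score labels the
    same way, is NOT stated in the abstract; both are assumptions.
    """
    mean_feat = batch_mean(features)
    mean_score = sum(scores) / len(scores)
    mixed_feats = [
        [lam * x + (1.0 - lam) * m for x, m in zip(feat, mean_feat)]
        for feat in features
    ]
    mixed_scores = [lam * s + (1.0 - lam) * mean_score for s in scores]
    return mixed_feats, mixed_scores


def phoneme_error_rate(canonical, recognized):
    """Levenshtein distance between phoneme sequences / canonical length.

    A hypothetical stand-in for the paper's "error-rate features";
    the exact definition used by the authors may differ.
    """
    prev = list(range(len(recognized) + 1))
    for i, c in enumerate(canonical, start=1):
        curr = [i]
        for j, r in enumerate(recognized, start=1):
            cost = 0 if c == r else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution or match
        prev = curr
    return prev[-1] / max(len(canonical), 1)


if __name__ == "__main__":
    # Toy batch: two 2-dimensional feature vectors with scalar scores.
    feats, labels = linear_mixup_with_batch_mean(
        [[0.0, 2.0], [2.0, 4.0]], [1.0, 3.0], lam=0.5
    )
    print(feats, labels)
    print(phoneme_error_rate(["K", "AE", "T"], ["K", "AH", "T"]))
```

Mixing toward the batch average pulls each sample toward the middle of the batch, which is one plausible way to synthesize examples in sparsely populated score ranges; whether this matches the paper's motivation in detail would need to be checked against the full text.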

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research

Methods: Mixup