MultiPA: A Multi-task Speech Pronunciation Assessment Model for Open Response Scenarios

Yu-Wen Chen; Zhou Yu; Julia Hirschberg

arXiv:2308.12490·cs.CL·June 6, 2024·1 cites

TL;DR

MultiPA is a multi-task speech pronunciation assessment model that scores accuracy, fluency, and prosody in open response scenarios, reaching state-of-the-art performance on in-domain datasets and generalizing to a newly collected out-of-domain dataset.

Contribution

It introduces MultiPA, a multi-task learning model that assesses sentence-level accuracy, fluency, and prosody as well as word-level accuracy for open responses, rather than a single pronunciation task as in previous open-response models.

Findings

01. Achieved state-of-the-art performance on in-domain datasets.

02. Effectively generalized to out-of-domain data.

03. Demonstrated practical utility in real-world applications.

Abstract

Pronunciation assessment models designed for open response scenarios enable users to practice language skills in a manner similar to real-life communication. However, previous open-response pronunciation assessment models have predominantly focused on a single pronunciation task, such as sentence-level accuracy, rather than offering a comprehensive assessment across multiple aspects. We propose MultiPA, a Multitask Pronunciation Assessment model that provides sentence-level accuracy, fluency, and prosody assessment, together with word-level accuracy assessment, for open responses. We examined the correlation between different pronunciation tasks and showed the benefits of multi-task learning. Our model reached state-of-the-art performance on existing in-domain datasets and effectively generalized to an out-of-domain dataset that we newly collected. The experimental results demonstrate the practical utility of…
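
The abstract does not say how the four assessment tasks are combined during training. As a rough illustration of multi-task learning in general, the sketch below sums per-task mean squared errors into one objective. The task names mirror the aspects listed in the abstract; the function names, the use of MSE, and the per-task weights are assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List, Optional


def mse(pred: List[float], target: List[float]) -> float:
    """Mean squared error between two equal-length lists of scores."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(
    preds: Dict[str, List[float]],
    targets: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted sum of per-task MSE losses.

    Assumption: every task is a regression onto human ratings and tasks are
    weighted equally unless `weights` is given. The paper may do otherwise.
    """
    if weights is None:
        weights = {task: 1.0 for task in preds}
    return sum(weights[task] * mse(preds[task], targets[task]) for task in preds)


# Hypothetical scores for one utterance: three sentence-level ratings and
# one rating per word for word-level accuracy.
example_preds = {
    "accuracy": [7.0],
    "fluency": [6.0],
    "prosody": [8.0],
    "word_accuracy": [9.0, 5.0, 8.0],
}
example_targets = {
    "accuracy": [8.0],
    "fluency": [6.0],
    "prosody": [7.0],
    "word_accuracy": [9.0, 7.0, 8.0],
}
example_loss = multitask_loss(example_preds, example_targets)
```

Because every task contributes to a single scalar, a shared encoder trained on this objective receives gradient signal from all tasks at once, which is the usual motivation for multi-task learning when the tasks are correlated.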

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Speech and dialogue systems