MultiPA: A Multi-task Speech Pronunciation Assessment Model for Open Response Scenarios
Yu-Wen Chen, Zhou Yu, Julia Hirschberg

TL;DR
MultiPA is a multi-task speech pronunciation assessment model that evaluates accuracy, fluency, and prosody in open response scenarios, outperforming previous models and generalizing well to new datasets.
Contribution
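Introduces MultiPA, a multi-task learning model that assesses sentence-level accuracy, fluency, and prosody together with word-level accuracy for open responses, and reports state-of-the-art in-domain results along with generalization to a newly collected out-of-domain dataset.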
Findings
Achieved state-of-the-art performance on existing in-domain datasets.
Generalized effectively to a newly collected out-of-domain dataset.
Demonstrated practical utility in real-world applications.
Abstract
Pronunciation assessment models designed for open response scenarios enable users to practice language skills in a manner similar to real-life communication. However, previous open-response pronunciation assessment models have predominantly focused on a single pronunciation task, such as sentence-level accuracy, rather than offering a comprehensive assessment across multiple aspects. We propose MultiPA, a Multitask Pronunciation Assessment model that provides sentence-level accuracy, fluency, prosody, and word-level accuracy assessment for open responses. We examined the correlation between different pronunciation tasks and showed the benefits of multi-task learning. Our model reached state-of-the-art performance on existing in-domain datasets and effectively generalized to an out-of-domain dataset that we newly collected. The experimental results demonstrate the practical utility of…
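As a rough illustration of the multi-task idea described in the abstract, the sketch below combines squared-error terms for sentence-level accuracy, fluency, and prosody with a word-level accuracy term into one weighted training loss. It is not the authors' implementation: the function names, the task weights, the use of mean squared error, and the toy scores are all assumptions made for illustration, since this summary does not describe the paper's actual architecture or loss.

```python
from typing import Dict, List, Optional

# Sentence-level tasks named in the abstract.
SENTENCE_TASKS = ("accuracy", "fluency", "prosody")


def mse(pred: List[float], target: List[float]) -> float:
    """Mean squared error between two equal-length, non-empty score lists."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(
    sent_pred: Dict[str, float],
    sent_target: Dict[str, float],
    word_pred: List[float],
    word_target: List[float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted sum of per-task losses (hypothetical formulation).

    Each sentence-level task contributes one squared error; the word-level
    accuracy task contributes the mean squared error over all words.
    Tasks missing from `weights` default to a weight of 1.0.
    """
    weights = weights or {}
    total = 0.0
    for task in SENTENCE_TASKS:
        total += weights.get(task, 1.0) * mse([sent_pred[task]], [sent_target[task]])
    total += weights.get("word_accuracy", 1.0) * mse(word_pred, word_target)
    return total


# Toy example with made-up scores for one two-word utterance.
sent_pred = {"accuracy": 8.0, "fluency": 7.0, "prosody": 6.0}
sent_target = {"accuracy": 9.0, "fluency": 7.0, "prosody": 8.0}
word_pred = [8.0, 6.0]
word_target = [10.0, 6.0]

loss = multitask_loss(sent_pred, sent_target, word_pred, word_target)
print(loss)
```

Sharing one loss across tasks lets correlated targets, such as fluency and prosody, shape a common representation. In practice the weights would be tuned on held-out data.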
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Speech and dialogue systems
