HiPPO: Exploring A Novel Hierarchical Pronunciation Assessment Approach for Spoken Languages
Bi-Cheng Yan, Hsin-Wei Wang, Fu-An Chao, Tien-Hong Lo, Yung-Chang Hsu, Berlin Chen

TL;DR
This paper introduces HiPPO, a hierarchical model for automatic pronunciation assessment of unscripted speech, trained with a contrastive ordinal regularizer and a curriculum learning strategy to improve assessment accuracy in real-world language learning scenarios.
Contribution
The paper presents HiPPO, a new hierarchical pronunciation assessment model trained with a contrastive ordinal regularizer and curriculum learning, designed specifically for unscripted speech from L2 learners.
Findings
HiPPO outperforms existing methods on the Speechocean762 dataset.
The contrastive ordinal regularizer enhances score discrimination.
Curriculum learning improves assessment accuracy on unscripted speech.
Abstract
Automatic pronunciation assessment (APA) seeks to quantify a second language (L2) learner's pronunciation proficiency in a target language by offering timely and fine-grained diagnostic feedback. Most existing efforts on APA have predominantly concentrated on highly constrained reading-aloud tasks (where learners are prompted to read a reference text aloud); however, assessing pronunciation quality in unscripted speech (or free-speaking scenarios) remains relatively underexplored. In light of this, we first propose HiPPO, a hierarchical pronunciation assessment model tailored for spoken languages, which evaluates an L2 learner's oral proficiency at multiple linguistic levels based solely on the speech uttered by the learner. To improve the overall accuracy of assessment, a contrastive ordinal regularizer and a curriculum learning strategy are introduced for model training. The former…
Taxonomy
Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Stuttering Research and Treatment
