Towards Efficient and Multifaceted Computer-assisted Pronunciation Training Leveraging Hierarchical Selective State Space Model and Decoupled Cross-entropy Loss
Fu-An Chao, Berlin Chen

TL;DR
This paper presents HMamba, a unified CAPT system that performs automatic pronunciation assessment (APA) and mispronunciation detection and diagnosis (MDD) in parallel. It uses a hierarchical selective state space model and a decoupled cross-entropy loss (deXent), and achieves improved accuracy and efficiency.
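The title identifies HMamba's backbone as a hierarchical selective state space model, the family that Mamba belongs to. To show roughly what "selective" means, the sketch below runs the discretized recurrence h_t = exp(Δ_t · A) ⊙ h_{t-1} + Δ_t · B_t · x_t, y_t = C_t · h_t over a single input channel with a diagonal state matrix A. In this recurrence the step size Δ_t and the matrices B_t and C_t are computed from the current input. The scalar projections (`w_delta`, `w_b`, `w_c`) and the softplus step-size parameterization are assumptions for illustration. This is not the paper's implementation, and it omits the convolution, gating, and multi-channel structure of real Mamba blocks.

```python
import math
from typing import List


def softplus(z: float) -> float:
    # Numerically stable softplus; keeps the step size Delta_t positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(
    xs: List[float],
    a_diag: List[float],  # diagonal of the state matrix A (negative entries => decaying state)
    w_delta: float,       # hypothetical projection: x_t -> Delta_t (before softplus)
    w_b: List[float],     # hypothetical projection: x_t -> B_t
    w_c: List[float],     # hypothetical projection: x_t -> C_t
) -> List[float]:
    """Sequential (non-parallel) selective SSM over a 1-channel sequence; illustrative only."""
    n = len(a_diag)
    h = [0.0] * n
    ys: List[float] = []
    for x in xs:
        delta = softplus(w_delta * x)   # input-dependent step size
        b = [wb * x for wb in w_b]      # input-dependent B_t
        c = [wc * x for wc in w_c]      # input-dependent C_t
        for i in range(n):
            # Zero-order-hold discretization for A; simplified (Euler) discretization for B.
            h[i] = math.exp(delta * a_diag[i]) * h[i] + delta * b[i] * x
        ys.append(sum(c[i] * h[i] for i in range(n)))
    return ys


# Toy usage with made-up parameters.
out = selective_scan([1.0, 0.5, -0.2, 0.0], a_diag=[-1.0, -0.5], w_delta=1.0,
                     w_b=[1.0, 0.5], w_c=[1.0, 1.0])
print(out)
```

Because Δ_t depends on x_t, a large step makes exp(Δ_t · A) small, so the state mostly absorbs the current frame. A small step keeps exp(Δ_t · A) near 1 and preserves history. This input-dependent behaviour is what separates selective SSMs from time-invariant ones. Practical Mamba layers compute the same kind of recurrence with a hardware-aware parallel scan rather than a Python loop.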
Contribution
The work introduces HMamba, a novel integrated CAPT approach that handles APA and MDD tasks jointly. It pairs the model with deXent, a decoupled cross-entropy loss tailored to improve supervised learning for MDD.
Findings
Shown to be effective on the speechocean762 benchmark dataset.
Improves MDD F1-score to 63.85%.
Demonstrates efficiency and multifaceted capabilities.
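The MDD F1-score listed above is conventionally computed from phone-level detection counts. A true rejection (TR) is a mispronounced phone that the system flags. A false rejection (FR) is a correctly pronounced phone that the system flags. A false acceptance (FA) is a mispronounced phone that the system misses. Precision is TR / (TR + FR), recall is TR / (TR + FA), and F1 is their harmonic mean. The sketch below assumes this standard definition. The function name and the toy labels are illustrative and are not taken from the paper's evaluation.

```python
from typing import Sequence, Tuple


def mdd_scores(
    is_mispronounced: Sequence[bool],  # ground-truth label per phone (from human annotation)
    is_flagged: Sequence[bool],        # system decision per phone (True = flagged as an error)
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for mispronunciation detection."""
    if len(is_mispronounced) != len(is_flagged):
        raise ValueError("label and prediction sequences must have equal length")
    pairs = list(zip(is_mispronounced, is_flagged))
    tr = sum(1 for m, f in pairs if m and f)          # true rejections
    fr = sum(1 for m, f in pairs if not m and f)      # false rejections
    fa = sum(1 for m, f in pairs if m and not f)      # false acceptances
    precision = tr / (tr + fr) if tr + fr else 0.0
    recall = tr / (tr + fa) if tr + fa else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: 6 phones, 3 of them truly mispronounced (made-up data).
truth = [True, False, True, False, True, False]
pred = [True, True, False, False, True, False]
p, r, f = mdd_scores(truth, pred)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Here TR = 2, FR = 1, and FA = 1, so precision, recall, and F1 all equal 2/3. Correctly accepted phones (true acceptances) do not enter the F1 at all, so the metric focuses on how well errors are caught.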
Abstract
Prior efforts in building computer-assisted pronunciation training (CAPT) systems often treat automatic pronunciation assessment (APA) and mispronunciation detection and diagnosis (MDD) as separate fronts: the former aims to provide multiple pronunciation aspect scores across diverse linguistic levels, while the latter focuses instead on pinpointing the precise phonetic pronunciation errors made by non-native language learners. However, it is generally expected that a full-fledged CAPT system should perform both functionalities simultaneously and efficiently. In response to this surging demand, in this work we first propose HMamba, a novel CAPT approach that seamlessly integrates APA and MDD tasks in parallel. In addition, we introduce a novel loss function, decoupled cross-entropy loss (deXent), specifically tailored for MDD to facilitate better supervised learning for detecting…
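The abstract describes APA scores at several linguistic levels alongside phone-level MDD. The sketch below shows one common way to organize such a hierarchy. It assumes phone-level feature vectors and a known phone-to-word alignment are already available. Phone features are mean-pooled into word features, and word features into a single utterance feature. Each level could then feed its own aspect-scoring heads, while the phone level also feeds an MDD classifier. Mean pooling and the names `hierarchical_pool` and `word_ids` are hypothetical choices for illustration, not HMamba's documented design.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    # Element-wise mean of equally sized vectors.
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def hierarchical_pool(
    phone_feats: Sequence[Vector],
    word_ids: Sequence[int],  # hypothetical alignment: word index of each phone
) -> Dict[str, List[Vector]]:
    """Pool phone features into word features, then word features into one utterance vector."""
    if len(phone_feats) != len(word_ids):
        raise ValueError("each phone needs a word index")
    words: Dict[int, List[Vector]] = {}
    for feat, w in zip(phone_feats, word_ids):
        words.setdefault(w, []).append(feat)
    # Words are ordered by their index; the utterance vector averages word vectors, not phones.
    word_feats = [mean_pool(words[w]) for w in sorted(words)]
    utt_feat = mean_pool(word_feats)
    return {"phone": list(phone_feats), "word": word_feats, "utterance": [utt_feat]}


# Toy usage: three phones, the first two belonging to word 0 and the third to word 1.
result = hierarchical_pool([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]], word_ids=[0, 0, 1])
print(result["word"], result["utterance"])
```

Averaging word vectors rather than phones gives every word equal weight in the utterance representation regardless of its length. A learned model would typically insert further layers (for example, state space blocks) between levels instead of relying on pooling alone.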
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Natural Language Processing Techniques
