Hierarchical Pronunciation Assessment with Multi-Aspect Attention
Heejin Do, Yunsu Kim, Gary Geunbae Lee

TL;DR
This paper introduces HiPAMA, a hierarchical model with multi-aspect attention that assesses pronunciation at the phoneme, word, and utterance levels, improving accuracy especially on the more difficult aspects.
Contribution
The paper presents a novel hierarchical assessment model that captures linguistic structures and cross-aspect relations, enhancing multi-granularity pronunciation evaluation.
Findings
Significant improvements on the speechocean762 dataset.
Enhanced assessment accuracy for difficult aspects.
Robustness demonstrated across multiple evaluation metrics.
Abstract
Automatic pronunciation assessment is a major component of a computer-assisted pronunciation training system. To provide in-depth feedback, scoring pronunciation at various levels of granularity such as phoneme, word, and utterance, with diverse aspects such as accuracy, fluency, and completeness, is essential. However, existing multi-aspect multi-granularity methods simultaneously predict all aspects at all granularity levels; therefore, they have difficulty in capturing the linguistic hierarchy of phoneme, word, and utterance. This limitation further leads to neglecting intimate cross-aspect relations at the same linguistic unit. In this paper, we propose a Hierarchical Pronunciation Assessment with Multi-aspect Attention (HiPAMA) model, which hierarchically represents the granularity levels to directly capture their linguistic structures and introduces multi-aspect attention that…
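The two ideas named in the abstract, a phoneme-to-word-to-utterance hierarchy and attention across aspects at the same linguistic unit, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, mean pooling, random weights, residual attention, and the default aspect counts are all assumptions chosen only to make the data flow concrete.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pool_by_group(feats, group_ids, n_groups):
    # Mean-pool rows of feats (n, dim) into n_groups rows (assumed pooling choice).
    out = np.zeros((n_groups, feats.shape[1]))
    counts = np.zeros(n_groups)
    np.add.at(out, group_ids, feats)
    np.add.at(counts, group_ids, 1)
    return out / np.maximum(counts, 1)[:, None]

def multi_aspect_attention(aspect_feats):
    # aspect_feats: (n_aspects, n_units, dim).
    # For each unit, every aspect attends over all aspects of that same unit.
    x = aspect_feats.transpose(1, 0, 2)                 # (n_units, n_aspects, dim)
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    mixed = softmax(scores, axis=-1) @ x                # (n_units, n_aspects, dim)
    return (x + mixed).transpose(1, 0, 2)               # residual connection (assumption)

def hierarchical_scores(phone_feats, phone_to_word,
                        n_word_aspects=3, n_utt_aspects=5, seed=0):
    # Hypothetical end-to-end pass with random, untrained weights.
    rng = np.random.default_rng(seed)
    n_phones, dim = phone_feats.shape
    n_words = int(phone_to_word.max()) + 1

    # Phoneme level: one score per phoneme.
    phone_scores = phone_feats @ rng.normal(size=dim)

    # Word level: built from the phonemes that belong to each word.
    word_feats = pool_by_group(phone_feats, phone_to_word, n_words)
    W_word = rng.normal(size=(n_word_aspects, dim, dim)) / np.sqrt(dim)
    word_aspects = np.tanh(np.einsum('wd,ade->awe', word_feats, W_word))
    word_aspects = multi_aspect_attention(word_aspects)
    v_word = rng.normal(size=(n_word_aspects, dim))
    word_scores = np.einsum('awd,ad->aw', word_aspects, v_word)

    # Utterance level: built from the word-level aspect representations.
    utt_feat = word_aspects.mean(axis=(0, 1))[None, :]  # (1, dim)
    W_utt = rng.normal(size=(n_utt_aspects, dim, dim)) / np.sqrt(dim)
    utt_aspects = np.tanh(np.einsum('ud,ade->aue', utt_feat, W_utt))
    utt_aspects = multi_aspect_attention(utt_aspects)
    v_utt = rng.normal(size=(n_utt_aspects, dim))
    utt_scores = np.einsum('aud,ad->a', utt_aspects, v_utt)

    return phone_scores, word_scores, utt_scores
```

Calling `hierarchical_scores` with a `(n_phones, dim)` feature matrix and an integer array mapping each phoneme to its word index returns one score per phoneme, one score per aspect per word, and one score per aspect for the utterance. The point of the sketch is the ordering: each level is computed from the level below it rather than all levels being predicted in parallel from the same features.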
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
