The complementary roles of non-verbal cues for Robust Pronunciation Assessment
Yassine El Kheir, Shammur Absar Chowdhury, Ahmed Ali

TL;DR
This paper introduces IntraVerbalPA, a novel pronunciation assessment framework that leverages non-verbal cues alongside traditional speech features, improving accuracy in evaluating non-native pronunciation.
Contribution
It proposes a new framework incorporating non-verbal cues and a phonemic-duration metric, enhancing pronunciation assessment beyond phonetic and phonological analysis.
Findings
IntraVerbalPA matches or outperforms existing methods in pronunciation assessment.
Non-verbal cues significantly improve assessment accuracy.
The phonemic-duration metric effectively models duration distribution.
Abstract
Research on pronunciation assessment systems focuses on utilizing phonetic and phonological aspects of non-native (L2) speech, often neglecting the rich layer of information hidden within non-verbal cues. In this study, we propose a novel pronunciation assessment framework, IntraVerbalPA. The framework innovatively incorporates both fine-grained frame-level and abstract utterance-level non-verbal cues, alongside the conventional speech and phoneme representations. Additionally, we introduce a "Goodness of phonemic-duration" metric to effectively model duration distribution within the framework. Our results validate the effectiveness of the proposed IntraVerbalPA framework and its individual components, yielding performance that either matches or outperforms existing work.
Taxonomy
Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research
