3M: An Effective Multi-view, Multi-granularity, and Multi-aspect Modeling Approach to English Pronunciation Assessment

Fu-An Chao; Tien-Hong Lo; Tzu-I Wu; Yao-Ting Sung; Berlin Chen

arXiv:2208.09110 · cs.SD · September 13, 2022

TL;DR

This paper introduces 3M, a multi-view, multi-granularity, and multi-aspect modeling approach that integrates diverse features for improved English pronunciation assessment, especially enhancing fluency and prosody evaluation.

Contribution

The paper proposes a novel 3M approach combining multiple feature types and phonological embeddings to address granularity and data limitations in pronunciation assessment.

Findings

1. Significant improvements in assessment accuracy across multiple granularities.

2. Enhanced evaluation of speaking fluency and speech prosody.

3. Effective integration of prosodic, phonological, and self-supervised features.

Abstract

As an indispensable ingredient of computer-assisted pronunciation training (CAPT), automatic pronunciation assessment (APA) plays a pivotal role in aiding self-directed language learners by providing multi-aspect and timely feedback. However, there are at least two potential obstacles that might hinder its performance for practical use. On one hand, most of the studies focus exclusively on leveraging segmental (phonetic)-level features such as goodness of pronunciation (GOP); this, however, may cause a discrepancy of feature granularity when performing suprasegmental (prosodic)-level pronunciation assessment. On the other hand, automatic pronunciation assessments still suffer from the lack of large-scale labeled speech data of non-native speakers, which inevitably limits the performance of pronunciation assessment. In this paper, we tackle these problems by integrating multiple prosodic…

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Natural Language Processing Techniques