ConPCO: Preserving Phoneme Characteristics for Automatic Pronunciation Assessment Leveraging Contrastive Ordinal Regularization
Bi-Cheng Yan, Wei-Cheng Chao, Jiun-Ting Li, Yi-Cheng Wang, Hsin-Wei Wang, Meng-Shin Lin, Berlin Chen

TL;DR
This paper introduces ConPCO, a contrastive ordinal regularizer for regression-based automatic pronunciation assessment, which enhances phoneme-discriminative features by considering phoneme characteristics and ordinal relationships among proficiency scores.
Contribution
The paper proposes a novel contrastive phonemic ordinal regularizer (ConPCO) that improves phoneme-awareness in regression models for pronunciation assessment, a property that prior work did not explicitly address.
Findings
ConPCO improves phoneme discrimination in features.
The hierarchical model with ConPCO outperforms baselines.
Results demonstrate enhanced assessment accuracy on speechocean762.
Abstract
Automatic pronunciation assessment (APA) aims to evaluate the pronunciation proficiency of a second language (L2) learner in a target language. Existing efforts typically draw on regression models for proficiency score prediction, where the models are trained to estimate target values without explicitly accounting for phoneme-awareness in the feature space. In this paper, we propose a contrastive phonemic ordinal regularizer (ConPCO) tailored for regression-based APA models to generate more phoneme-discriminative features while considering the ordinal relationships among the regression targets. The proposed ConPCO first aligns the phoneme representations of an APA model and textual embeddings of phonetic transcriptions via contrastive learning. Afterward, the phoneme characteristics are retained by regulating the distances between inter- and intra-phoneme categories in the feature…
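The alignment step described in the abstract can be illustrated with a small sketch. This is not the paper's formulation, which the excerpt does not give; it assumes a CLIP-style symmetric InfoNCE loss over row-aligned pairs, and the function names, the `temperature` value, and the array shapes are all hypothetical. It covers only the contrastive alignment between phoneme representations and textual embeddings, not the ordinal distance regularization.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norms, eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_alignment_loss(phone_feats, text_embeds, temperature=0.1):
    """Symmetric InfoNCE loss (an assumption, not the paper's exact loss).

    phone_feats: (N, D) phoneme representations from an APA model.
    text_embeds: (N, D) textual embeddings of the matching phonemes.
    Row i of each array is treated as a positive pair; every other
    row in the batch serves as a negative.
    """
    a = l2_normalize(phone_feats)
    t = l2_normalize(text_embeds)
    logits = a @ t.T / temperature          # (N, N) scaled cosine similarities
    idx = np.arange(len(a))
    # Phoneme-to-text direction: each row should pick out its own column.
    loss_a2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text-to-phoneme direction: each column should pick out its own row.
    loss_t2a = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_a2t + loss_t2a)
```

With matched pairs the loss is small, and it grows when the pairing is broken, which is the signal that pulls the two embedding spaces together during training.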
Taxonomy
Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Natural Language Processing Techniques
Methods: Adaptive Pseudo Augmentation
