ConPCO: Preserving Phoneme Characteristics for Automatic Pronunciation Assessment Leveraging Contrastive Ordinal Regularization

Bi-Cheng Yan; Wei-Cheng Chao; Jiun-Ting Li; Yi-Cheng Wang; Hsin-Wei Wang; Meng-Shin Lin; Berlin Chen

arXiv:2406.02859·eess.AS·June 11, 2024·1 citation

TL;DR

This paper introduces ConPCO, a contrastive ordinal regularizer for regression-based automatic pronunciation assessment that makes the learned features more phoneme-discriminative by accounting for phoneme characteristics and the ordinal relationships among proficiency scores.

Contribution

The paper proposes a novel contrastive phonemic ordinal regularizer (ConPCO) that improves the phoneme-awareness of regression models for pronunciation assessment, an aspect that prior work does not explicitly address.

Findings

01. ConPCO makes the learned features more phoneme-discriminative.

02. The hierarchical model trained with ConPCO outperforms the baseline models.

03. Experiments on the speechocean762 benchmark show improved assessment accuracy.

Abstract

Automatic pronunciation assessment (APA) aims to evaluate the pronunciation proficiency of a second language (L2) learner in a target language. Existing efforts typically draw on regression models for proficiency score prediction, where the models are trained to estimate target values without explicitly accounting for phoneme-awareness in the feature space. In this paper, we propose a contrastive phonemic ordinal regularizer (ConPCO) tailored for regression-based APA models to generate more phoneme-discriminative features while considering the ordinal relationships among the regression targets. The proposed ConPCO first aligns the phoneme representations of an APA model with textual embeddings of phonetic transcriptions via contrastive learning. Afterward, the phoneme characteristics are retained by regulating the distances between inter- and intra-phoneme categories in the feature…
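The abstract describes two steps: contrastively aligning phoneme representations with text embeddings of the phones, then regulating intra- and inter-phoneme distances in the feature space. The NumPy sketch below illustrates generic versions of both ideas and is not the authors' formulation (the abstract is truncated). The prototype-style contrastive loss, the centroid-based distance regularizer, the function names, and the `temperature` and `margin` parameters are all assumptions for illustration, and the ordinal relationship among proficiency scores is not modeled here.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row of x to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_alignment_loss(phone_feats, phone_ids, text_table, temperature=0.1):
    """Hypothetical audio-to-text contrastive loss (an assumption, not ConPCO itself).

    phone_feats: (N, d) phoneme representations from an APA model.
    phone_ids:   (N,) integer index of each segment's phone in text_table.
    text_table:  (K, d) one text embedding per phone category.
    Each representation should be most similar to its own phone's text embedding.
    """
    a = l2_normalize(phone_feats)
    t = l2_normalize(text_table)
    logits = a @ t.T / temperature          # (N, K) scaled cosine similarities
    log_probs = log_softmax(logits, axis=1)
    return -log_probs[np.arange(len(a)), phone_ids].mean()


def phoneme_distance_regularizer(phone_feats, phone_ids, margin=1.0):
    """Hypothetical intra/inter-phoneme distance term (an assumption).

    Intra: mean squared distance of each feature to its phoneme centroid.
    Inter: squared hinge penalty on centroid pairs closer than `margin`.
    """
    feats = l2_normalize(phone_feats)
    phone_ids = np.asarray(phone_ids)
    classes = np.unique(phone_ids)          # sorted unique phone labels
    centroids = np.stack([feats[phone_ids == c].mean(axis=0) for c in classes])
    class_index = np.searchsorted(classes, phone_ids)
    intra = np.mean(np.sum((feats - centroids[class_index]) ** 2, axis=1))
    if len(classes) < 2:
        return intra                        # no inter-phoneme pairs to separate
    diffs = centroids[:, None, :] - centroids[None, :, :]
    dists = np.sqrt(np.sum(diffs ** 2, axis=-1))
    off_diag = ~np.eye(len(classes), dtype=bool)
    inter = np.mean(np.maximum(0.0, margin - dists[off_diag]) ** 2)
    return intra + inter
```

In a training setup, weighted versions of such terms would typically be added to the regression loss on the proficiency scores; any weights would be hyperparameters that this page does not specify.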

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Natural Language Processing Techniques

Methods: Adaptive Pseudo Augmentation