Semi-supervised Learning For Robust Speech Evaluation

Huayun Zhang; Jeremy H.M. Wong; Geyu Lin; Nancy F. Chen

arXiv:2409.14666·cs.AI·September 24, 2024

TL;DR

This paper introduces a semi-supervised learning approach for robust speech evaluation that leverages unlabeled data and objective regularization to improve accuracy and consistency across proficiency levels and out-of-distribution samples.

Contribution

The paper proposes a novel semi-supervised framework that uses mutual information and an anchor model trained with pseudo labels to make speech scoring more robust.

Findings

1. Achieves high performance on a public dataset.
2. Provides more evenly distributed errors across proficiency levels.
3. Outperforms baseline methods on out-of-distribution data.

Abstract

Speech evaluation measures a learner's oral proficiency using automatic models. Corpora for training such models pose sparsity challenges. Teacher-scored data are often limited, and the score distribution across proficiency levels is often imbalanced among student cohorts. As a result, automatic scoring is not robust to under-represented or out-of-distribution samples, which inevitably occur in real-world deployment. This paper proposes to address these challenges by exploiting semi-supervised pre-training and objective regularization to approximate subjective evaluation criteria. In particular, normalized mutual information is used to quantify the speech characteristics from the learner and the reference. An anchor model is trained using pseudo labels to predict the correctness of pronunciation. An interpolated loss function…
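
To illustrate what "normalized mutual information between learner and reference" can mean, here is a minimal sketch. It is not the authors' implementation. It assumes that both utterances have already been converted into equal-length, aligned sequences of discrete units, such as quantized acoustic tokens or frame-level phone labels. It also assumes arithmetic-mean normalization, 2·I(X;Y) / (H(X) + H(Y)). The abstract specifies neither the features, the alignment, nor the normalization, and the function names below are hypothetical.

```python
from collections import Counter
import math


def entropy(labels):
    """Shannon entropy (in nats) of a discrete label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(x, y):
    """Mutual information (in nats) between two aligned discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        p_ab = c / n
        # Each joint cell contributes p(a,b) * log(p(a,b) / (p(a) * p(b))).
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def normalized_mutual_information(learner_units, reference_units):
    """Hypothetical NMI score in [0, 1] between learner and reference unit sequences.

    Assumes the two sequences are already aligned frame by frame and uses
    arithmetic-mean normalization: 2 * I(X;Y) / (H(X) + H(Y)).
    """
    if len(learner_units) != len(reference_units):
        raise ValueError("sequences must be aligned to the same length")
    if not learner_units:
        raise ValueError("sequences must be non-empty")
    hx, hy = entropy(learner_units), entropy(reference_units)
    if hx + hy == 0:
        # Both sequences are constant, so treat them as perfectly matched.
        return 1.0
    return 2 * mutual_information(learner_units, reference_units) / (hx + hy)


if __name__ == "__main__":
    # Same segmentation under different unit IDs: NMI is close to 1.0.
    print(normalized_mutual_information([0, 0, 1, 1, 2, 2], [5, 5, 3, 3, 9, 9]))
    # Statistically independent labelings: NMI is close to 0.0.
    print(normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```

NMI depends only on how the two sequences co-vary, not on the label IDs themselves. It can therefore compare a learner's units against a reference speaker's units even when the two unit inventories are not matched one to one.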

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing