Improving pronunciation assessment via ordinal regression with anchored reference samples
Bin Su, Shaoguang Mao, Frank Soong, Yan Xia, Jonathan Tien, Zhiyong Wu

TL;DR
This paper introduces a novel ordinal regression approach with anchored reference samples to improve sentence-level pronunciation assessment, achieving higher correlation with human ratings than traditional GOP methods.
Contribution
It proposes two new statistical features and a new ordinal regression framework that better exploit ranking information for more accurate pronunciation evaluation.
Findings
26.9% relative improvement in Pearson correlation coefficient over traditional GOP
Achieves human-parity level in pronunciation assessment
Outperforms existing GOP-based methods on the Microsoft mTutor ESL Dataset
Abstract
Sentence-level pronunciation assessment is important for Computer Assisted Language Learning (CALL). Traditional speech pronunciation assessment, based on the Goodness of Pronunciation (GOP) algorithm, has some weaknesses in assessing a speech utterance: 1) Phoneme GOP scores cannot be easily translated into a sentence score with a simple average for effective assessment; 2) The rank ordering information has not been well exploited in GOP scoring to deliver a robust assessment that correlates well with a human rater's evaluations. In this paper, we propose two new statistical features, average GOP (aGOP) and confusion GOP (cGOP), and use them to train a binary classifier in Ordinal Regression with Anchored Reference Samples (ORARS). When the proposed approach is tested on the Microsoft mTutor ESL Dataset, a relative improvement of Pearson correlation coefficient of 26.9% is obtained over the…
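The contrast the abstract draws, between averaging phoneme GOP scores and ranking an utterance against anchored reference samples, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the toy comparator standing in for the trained binary classifier, and the rule that maps the number of "wins" onto the anchor score range are all assumptions made for illustration, and the aGOP and cGOP features are not reproduced because the abstract does not define them.

```python
from statistics import mean


def sentence_score_by_average(phoneme_gops):
    """Baseline: collapse phoneme-level GOP scores into one sentence score
    with a simple average (the approach the abstract calls inadequate)."""
    return mean(phoneme_gops)


def ordinal_score(test_features, anchors, is_better):
    """Score a test utterance by ranking it against anchored references.

    anchors   : list of (features, human_score) pairs with known ratings.
    is_better : hypothetical stand-in for a trained binary classifier;
                returns True if the first utterance is judged better
                pronounced than the second.

    Assumption: the count of anchors the test utterance outranks is mapped
    linearly onto the range of anchor scores. The paper may aggregate the
    pairwise decisions differently.
    """
    wins = sum(1 for feats, _ in anchors if is_better(test_features, feats))
    scores = [score for _, score in anchors]
    lo, hi = min(scores), max(scores)
    return lo + (hi - lo) * wins / len(anchors)


def toy_is_better(a, b):
    """Toy comparator (assumption): higher mean GOP counts as better."""
    return mean(a) > mean(b)


# Toy anchors: made-up phoneme GOP values paired with made-up ratings 1..5.
ANCHORS = [
    ([-4.0, -5.0], 1),
    ([-3.0, -3.5], 2),
    ([-2.0, -2.5], 3),
    ([-1.0, -1.5], 4),
    ([-0.2, -0.4], 5),
]

if __name__ == "__main__":
    test_utterance = [-1.8, -2.0]
    print(sentence_score_by_average(test_utterance))
    print(ordinal_score(test_utterance, ANCHORS, toy_is_better))
```

In this sketch the final score depends only on how the test utterance ranks relative to utterances with known human ratings, which is one way to make use of the rank ordering information that the abstract says plain GOP scoring leaves unexploited.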
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
