TL;DR
This paper presents a speech-recognition-based method that combines PocketSphinx alignment features with machine learning classifiers to assess the intelligibility of spoken English, reaching 82% agreement with the accuracy of crowdworker transcriptions versus the 75% previously reported.
Contribution
It introduces a feature extraction approach built on PocketSphinx alignment mode and repeated recognition passes, paired with SVM classification, that improves the accuracy of intelligibility assessment in CAPT systems.
Findings
Achieved 82% agreement with the accuracy of Amazon Mechanical Turk crowdworker transcriptions.
Improved on the 75% agreement previously reported by multiple independent researchers.
Demonstrated effectiveness of PocketSphinx features in pronunciation assessment.
Abstract
We use automatic speech recognition to assess spoken English learner pronunciation based on the authentic intelligibility of the learners' spoken responses determined from support vector machine (SVM) classifier or deep learning neural network model predictions of transcription correctness. Using numeric features produced by PocketSphinx alignment mode and many recognition passes searching for the substitution and deletion of each expected phoneme and insertion of unexpected phonemes in sequence, the SVM models achieve 82 percent agreement with the accuracy of Amazon Mechanical Turk crowdworker transcriptions, up from 75 percent reported by multiple independent researchers. Using such features with SVM classifier probability prediction models can help computer-aided pronunciation teaching (CAPT) systems provide intelligibility remediation.
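One way to read the agreement figure in the abstract is as the fraction of items for which a classifier's thresholded probability matches whether crowdworkers transcribed the speech correctly. The sketch below illustrates that reading only. The function name `agreement_rate`, the 0.5 threshold, and the sample values are assumptions made for illustration and are not taken from the paper.

```python
from typing import Sequence


def agreement_rate(
    predicted_probs: Sequence[float],
    crowd_correct: Sequence[bool],
    threshold: float = 0.5,
) -> float:
    """Fraction of items where the thresholded prediction matches the crowd label.

    predicted_probs: hypothetical classifier probabilities that an item
        would be transcribed correctly (i.e. judged intelligible).
    crowd_correct: whether crowdworkers actually transcribed it correctly.
    threshold: assumed decision cutoff, not a value from the paper.
    """
    if len(predicted_probs) != len(crowd_correct):
        raise ValueError("inputs must have the same length")
    if not predicted_probs:
        raise ValueError("inputs must be non-empty")
    matches = sum(
        (p >= threshold) == bool(c)
        for p, c in zip(predicted_probs, crowd_correct)
    )
    return matches / len(predicted_probs)


# Made-up illustrative values, not data from the paper.
probs = [0.91, 0.15, 0.62, 0.40, 0.77]
crowd = [True, False, False, False, True]
print(f"{agreement_rate(probs, crowd):.0%}")  # prints "80%"
```

In this toy example the third item is predicted intelligible but was not transcribed correctly, so four of five predictions agree with the crowd labels.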
Taxonomy
Methods: Support Vector Machine
