Evaluating Logit-Based GOP Scores for Mispronunciation Detection
Aditya Kamlesh Parikh, Cristian Tejedor-Garcia, Catia Cucchiarini, Helmer Strik

TL;DR
This paper compares logit-based and probability-based GOP scores for mispronunciation detection, finding that logit-based scores, especially maximum logit GOP, better align with human perception and improve assessment accuracy.
Contribution
It introduces and evaluates logit-based GOP scoring methods, demonstrating their advantages over traditional probability-based methods in pronunciation assessment.
Findings
Logit-based GOP scores outperform probability-based scores in classification, though their effectiveness depends on dataset characteristics.
Maximum logit GOP shows the strongest correlation with human ratings.
Combining different GOP scores balances probability and logit features, suggesting that hybrid GOP methods can improve pronunciation assessment.
Abstract
Pronunciation assessment relies on goodness of pronunciation (GOP) scores, traditionally derived from softmax-based posterior probabilities. However, posterior probabilities may suffer from overconfidence and poor phoneme separation, limiting their effectiveness. This study compares logit-based GOP scores with probability-based GOP scores for mispronunciation detection. We conducted our experiment on two L2 English speech datasets spoken by Dutch and Mandarin speakers, assessing classification performance and correlation with human ratings. Logit-based methods outperform probability-based GOP in classification, but their effectiveness depends on dataset characteristics. The maximum logit GOP shows the strongest alignment with human perception, while a combination of different GOP scores balances probability and logit features. The findings suggest that hybrid GOP methods incorporating…
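The contrast between probability-based and logit-based scoring can be illustrated with a minimal sketch. This summary does not give the paper's exact formulas, so the definitions below are assumptions: the probability-based GOP is taken as the mean log posterior of the target phoneme over its aligned frames (one common formulation), and the maximum logit GOP as the largest raw logit of the target phoneme over those frames. The function names and toy logits are hypothetical, and frame alignment is assumed to be given.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one frame's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def gop_probability(frames, target):
    """Assumed definition: mean log posterior of the target phoneme."""
    return sum(log_softmax(f)[target] for f in frames) / len(frames)

def gop_max_logit(frames, target):
    """Assumed definition: largest raw target-phoneme logit in the segment."""
    return max(f[target] for f in frames)

# Toy segments (hypothetical values): 2 frames x 3 phoneme classes,
# with the target phoneme at index 0.
confident = [[12.0, 0.0, 0.0], [14.0, 0.0, 0.0]]
moderate = [[6.0, 0.0, 0.0], [7.0, 0.0, 0.0]]

for name, segment in [("confident", confident), ("moderate", moderate)]:
    print(name, gop_probability(segment, 0), gop_max_logit(segment, 0))
```

In this toy case the softmax saturates for both segments, so the two probability-based scores are nearly identical, while the maximum logit still separates them. This is the kind of overconfidence effect the abstract describes, shown here only on made-up numbers.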
