Deep segmental phonetic posterior-grams based discovery of non-categories in L2 English speech
Xu Li, Xixin Wu, Xunying Liu, Helen Meng

TL;DR
This paper introduces a novel method using deep segmental phonetic posterior-grams to identify non-categorical speech segments in L2 English, enhancing mispronunciation detection by capturing non-categorical errors.
Contribution
It proposes a new approach to model non-categorical speech errors using SPPGs, extending beyond traditional categorical error detection in L2 speech analysis.
Findings
Non-categorical pattern detection improved confusion degree by 7.3% and 7.5%.
The discovered non-categories are more accurate than those found by the baseline system.
The paper gives a preliminary analysis of what causes non-categories.
Abstract
Second language (L2) speech is often labeled with native phone categories. However, in many cases it is difficult to decide which categorical phone an L2 segment belongs to. These segments are regarded as non-categories. Most existing approaches to Mispronunciation Detection and Diagnosis (MDD) are concerned only with categorical errors, i.e. a phone category is inserted, deleted, or substituted by another; non-categorical errors are not considered. To model these non-categorical errors, this work explores non-categorical patterns to extend the categorical phone set. We apply a phonetic segment classifier to generate segmental phonetic posterior-grams (SPPGs) that represent phone segment-level information. We then explore the non-categories by looking for SPPGs with more than one peak. Compared with the baseline system, this approach explores more…
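The "more than one peak" criterion can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an SPPG is a single posterior vector over phone classes for one segment, and it assumes a peak is any class whose posterior reaches a hypothetical threshold of 0.3. The function names, the threshold, and the toy phone list are all illustrative.

```python
import numpy as np

def find_peaks(sppg, threshold=0.3):
    """Return indices of phone classes whose posterior is at least `threshold`.

    `threshold` is a hypothetical value chosen for illustration only.
    """
    sppg = np.asarray(sppg, dtype=float)
    return np.flatnonzero(sppg >= threshold)

def is_non_category_candidate(sppg, threshold=0.3):
    """Flag a segment whose posterior vector has more than one peak."""
    return len(find_peaks(sppg, threshold)) > 1

# Toy posteriors over four assumed phone classes.
phones = ["iy", "ih", "eh", "ae"]
clear = [0.90, 0.05, 0.03, 0.02]      # one dominant class
ambiguous = [0.48, 0.45, 0.04, 0.03]  # mass split between two classes

for name, sppg in [("clear", clear), ("ambiguous", ambiguous)]:
    peaks = [phones[i] for i in find_peaks(sppg)]
    print(name, peaks, is_non_category_candidate(sppg))
```

Under these assumptions, a segment whose probability mass is split between two phones is flagged as a non-category candidate, while a segment dominated by a single phone is not.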
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
