Goodness of Pronunciation Pipelines for OOV Problem
Ankit Grover

TL;DR
This paper presents pipelines for Goodness of Pronunciation (GoP) scoring that address the out-of-vocabulary (OOV) problem at testing time by expanding the vocabulary/lexicon and removing unknown (UNK and SPN) phonemes from the output, improving pronunciation scoring accuracy.
Contribution
It introduces three novel pipelines (Online, Offline, and Hybrid) for OOV handling in GoP computation, integrating lexicon expansion and phoneme posterior analysis.
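To make the lexicon-expansion idea concrete, here is a minimal sketch of how OOV words might be detected and added to a lexicon before scoring. It is not the paper's implementation: the names `find_oov`, `expand_lexicon`, and `toy_g2p` are hypothetical, and `toy_g2p` is only a stand-in for a real grapheme-to-phoneme model. The Offline-style call expands the lexicon once for a whole data set, the Online-style call does it for a single utterance, and the Hybrid-style call is one plausible reading of combining the two.

```python
from typing import Callable, Dict, Iterable, List

Lexicon = Dict[str, List[str]]  # word -> phoneme sequence


def toy_g2p(word: str) -> List[str]:
    """Hypothetical stand-in for a real G2P model: one symbol per letter."""
    return [ch.upper() for ch in word if ch.isalpha()]


def find_oov(words: Iterable[str], lexicon: Lexicon) -> List[str]:
    """Return the unique words missing from the lexicon, in first-seen order."""
    seen = set()
    oov = []
    for word in words:
        key = word.lower()
        if key not in lexicon and key not in seen:
            seen.add(key)
            oov.append(key)
    return oov


def expand_lexicon(
    lexicon: Lexicon,
    words: Iterable[str],
    g2p: Callable[[str], List[str]] = toy_g2p,
) -> Lexicon:
    """Return a copy of the lexicon with a pronunciation for each OOV word."""
    expanded = dict(lexicon)
    for word in find_oov(words, lexicon):
        expanded[word] = g2p(word)
    return expanded


base = {"hello": ["HH", "AH", "L", "OW"]}
dataset = [["hello", "grover"], ["hello", "world"]]

# Offline-style (assumed): expand once with every word in the data set.
offline = expand_lexicon(base, [w for utt in dataset for w in utt])

# Online-style (assumed): expand per utterance, just before scoring it.
online = expand_lexicon(base, dataset[0])

# Hybrid-style (assumed): start from the offline lexicon, then patch any
# word that still turns out to be unknown when a new utterance arrives.
hybrid = expand_lexicon(offline, ["hello", "unseen"])
```

With every word of an utterance present in the lexicon, the aligner has no reason to fall back to an unknown-word entry, which is the effect the pipelines aim for.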
Findings
Effective removal of UNK and SPN phonemes improves scoring accuracy
Hybrid pipeline combines advantages of online and offline methods
Utilities provided facilitate future research in pronunciation assessment
Abstract
In this report we propose pipelines for Goodness of Pronunciation (GoP) computation that solve the out-of-vocabulary (OOV) problem at testing time using vocabulary/lexicon expansion techniques. The pipelines use different components of an ASR system to quantify accent and automatically express it as scores. We use the posteriors of an ASR model trained on native English speech, along with phone-level boundaries, to obtain phone-level pronunciation scores. We used this as a baseline pipeline and implemented methods to remove UNK and SPN phonemes from the GoP output by building three pipelines: Online, Offline, and Hybrid. Each returns the scores and can also prevent unknown words in the final output. The Online method operates per utterance, the Offline method pre-incorporates a set of OOV words for a given data set, and the Hybrid method combines the above two ideas to expand the lexicon as well work…
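As an illustration of how phone-level scores can be derived from posteriors and phone boundaries, the sketch below computes one common GoP variant: the average log posterior of the expected phone over the frames aligned to it. This is an assumption about the general technique, not necessarily the exact formula used in the report. The function name `gop_scores`, the alignment format `(phone_index, start_frame, end_frame)` with an exclusive end, and the probability floor are all hypothetical choices made for the example.

```python
import math
from typing import List, Sequence, Tuple

Alignment = Sequence[Tuple[int, int, int]]  # (phone_index, start, end_exclusive)


def gop_scores(
    posteriors: Sequence[Sequence[float]],
    alignment: Alignment,
    floor: float = 1e-10,
) -> List[float]:
    """Average log posterior of each aligned phone over its frames.

    posteriors[t][p] is the model's probability of phone p at frame t.
    Scores are <= 0; values closer to 0 mean the acoustic model was more
    confident that the expected phone was actually spoken.
    """
    scores = []
    for phone, start, end in alignment:
        if end <= start:
            raise ValueError("each segment needs at least one frame")
        # Floor the probability so log() never sees zero.
        total = sum(
            math.log(max(posteriors[t][phone], floor))
            for t in range(start, end)
        )
        scores.append(total / (end - start))
    return scores


# Toy input: three frames, two phone classes.
toy_posteriors = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.3, 0.7],
]
# Phone 0 spans frames 0-1, phone 1 spans frame 2.
toy_alignment = [(0, 0, 2), (1, 2, 3)]
toy_scores = gop_scores(toy_posteriors, toy_alignment)
```

A segment aligned to an unknown-word phoneme such as UNK or SPN has no meaningful expected phone to score against, which is why removing those segments matters for the final output.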
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Natural Language Processing Techniques
