German Phoneme Recognition with Text-to-Phoneme Data Augmentation
Dojun Park, Seohyun Park

TL;DR
This paper investigates how adding the most frequent phoneme bigrams to the model's vocabulary affects German phoneme recognition. Depending on which bigram types are added, model performance either improves or declines.
Contribution
It combines a text-to-phoneme data augmentation strategy with a vocabulary extended by phoneme bigrams, and analyzes how these bigrams affect recognition accuracy.
Findings
The vowel30 and const20 models improved BLEU scores by more than 1 point over the baseline.
The total30 model's BLEU score dropped by more than 20 points relative to the baseline.
Error analysis identified the types of errors the models made repeatedly.
Abstract
In this study, we examined the effect of adding the n most frequent phoneme bigrams to the basic vocabulary of a German phoneme recognition model, using a text-to-phoneme data augmentation strategy. Compared to the baseline model, the vowel30 and const20 models achieved BLEU scores more than 1 point higher, whereas the total30 model's BLEU score dropped by more than 20 points. This shows that phoneme bigrams can affect model performance either positively or negatively. In addition, through error analysis we identified the types of errors the models made repeatedly.
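The vocabulary extension described in the abstract can be sketched as: count phoneme bigrams in phoneme-transcribed text, keep the n most frequent ones (optionally restricted to one bigram type), and add them as single tokens to the base vocabulary. This is a minimal illustration under stated assumptions, not the authors' implementation. The vowel inventory, the rule that a "vowel" bigram has two vowels and a "const" bigram has two consonants, the greedy left-to-right merging, and all function names are hypothetical choices made for this example.

```python
from collections import Counter

# Hypothetical, simplified German vowel inventory (IPA); not taken from the paper.
VOWELS = {"a", "aː", "ɛ", "eː", "ɪ", "iː", "ɔ", "oː", "ʊ", "uː",
          "ʏ", "yː", "œ", "øː", "ə", "ɐ"}


def top_bigrams(corpus: list[list[str]], n: int, kind: str = "total") -> list[tuple[str, str]]:
    """Return the n most frequent phoneme bigrams of the given kind.

    kind: "total" (any bigram), "vowel" (both phonemes are vowels) or
    "const" (both phonemes are consonants). This split is an assumption.
    """
    if kind not in {"total", "vowel", "const"}:
        raise ValueError(f"unknown bigram kind: {kind}")
    counts: Counter = Counter()
    for seq in corpus:
        for a, b in zip(seq, seq[1:]):
            if kind == "vowel" and not (a in VOWELS and b in VOWELS):
                continue
            if kind == "const" and (a in VOWELS or b in VOWELS):
                continue
            counts[(a, b)] += 1
    return [bigram for bigram, _ in counts.most_common(n)]


def build_vocab(corpus: list[list[str]], bigrams: list[tuple[str, str]]) -> list[str]:
    """Base vocabulary of single phonemes, extended with merged bigram tokens."""
    base = sorted({p for seq in corpus for p in seq})
    return base + [a + b for a, b in bigrams if a + b not in base]


def tokenize(seq: list[str], bigrams: set[tuple[str, str]]) -> list[str]:
    """Greedily merge adjacent phonemes that form a selected bigram."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) in bigrams:
            out.append(seq[i] + seq[i + 1])  # merged bigram token
            i += 2
        else:
            out.append(seq[i])  # single phoneme token
            i += 1
    return out


if __name__ == "__main__":
    corpus = [["h", "a", "l", "oː"], ["a", "l", "s"], ["v", "a", "l", "t"], ["b", "a", "ʊ"]]
    chosen = top_bigrams(corpus, 1)  # [('a', 'l')], which occurs 3 times
    print(build_vocab(corpus, chosen))
    print(tokenize(["h", "a", "l", "oː"], set(chosen)))  # ['h', 'al', 'oː']
```

Greedy merging keeps the sketch simple. A real system might instead resolve overlapping bigrams by frequency, as BPE-style tokenizers do, which could change which tokens the model sees.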
Taxonomy
Topics: Speech Recognition and Synthesis
