German Phoneme Recognition with Text-to-Phoneme Data Augmentation

Dojun Park; Seohyun Park

arXiv:2211.13776 · cs.CL · November 28, 2022

TL;DR

This paper investigates how adding frequent phoneme bigrams to training data affects German phoneme recognition, revealing both improvements and declines in model performance depending on the bigram types used.

Contribution

It introduces a text-to-phoneme data augmentation method with phoneme bigrams and analyzes their impact on recognition accuracy.

Findings

01. The vowel30 and const20 models improved BLEU scores by more than 1 point over the baseline.

02. The total30 model's BLEU score decreased by more than 20 points.

03. Error analysis identified the types of errors that the models made repeatedly.

Abstract

In this study, we examine the effect of adding the n most frequent phoneme bigrams to the basic vocabulary of a German phoneme recognition model trained with the text-to-phoneme data augmentation strategy. Compared to the baseline model, the vowel30 and const20 models increased the BLEU score by more than 1 point, while the total30 model decreased it by more than 20 points, showing that phoneme bigrams can affect model performance either positively or negatively. In addition, through error analysis we identified the types of errors that the models made repeatedly.
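As a rough illustration of "adding the n most frequent phoneme bigrams to the basic vocabulary", the sketch below counts adjacent phoneme pairs in a toy corpus and appends the top n as new vocabulary tokens. It is a minimal sketch under stated assumptions, not the authors' procedure: the function names (top_phoneme_bigrams, extend_vocabulary), the optional keep filter, the toy phoneme sequences, and the choice to form a bigram token by concatenating the two phoneme symbols are all hypothetical. The abstract does not say how the vowel, consonant, and total bigram sets were selected.

```python
from collections import Counter


def top_phoneme_bigrams(sequences, n, keep=None):
    """Return the n most frequent adjacent phoneme pairs.

    `keep` is an optional (hypothetical) predicate on a pair, e.g. to
    restrict counting to vowel-only or consonant-only bigrams.
    """
    counts = Counter()
    for seq in sequences:
        # zip(seq, seq[1:]) yields each pair of neighbouring phonemes
        for a, b in zip(seq, seq[1:]):
            if keep is None or keep(a, b):
                counts[(a, b)] += 1
    return [pair for pair, _ in counts.most_common(n)]


def extend_vocabulary(base_vocab, bigrams):
    """Append each bigram to the vocabulary as one concatenated token."""
    vocab = list(base_vocab)
    seen = set(vocab)
    for a, b in bigrams:
        token = a + b  # assumption: a bigram token is the two symbols joined
        if token not in seen:
            vocab.append(token)
            seen.add(token)
    return vocab


# Toy phoneme sequences, invented purely for illustration.
sequences = [
    ["h", "a", "l", "o"],
    ["h", "a", "t"],
    ["b", "a", "l"],
]

base_vocab = sorted({p for seq in sequences for p in seq})
top2 = top_phoneme_bigrams(sequences, 2)
vocab = extend_vocabulary(base_vocab, top2)
print(top2)
print(vocab)
```

Passing a predicate as keep (for example, one that accepts a pair only when both symbols belong to a vowel inventory) would restrict the count to one class of bigrams; which inventory to use is left open here.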

Taxonomy

Topics: Speech Recognition and Synthesis