Selective Augmentation: Improving Universal Automatic Phonetic Transcription via G2P Bootstrapping

Tobias Bystrich; Julia M. Pritzen; Christoph A. Schmidt; Claudia Wich-Reif

arXiv:2604.27204 · cs.CL · May 1, 2026

TL;DR

This paper introduces Selective Augmentation, a bootstrapping method that enhances universal automatic phonetic transcription by transferring phonetic distinctions across languages, demonstrated with improvements in voicing and aspiration recognition.

Contribution

The paper presents a novel selective augmentation technique that leverages helper languages to improve phonetic feature accuracy in APT models, addressing data scarcity issues.
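The idea of transferring only one selected distinction from a helper transcription can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `selectively_augment`, the small plosive inventory, and the assumption that both transcriptions are already aligned segment by segment are all hypothetical choices made for illustration. The sketch transfers aspiration only and leaves every other segment of the original transcription untouched.

```python
from typing import List

# Hypothetical inventory for illustration; a real system would use a
# full phonological feature table rather than a hard-coded set.
VOICELESS_PLOSIVES = {"p", "t", "k"}
ASPIRATION = "ʰ"


def selectively_augment(original: List[str], helper: List[str]) -> List[str]:
    """Copy aspiration marks from `helper` onto `original`.

    Assumes both transcriptions are aligned one segment to one segment.
    If the lengths differ, alignment is unknown, so the original is
    returned unchanged.
    """
    if len(original) != len(helper):
        return list(original)

    augmented = []
    for orig_seg, helper_seg in zip(original, helper):
        # Transfer only the selected distinction: the helper must agree on
        # the plosive itself and differ solely by the aspiration mark.
        if orig_seg in VOICELESS_PLOSIVES and helper_seg == orig_seg + ASPIRATION:
            augmented.append(helper_seg)
        else:
            # Any other disagreement (e.g. vowel quality) is ignored.
            augmented.append(orig_seg)
    return augmented


if __name__ == "__main__":
    print(selectively_augment(["t", "a", "k"], ["tʰ", "ɑ", "k"]))
```

In this sketch the helper's differing vowel is deliberately discarded, which reflects the "selective" part of the idea: only the targeted feature is allowed to change the training transcription.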

Findings

01

Voicing accuracy increased by 17.6%, achieved by reducing the number of false positives.

02

Newly introduced aspiration recognition yielded aspirated-plosive transcriptions in German at a rate of 61.2%.

03

The tenuis class was reduced by 32.2%, which decreased phonetic conflations.

Abstract

Universal automatic phonetic transcription (APT) requires clean and diverse training transcriptions, but such high-quality data is limited. We propose Selective Augmentation, a bootstrapping approach that improves the available training transcriptions by selectively transferring distinctions between languages. Using the model MultIPA as an example, we show that we could increase the accuracy of an existing feature (plosive voicing) and add a new feature (plosive aspiration) by augmenting the existing training data with information from a separate helper language (Hindi). We describe intrinsic challenges of the evaluation and develop objective metrics to determine success. Voicing accuracy was increased by 17.6% by reducing the number of false positives. Additionally, aspiration recognition was introduced: While the baseline transcribed 0% of German /p, t,…
