Automatic classification of stop realisation with wav2vec2.0

James Tanner; Morgan Sonderegger; Jane Stuart-Smith; Jeff Mielke; Tyler Kendall

arXiv:2505.23688·cs.CL·June 2, 2025

TL;DR

This paper demonstrates that wav2vec2.0 models can accurately classify stop burst presence in speech, offering a scalable tool for phonetic annotation across languages and speech types.

Contribution

The paper introduces a method for using wav2vec2.0 to automatically classify stop realisation, showing high accuracy and robustness across languages and speech conditions.

Findings

01 High classification accuracy in English and Japanese

02 Robust performance across curated and unprepared speech

03 Automatic annotations closely match manual annotations

Abstract

Modern phonetic research regularly makes use of automatic tools for the annotation of speech data; however, few tools exist for the annotation of many variable phonetic phenomena. At the same time, pre-trained self-supervised models, such as wav2vec2.0, have been shown to perform well at speech classification tasks and to latently encode fine-grained phonetic information. We demonstrate that wav2vec2.0 models can be trained to automatically classify stop burst presence with high accuracy in both English and Japanese, and that this performance is robust across both finely curated and unprepared speech corpora. Patterns of variability in stop realisation are replicated with the automatic annotations and closely follow those of manual annotations. These results demonstrate the potential of pre-trained speech models as tools for the automatic annotation and processing of speech corpus data, enabling researchers to…
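
The abstract's claim that automatic annotations closely follow manual ones can be made concrete with an agreement measure. The sketch below is illustrative only: this listing does not state which metrics the authors report, so raw accuracy and Cohen's kappa are assumed here as common choices, and the function name `agreement` and the example labels are hypothetical.

```python
import math
from collections import Counter


def agreement(manual, automatic):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences."""
    if len(manual) != len(automatic) or not manual:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(manual)

    # Observed agreement: proportion of tokens given the same label by both.
    observed = sum(m == a for m, a in zip(manual, automatic)) / n

    # Chance agreement: probability both pick the same label independently,
    # estimated from each annotator's own label frequencies.
    manual_counts = Counter(manual)
    auto_counts = Counter(automatic)
    expected = sum(
        manual_counts[label] * auto_counts[label] for label in manual_counts
    ) / (n * n)

    if expected == 1.0:
        # Both sequences use one identical label throughout; kappa is undefined.
        return observed, math.nan
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical labels (not data from the paper): 1 = burst present, 0 = absent.
manual = [1, 1, 0, 1, 0, 1, 1, 0]
automatic = [1, 1, 0, 1, 1, 1, 1, 0]
accuracy, kappa = agreement(manual, automatic)
print(f"accuracy={accuracy:.3f}, kappa={kappa:.3f}")
```

Kappa is included alongside accuracy because burst presence is often the majority class, so a classifier could reach high raw accuracy while agreeing with the manual labels little more than chance would predict.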

Code & Models

Repositories

james-tanner/wav2vec-burst-detection (PyTorch, official)

Taxonomy

Topics: Phonetics and Phonology Research · Speech Recognition and Synthesis · Linguistic Variation and Morphology