Phoneme Segmentation Using Self-Supervised Speech Models

Luke Strgar; David Harwath

arXiv:2211.01461 · eess.AS · November 4, 2022



TL;DR

This paper demonstrates that self-supervised speech models can be effectively transferred to phoneme segmentation tasks, outperforming previous methods in supervised and unsupervised settings on standard datasets.

Contribution

It introduces a transformer-based model with convolutional enhancements that leverages self-supervised representations for phoneme segmentation, and it clarifies ambiguities in the definitions of widely used evaluation metrics.
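One common way to pose boundary detection on top of self-supervised features is frame-level binary classification. Each feature frame is labeled 1 if a phoneme boundary falls in it, and the predicted per-frame probabilities are peak-picked back into boundary times. This page does not describe the paper's exact labeling or decoding, so the sketch below is an assumption and not the authors' code. The helper names `boundaries_to_targets` and `probs_to_boundaries`, the 20 ms frame stride (the output rate of wav2vec 2.0-style encoders on 16 kHz audio), and the 0.5 threshold are all hypothetical choices made for illustration.

```python
# Hypothetical sketch: frame-level targets and decoding for boundary detection.
# Assumes one feature frame every 20 ms, as produced by wav2vec 2.0 / HuBERT.
FRAME_STRIDE_S = 0.02


def boundaries_to_targets(boundaries_s, num_frames, stride_s=FRAME_STRIDE_S):
    """Return a 0/1 list marking the frame nearest each boundary time (seconds)."""
    targets = [0] * num_frames
    for t in boundaries_s:
        idx = int(round(t / stride_s))  # nearest frame index
        if 0 <= idx < num_frames:
            targets[idx] = 1
    return targets


def probs_to_boundaries(probs, threshold=0.5, stride_s=FRAME_STRIDE_S):
    """Keep local maxima at or above `threshold` and return their times in seconds.

    On a flat plateau of equal maxima, the last frame of the plateau is kept.
    """
    times = []
    for i, p in enumerate(probs):
        left = probs[i - 1] if i > 0 else float("-inf")
        right = probs[i + 1] if i + 1 < len(probs) else float("-inf")
        if p >= threshold and p >= left and p > right:
            times.append(round(i * stride_s, 6))
    return times
```

For example, boundaries at 0.0, 0.1 and 0.3 s in a 20-frame utterance become positive targets at frames 0, 5 and 15. Peak picking, rather than plain thresholding, avoids emitting several adjacent boundaries for one broad probability bump.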

Findings

01

The model surpasses the previous state of the art in both supervised and unsupervised settings

02

Effective use of self-supervised pre-trained features

03

Clarification of evaluation metric definitions

Abstract

We apply transfer learning to the task of phoneme segmentation and demonstrate the utility of representations learned in self-supervised pre-training for the task. Our model extends transformer-style encoders with strategically placed convolutions that manipulate features learned in pre-training. Using the TIMIT and Buckeye corpora, we train and test the model in both the supervised and unsupervised settings. In the unsupervised setting, the model is trained on a noisy label set consisting of the predictions of a separate model that was itself trained without supervision. Results indicate that our model surpasses the previous state-of-the-art performance in both settings and on both datasets. Finally, following observations made while reviewing published code and attempting to reproduce past segmentation results, we find a need to disambiguate the definition and implementation of widely used evaluation metrics. We…
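The point about ambiguous metrics can be made concrete. Boundary precision, recall, F1 and R-value are usually computed by counting a predicted boundary as a hit when it lies within a tolerance of a reference boundary, but implementations can differ on whether one reference boundary may be matched by several predictions. This page does not say how the paper resolves the ambiguity, so the sketch below is not the authors' evaluation code. It contrasts a greedy one-to-one matcher with a looser many-to-one count; the function names and the 20 ms tolerance are assumptions, and the R-value follows the usual formulation of Räsänen et al. (2009).

```python
import math


def match_one_to_one(ref, pred, tol):
    """Greedy matching: each reference boundary can absorb at most one prediction."""
    used = set()
    hits = 0
    for p in sorted(pred):
        best = None
        for j, r in enumerate(ref):
            if j in used or abs(p - r) > tol:
                continue
            if best is None or abs(p - r) < abs(p - ref[best]):
                best = j  # nearest unused reference within tolerance
        if best is not None:
            used.add(best)
            hits += 1
    return hits


def match_many_to_one(ref, pred, tol):
    """Looser count: any prediction within `tol` of some reference is a hit.

    Several predictions can hit the same reference, so recall can exceed 1.
    """
    return sum(1 for p in pred if any(abs(p - r) <= tol for r in ref))


def scores(ref, pred, tol=0.02, one_to_one=True):
    """Return (precision, recall, F1, R-value) for boundary times in seconds."""
    matcher = match_one_to_one if one_to_one else match_many_to_one
    hits = matcher(ref, pred, tol)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    if precision > 0:
        os_ = recall / precision - 1  # over-segmentation
        r1 = math.sqrt((1 - recall) ** 2 + os_ ** 2)
        r2 = (-os_ + recall - 1) / math.sqrt(2)
        r_value = 1 - (abs(r1) + abs(r2)) / 2
    else:
        r_value = float("nan")  # undefined when no prediction is a hit
    return precision, recall, f1, r_value
```

With `ref = [0.10, 0.30, 0.50]` and `pred = [0.11, 0.115, 0.29, 0.70]`, one-to-one matching gives precision 0.5 and recall 2/3, while the many-to-one count gives 0.75 and 1.0 for the same predictions. This is why the counting rule needs to be stated alongside any reported scores.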

Code & Models

Repositories

lstrgar/self-supervised-phone-segmentation
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
