Blind phoneme segmentation with temporal prediction errors

Paul Michel; Okko Räsänen; Roland Thiollière; Emmanuel Dupoux

arXiv:1608.00508·cs.CL·May 30, 2017


TL;DR

This paper introduces an unsupervised method for phoneme segmentation that uses the prediction errors of sequence models to locate phoneme boundaries, and reports improvements over similar methods on the TIMIT dataset.

Contribution

It presents a novel approach using error profiles from sequence prediction models for unsupervised phoneme segmentation, which improves over similar existing methods.

Findings

01 Effective boundary detection via local maxima in prediction error

02 Improved segmentation accuracy on TIMIT dataset

03 Unsupervised approach reduces need for labeled data

Abstract

Phonemic segmentation of speech is a critical step in speech recognition systems. We propose a novel unsupervised algorithm based on sequence prediction models such as Markov chains and recurrent neural networks. Our approach consists of analyzing the error profile of a model trained to predict speech features frame by frame. Specifically, we try to learn the dynamics of speech in the MFCC space and hypothesize boundaries at local maxima of the prediction error. We evaluate our system on the TIMIT dataset, with improvements over similar methods.
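
The idea of hypothesizing boundaries at local maxima of the prediction error can be illustrated with a minimal sketch. This is not the authors' implementation: a least-squares linear next-frame predictor stands in for the Markov chain and recurrent network models named in the abstract, the input is any array of per-frame features (MFCCs in the paper), and the function names and the `threshold` parameter are assumptions introduced here for illustration.

```python
import numpy as np


def prediction_error(features):
    """Squared error of a linear next-frame predictor (illustrative stand-in).

    features: array of shape (n_frames, n_dims).
    Returns an array of length n_frames - 1, where entry i is the error
    made when predicting frame i + 1 from frame i.
    """
    features = np.asarray(features, dtype=float)
    inputs, targets = features[:-1], features[1:]
    # Append a constant column so the predictor has a bias term.
    inputs_b = np.hstack([inputs, np.ones((len(inputs), 1))])
    # Least-squares fit; lstsq also handles rank-deficient inputs.
    weights, *_ = np.linalg.lstsq(inputs_b, targets, rcond=None)
    residual = inputs_b @ weights - targets
    return np.sum(residual ** 2, axis=1)


def local_maxima(error, threshold=0.0):
    """Indices of strict local maxima of `error` that exceed `threshold`."""
    error = np.asarray(error, dtype=float)
    middle = error[1:-1]
    is_peak = (middle > error[:-2]) & (middle > error[2:]) & (middle > threshold)
    # +1 because `middle` starts at index 1 of `error`.
    return np.flatnonzero(is_peak) + 1


def hypothesize_boundaries(features, threshold=0.0):
    """Frame indices at which a boundary is hypothesized.

    error[i] measures how badly frame i + 1 was predicted, so a peak at
    index i is reported as a boundary at frame i + 1.
    """
    return local_maxima(prediction_error(features), threshold) + 1


# Toy input: two steady "segments" of 20 frames each, 2 feature dimensions.
toy_features = np.vstack([
    np.tile([0.0, 1.0], (20, 1)),
    np.tile([5.0, -2.0], (20, 1)),
])
toy_boundaries = hypothesize_boundaries(toy_features, threshold=1.0)
```

On this toy input the predictor is surprised only at the jump between the two segments, so the single hypothesized boundary falls at the first frame of the second segment. The threshold suppresses small peaks caused by noise or floating-point variation; how to set it on real speech is not specified in the abstract.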
