A segmental framework for fully-unsupervised large-vocabulary speech recognition

Herman Kamper; Aren Jansen; Sharon Goldwater

arXiv:1606.06950·cs.CL·September 19, 2017

TL;DR

This paper introduces a segmental Bayesian framework for fully-unsupervised large-vocabulary speech recognition, applies it to multi-speaker English and Xitsonga data, and compares it to state-of-the-art baselines. Word error rates remain high.

Contribution

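It presents what the authors describe as the first attempt they are aware of to apply a full-coverage unsupervised system to large-vocabulary multi-speaker data. The system combines segmental acoustic embeddings with a Bayesian model and improves segmentation and clustering quality over bottom-up syllable-based approaches.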

Findings

01

Outperforms bottom-up syllable-based approaches in segmentation and clustering.

02

Word error rates remain high (roughly 70-95%), which underlines the difficulty of the task.

03

Discovered clusters have greater coverage but lower purity than term discovery systems.
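
For readers unfamiliar with the metric in finding 02, word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The sketch below shows that standard definition only. It assumes the hypothesis is already a sequence of word labels; how discovered clusters are mapped to words for scoring is not stated in this summary, and the function name `word_error_rate` is illustrative.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Standard WER: word-level Levenshtein distance / number of reference words."""
    if not reference:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(reference)


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    hyp = "the cat sit on mat".split()
    # One substitution (sat -> sit) and one deletion (the) over six reference words.
    print(f"WER = {word_error_rate(ref, hyp):.3f}")
```

Because insertions count as errors, this quantity can exceed 100% when the hypothesis contains many more words than the reference.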

Abstract

Zero-resource speech technology is a growing research area that aims to develop methods for speech processing in the absence of transcriptions, lexicons, or language modelling text. Early term discovery systems focused on identifying isolated recurring patterns in a corpus, while more recent full-coverage systems attempt to completely segment and cluster the audio into word-like units, effectively performing unsupervised speech recognition. This article presents the first attempt we are aware of to apply such a system to large-vocabulary multi-speaker data. Our system uses a Bayesian modelling framework with segmental word representations: each word segment is represented as a fixed-dimensional acoustic embedding obtained by mapping the sequence of feature frames to a single embedding vector. We compare our system on English and Xitsonga datasets to state-of-the-art baselines, using a…
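
To make the idea of a fixed-dimensional acoustic embedding concrete, here is a minimal sketch of one simple way to map a variable-length sequence of feature frames to a single vector: sample a fixed number of frames at evenly spaced positions and concatenate them. This is an illustrative assumption, not a statement of the embedding method the paper uses; the function name `downsample_embedding` and the default `n_keep=10` are hypothetical.

```python
from typing import List, Sequence


def downsample_embedding(
    frames: Sequence[Sequence[float]], n_keep: int = 10
) -> List[float]:
    """Map a variable-length segment to a vector of n_keep * frame_dim values.

    Frames are picked at evenly spaced positions (nearest-frame sampling) and
    concatenated. Short segments simply repeat some frames.
    """
    if not frames:
        raise ValueError("segment must contain at least one frame")
    if n_keep < 1:
        raise ValueError("n_keep must be at least 1")

    n_frames = len(frames)
    embedding: List[float] = []
    for k in range(n_keep):
        if n_keep == 1:
            position = (n_frames - 1) / 2  # single sample: take the middle
        else:
            position = k * (n_frames - 1) / (n_keep - 1)
        index = int(round(position))  # always within [0, n_frames - 1]
        embedding.extend(frames[index])
    return embedding


if __name__ == "__main__":
    # Two hypothetical segments of different durations, 2 features per frame.
    short_segment = [[float(t), 0.5] for t in range(5)]
    long_segment = [[float(t), 0.5] for t in range(23)]
    print(len(downsample_embedding(short_segment)))
    print(len(downsample_embedding(long_segment)))
```

Both segments end up with the same dimensionality, which is what lets whole segments be compared and clustered as single points rather than as frame sequences of differing length.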
