Unsupervised word segmentation and lexicon discovery using acoustic word embeddings

Herman Kamper; Aren Jansen; Sharon Goldwater

arXiv:1603.02845·cs.CL·March 10, 2016

TL;DR

This paper introduces an unsupervised Bayesian model that segments speech and discovers word groupings directly from audio, enabling tokenization without transcriptions or predefined vocabularies.

Contribution

The paper presents a novel Bayesian approach, built on acoustic word embeddings, to unsupervised speech segmentation and lexicon discovery. It outperforms a previous HMM-based system.

Findings

01

Achieves around 20% word error rate in digit recognition

02

Outperforms a previous HMM-based system by about 10% absolute in word error rate

03

Does not require pre-specified vocabulary size

Abstract

In settings where only unlabelled speech data is available, speech technology needs to be developed without transcriptions, pronunciation dictionaries, or language modelling text. A similar problem is faced when modelling infant language acquisition. In these cases, categorical linguistic structure needs to be discovered directly from speech audio. We present a novel unsupervised Bayesian model that segments unlabelled speech and clusters the segments into hypothesized word groupings. The result is a complete unsupervised tokenization of the input speech in terms of discovered word types. In our approach, a potential word segment (of arbitrary length) is embedded in a fixed-dimensional acoustic vector space. The model, implemented as a Gibbs sampler, then builds a whole-word acoustic model in this space while jointly performing segmentation. We report word error rates in a…
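The step of embedding a variable-length segment in a fixed-dimensional space can be illustrated with a minimal sketch. It assumes a simple downsampling embedding (linear interpolation of frame-level features onto a fixed number of time points, then flattening); this is a stand-in chosen for illustration and is not necessarily the embedding method used in the paper. The function name `downsample_embed`, the 13-dimensional features, and the choice of 10 time points are all hypothetical.

```python
import numpy as np


def downsample_embed(frames: np.ndarray, n_samples: int = 10) -> np.ndarray:
    """Map a (T, D) segment of frame-level features to a (n_samples * D,) vector.

    Illustrative assumption: plain linear-interpolation downsampling,
    not necessarily the embedding function used by the authors.
    """
    if frames.ndim != 2 or frames.shape[0] == 0:
        raise ValueError("expected a non-empty (T, D) array")
    n_frames, n_dims = frames.shape
    t_old = np.arange(n_frames)
    # n_samples equally spaced positions spanning the whole segment.
    t_new = np.linspace(0, n_frames - 1, n_samples)
    # Interpolate each feature dimension onto the new time grid.
    resampled = np.stack(
        [np.interp(t_new, t_old, frames[:, d]) for d in range(n_dims)],
        axis=1,
    )
    # Flatten (n_samples, D) into one fixed-length vector.
    return resampled.reshape(-1)


# Two candidate segments of different durations (random stand-in features).
rng = np.random.default_rng(0)
short_segment = rng.normal(size=(23, 13))
long_segment = rng.normal(size=(87, 13))

emb_short = downsample_embed(short_segment)
emb_long = downsample_embed(long_segment)
print(emb_short.shape, emb_long.shape)
```

Because both segments end up as vectors of the same length, they can be compared or clustered directly, which is what lets a whole-word acoustic model operate in the embedding space regardless of segment duration.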
