Unsupervised Word Segmentation using K Nearest Neighbors

Tzeviya Sylvia Fuchs; Yedid Hoshen; Joseph Keshet

arXiv:2204.13094·cs.SD·April 28, 2022

TL;DR

This paper introduces an unsupervised kNN-based method for word segmentation in speech that operates on pre-trained speech representations. It outperforms previous single-stage approaches and is competitive with state-of-the-art two-stage methods.

Contribution

It presents a novel unsupervised approach that directly uses pre-trained audio features and compares segments with their K nearest neighbors, eliminating the need for phoneme discovery.

Findings

01 Improved results over previous single-stage methods

02 Competitive performance with state-of-the-art two-stage methods

03 Operates directly on pre-trained speech representations

Abstract

In this paper, we propose an unsupervised kNN-based approach for word segmentation in speech utterances. Our method relies on self-supervised pre-trained speech representations, and compares each audio segment of a given utterance to its K nearest neighbors within the training set. Our main assumption is that a segment containing more than one word would occur less often than a segment containing a single word. Our method does not require phoneme discovery and is able to operate directly on pre-trained audio representations. This is in contrast to current methods that use a two-stage approach: first detecting the phonemes in the utterance, and then detecting word boundaries according to statistics calculated on phoneme patterns. Experiments on two datasets demonstrate improved results over previous single-stage methods and results competitive with state-of-the-art two-stage methods.
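The core intuition in the abstract (segments that span more than one word are rarer, so they sit farther from their nearest neighbors in the training set) can be illustrated with a small sketch. This is not the authors' implementation: the function name `knn_scores`, the use of mean Euclidean distance as the rarity score, the value of `k`, and the random vectors standing in for pre-trained speech representations are all assumptions made for illustration.

```python
import numpy as np

def knn_scores(queries, train, k=3):
    """Mean Euclidean distance from each query vector to its k nearest
    training vectors. A larger score means the query is 'rarer'.
    Hypothetical helper; the paper's actual scoring may differ."""
    # Pairwise differences via broadcasting: (M, 1, D) - (1, N, D) -> (M, N, D)
    diffs = queries[:, None, :] - train[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))  # (M, N) distance matrix
    # np.partition places the k smallest distances in the first k columns
    nearest = np.partition(dists, k - 1, axis=1)[:, :k]
    return nearest.mean(axis=1)

# Toy data standing in for segment embeddings (assumption: 16-dim vectors).
rng = np.random.default_rng(0)
train = rng.normal(0.0, 0.1, size=(200, 16))   # frequently seen segments
common = rng.normal(0.0, 0.1, size=(1, 16))    # resembles the training data
rare = np.full((1, 16), 2.0)                   # far from anything seen

scores = knn_scores(np.vstack([common, rare]), train, k=5)
print(scores)  # the second (rare) segment should score higher than the first
```

In a segmentation setting, a score like this would be computed for candidate segments of an utterance, with low-scoring (frequently occurring) segments preferred as single-word candidates. How candidates are generated and how boundaries are chosen from the scores is left out of this sketch.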

Code & Models

Repositories

mlspeech/gradseg
pytorch

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing