Should Top-Down Clustering Affect Boundaries in Unsupervised Word Discovery?

Simon Malan, Benjamin van Niekerk, and Herman Kamper

arXiv:2507.19204 · eess.AS · July 29, 2025

TL;DR

This study compares top-down and bottom-up approaches to unsupervised speech segmentation and clustering, finding that a simple bottom-up method can perform as well as a more complex top-down system while running nearly five times faster.

Contribution

The paper provides a comparative analysis of top-down and bottom-up segmentation methods, highlighting the effectiveness of simple bottom-up strategies and identifying clustering as a key bottleneck.

Findings

01

Both approaches achieve state-of-the-art results on ZeroSpeech benchmarks.

02

The bottom-up method is nearly five times faster than the top-down approach.

03

Clustering remains the main limiting factor in unsupervised word discovery.

Abstract

We investigate the problem of segmenting unlabeled speech into word-like units and clustering these to create a lexicon. Prior work can be categorized into two frameworks. Bottom-up methods first determine boundaries and then cluster the resulting fixed word segments into a lexicon. In contrast, top-down methods incorporate information from the clustered words to inform boundary selection. However, it is unclear whether top-down information is necessary to improve segmentation. To explore this, we compare two similar approaches that differ in whether top-down clustering informs boundary selection. Our simple bottom-up strategy predicts word boundaries using the dissimilarity between adjacent self-supervised features, then clusters the resulting segments to construct a lexicon. Our top-down system is an updated version of the ES-KMeans dynamic programming method that iteratively uses K-means to…
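
The bottom-up pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes frame-level self-supervised features are already extracted into a NumPy array, measures dissimilarity as cosine distance between adjacent frames, places boundaries at local peaks above a hand-set threshold, represents each segment by its mean feature vector, and clusters those vectors with plain K-means. All function names and the `threshold` parameter are hypothetical. The key property it shows is that boundaries are fixed before clustering, so cluster assignments never feed back into segmentation.

```python
import numpy as np


def boundary_scores(features: np.ndarray) -> np.ndarray:
    """Cosine dissimilarity between adjacent frames.

    features: (T, D) frame-level self-supervised features (assumed precomputed).
    Entry t scores a possible boundary between frames t and t + 1.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True) + 1e-8
    unit = features / norms
    return 1.0 - np.sum(unit[:-1] * unit[1:], axis=1)


def pick_boundaries(scores: np.ndarray, threshold: float) -> list[int]:
    """Return frame indices where a new segment starts (hypothetical rule:
    local maxima of the dissimilarity curve that reach `threshold`)."""
    starts = []
    for t in range(len(scores)):
        left = scores[t - 1] if t > 0 else -np.inf
        right = scores[t + 1] if t + 1 < len(scores) else -np.inf
        # Strict on the left, non-strict on the right, so a flat peak yields one boundary.
        if scores[t] >= threshold and scores[t] > left and scores[t] >= right:
            starts.append(t + 1)
    return starts


def to_segments(num_frames: int, starts: list[int]) -> list[tuple[int, int]]:
    """Turn segment start indices into half-open (start, end) frame spans."""
    edges = [0] + starts + [num_frames]
    return [(edges[i], edges[i + 1]) for i in range(len(edges) - 1)]


def kmeans(x: np.ndarray, k: int, n_iter: int = 50, seed: int = 0):
    """Plain Lloyd's K-means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels, centroids


def bottom_up_lexicon(features: np.ndarray, threshold: float, k: int, seed: int = 0):
    """Segment first, then cluster the fixed segments; boundaries never see the clusters."""
    starts = pick_boundaries(boundary_scores(features), threshold)
    segments = to_segments(len(features), starts)
    # Mean-pool each segment into a single embedding (an assumed, simple choice).
    embeddings = np.stack([features[s:e].mean(axis=0) for s, e in segments])
    labels, _ = kmeans(embeddings, k, seed=seed)
    return segments, labels


if __name__ == "__main__":
    # Toy input: three runs of identical frames; the first and last runs point the same way.
    feats = np.vstack([
        np.tile([1.0, 0.0, 0.0], (4, 1)),
        np.tile([0.0, 1.0, 0.0], (3, 1)),
        np.tile([1.0, 0.0, 0.0], (5, 1)),
    ])
    segments, labels = bottom_up_lexicon(feats, threshold=0.5, k=2)
    print(segments)  # [(0, 4), (4, 7), (7, 12)]
    print(labels)    # the first and last segments share a cluster ID
```

A fixed threshold with simple peak picking is the easiest rule to illustrate; the actual system may use a different peak detector, feature layer, segment representation, or clustering configuration. A top-down method such as ES-KMeans would instead let the current clusters influence where boundaries are placed.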
