Should Top-Down Clustering Affect Boundaries in Unsupervised Word Discovery?
Simon Malan, Benjamin van Niekerk, and Herman Kamper

TL;DR
This study compares top-down and bottom-up approaches to unsupervised speech segmentation and clustering, finding that a simple bottom-up method can match a more complex top-down system while running considerably faster.
Contribution
The paper provides a comparative analysis of top-down and bottom-up segmentation methods, highlighting the effectiveness of simple bottom-up strategies and identifying clustering as a key bottleneck.
Findings
Both approaches achieve state-of-the-art results on ZeroSpeech benchmarks.
The bottom-up method is nearly five times faster than the top-down approach.
Clustering remains the main limiting factor in unsupervised word discovery.
Abstract
We investigate the problem of segmenting unlabeled speech into word-like units and clustering these to create a lexicon. Prior work can be categorized into two frameworks. Bottom-up methods first determine boundaries and then cluster the fixed segmented words into a lexicon. In contrast, top-down methods incorporate information from the clustered words to inform boundary selection. However, it is unclear whether top-down information is necessary to improve segmentation. To explore this, we look at two similar approaches that differ in whether top-down clustering informs boundary selection. Our simple bottom-up strategy predicts word boundaries using the dissimilarity between adjacent self-supervised features, then clusters the resulting segments to construct a lexicon. Our top-down system is an updated version of the ES-KMeans dynamic programming method that iteratively uses K-means to…
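The bottom-up strategy described in the abstract (boundaries from the dissimilarity between adjacent features, followed by clustering of the fixed segments) can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine distance, the fixed `threshold`, the mean pooling of segments, and the tiny K-means initialised from the first `k` segments are all assumptions chosen to keep the toy example short and deterministic, and the function names are hypothetical. Real inputs would be frame-level self-supervised features rather than the hand-made vectors used here.

```python
import numpy as np


def adjacent_dissimilarity(features):
    """Cosine distance between each frame and the frame that follows it."""
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    return 1.0 - np.sum(unit[:-1] * unit[1:], axis=1)


def pick_boundaries(dissimilarity, threshold=0.5):
    """Local maxima above `threshold` (an assumed value) become boundaries."""
    boundaries = []
    n = len(dissimilarity)
    for i, d in enumerate(dissimilarity):
        left = dissimilarity[i - 1] if i > 0 else -np.inf
        right = dissimilarity[i + 1] if i + 1 < n else -np.inf
        if d > threshold and d >= left and d > right:
            # Entry i compares frames i and i + 1, so the new segment
            # starts at frame i + 1.
            boundaries.append(i + 1)
    return boundaries


def pool_segments(features, boundaries):
    """Mean-pool the frames of each segment into one fixed-size vector."""
    edges = [0, *boundaries, len(features)]
    return np.stack(
        [features[s:e].mean(axis=0) for s, e in zip(edges[:-1], edges[1:])]
    )


def kmeans(points, k, iterations=10):
    """Tiny K-means; centroids start at the first k points (an assumption)."""
    centroids = points[:k].astype(float).copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iterations):
        distances = np.linalg.norm(points[:, None] - centroids[None], axis=2)
        labels = distances.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels


# Toy "utterance": three frames of one pattern, three of another, then the
# first pattern again. These stand in for self-supervised features.
toy_features = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3 + [[1.0, 0.0]] * 3)

boundaries = pick_boundaries(adjacent_dissimilarity(toy_features))
segments = pool_segments(toy_features, boundaries)
labels = kmeans(segments, k=2)

print(boundaries)       # [3, 6]
print(labels.tolist())  # [0, 1, 0]
```

In this sketch the boundaries are fixed before clustering and never revisited, which is the defining property of a bottom-up system. A top-down system would instead let the cluster assignments feed back into where the boundaries are placed.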
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
