XLS-R fine-tuning on noisy word boundaries for unsupervised speech segmentation into words
Robin Algayres, Pablo Diego-Simon, Benoit Sagot, Emmanuel Dupoux

TL;DR
This paper introduces a method to improve unsupervised speech segmentation by fine-tuning XLS-R models with noisy word boundary labels, achieving state-of-the-art results across multiple languages and enabling zero-shot segmentation.
Contribution
The paper presents a novel fine-tuning approach for XLS-R models using noisy boundary labels, significantly enhancing unsupervised speech segmentation performance.
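The summary does not say how boundary labels are turned into training targets. One common choice is to treat segmentation as frame-level binary tagging. The sketch below makes several assumptions: a 20 ms output frame (the stride of wav2vec 2.0-style encoders such as XLS-R on 16 kHz audio) and simple peak picking to decode. The helpers `boundaries_to_frame_labels` and `frame_probs_to_boundaries` are hypothetical and are not taken from the paper.

```python
from typing import List

# Assumption: one encoder output frame per 20 ms of 16 kHz audio.
FRAME_S = 0.02


def boundaries_to_frame_labels(boundaries_s: List[float], duration_s: float,
                               frame_s: float = FRAME_S) -> List[int]:
    """Hypothetical target builder: 1 for each frame holding a (noisy) word boundary, else 0."""
    n_frames = int(round(duration_s / frame_s))
    labels = [0] * n_frames
    for t in boundaries_s:
        idx = min(int(t / frame_s), n_frames - 1)  # clamp boundaries at the very end
        labels[idx] = 1
    return labels


def frame_probs_to_boundaries(probs: List[float], threshold: float = 0.5,
                              frame_s: float = FRAME_S) -> List[float]:
    """Hypothetical decoder: keep local peaks of boundary probability above a threshold."""
    out = []
    for i, p in enumerate(probs):
        left = probs[i - 1] if i > 0 else float("-inf")
        right = probs[i + 1] if i + 1 < len(probs) else float("-inf")
        if p >= threshold and p >= left and p > right:
            out.append(round(i * frame_s, 2))  # frame index back to seconds
    return out
```

Because noisy teacher boundaries rarely fall on exactly the right frame, a real system might smooth the targets or tolerate small offsets. That refinement is left out here.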
Findings
Improved token F1 by 130% on average over the previous state of the art
Set new state-of-the-art results on five diverse language corpora
Enabled zero-shot segmentation for unseen languages
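The headline number is a relative gain in token F1. A discovered word token counts as correct only if it matches a gold token. The sketch below simplifies the matching criterion (benchmarks such as ZeroSpeech define it more carefully): it assumes exact matches on boundary frame indices for a single utterance. It also shows that "130% higher" means roughly 2.3 times the previous score. All function names are illustrative.

```python
from typing import List, Tuple


def boundaries_to_tokens(bounds: List[int]) -> List[Tuple[int, int]]:
    """Consecutive boundary frame indices delimit (start, end) word tokens."""
    return list(zip(bounds[:-1], bounds[1:]))


def token_f1(pred_bounds: List[int], gold_bounds: List[int]) -> float:
    """Simplified token F1: a predicted token is a hit only if start and end both match exactly."""
    pred = set(boundaries_to_tokens(sorted(pred_bounds)))
    gold = set(boundaries_to_tokens(sorted(gold_bounds)))
    hits = len(pred & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(pred)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


def relative_gain(new: float, old: float) -> float:
    """Relative improvement in percent; 130 means new is 2.3 times old."""
    return 100.0 * (new - old) / old
```

With made-up scores, `relative_gain(23.0, 10.0)` returns `130.0`. One misplaced boundary breaks both tokens that touch it, which is why token F1 is much stricter than boundary F1.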
Abstract
Because the speech stream contains no explicit word boundaries, segmenting spoken sentences into word units without text supervision is particularly challenging. In this work, we leverage recent self-supervised speech models, which have been shown to adapt quickly to new tasks through fine-tuning, even in low-resource conditions. Taking inspiration from semi-supervised learning, we fine-tune an XLS-R model to predict word boundaries that are themselves produced by top-tier speech segmentation systems: DPDP, VG-HuBERT, GradSeg and DP-Parse. Once XLS-R is fine-tuned, it is used to infer new word boundary labels, which are in turn used for another fine-tuning step. Our method consistently improves the performance of each system and sets a new state of the art that is, on average, 130% higher than the previous one, as measured by the F1 score on correctly discovered word tokens on…
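The abstract describes an iterative pseudo-labeling loop: fine-tune on a teacher system's noisy boundaries, use the fine-tuned model to relabel the corpus, and fine-tune again. The minimal sketch below treats `fine_tune` and `predict` as hypothetical callables standing in for XLS-R training and inference. The number of rounds is an assumption, and so is whether each round restarts from the pretrained checkpoint.

```python
from typing import Callable, Dict, List, TypeVar

Model = TypeVar("Model")
Labels = Dict[str, List[float]]  # utterance id -> boundary times in seconds


def iterative_pseudo_labeling(
    utterance_ids: List[str],
    teacher_labels: Labels,
    fine_tune: Callable[[Labels], Model],
    predict: Callable[[Model, str], List[float]],
    n_rounds: int = 2,
) -> Labels:
    """Round 1 fine-tunes on the teacher's noisy boundaries (e.g. from DPDP or DP-Parse);
    each later round fine-tunes again on boundaries inferred by the previous model."""
    labels = teacher_labels
    for _ in range(n_rounds):
        model = fine_tune(labels)  # stand-in for fine-tuning XLS-R as a boundary tagger
        labels = {uid: predict(model, uid) for uid in utterance_ids}  # relabel the corpus
    return labels
```

Passing training and inference in as callables keeps the loop independent of any particular toolkit. In practice each round would be a full fine-tuning run, so the round count trades accuracy against compute.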
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
