Spoken-Term Discovery using Discrete Speech Units

Benjamin van Niekerk, Julian Zaïdi, Marc-André Carbonneau, and Herman Kamper

arXiv:2408.14390 · eess.AS · August 27, 2024 · Interspeech

TL;DR

DUSTED is a novel method for zero-resource speech processing that discovers spoken terms by encoding audio into discrete units and finding repeated patterns, achieving state-of-the-art results on the ZeroSpeech Challenge.

Contribution

This paper introduces DUSTED, a new approach using self-supervised discrete units and bioinformatics-inspired algorithms for improved spoken-term discovery.

Findings

01 Achieves state-of-the-art results on the spoken-term discovery track of the ZeroSpeech Challenge

02 Finds longer word- or phrase-like patterns

03 Improves the coverage and consistency of discovered patterns across speakers

Abstract

Discovering a lexicon from unlabeled audio is a longstanding challenge for zero-resource speech processing. One approach is to search for frequently occurring patterns in speech. We revisit this idea with DUSTED: Discrete Unit Spoken-TErm Discovery. Leveraging self-supervised models, we encode input audio into sequences of discrete units. Next, we find repeated patterns by searching for similar unit sub-sequences, inspired by alignment algorithms from bioinformatics. Since discretization discards speaker information, DUSTED finds better matches across speakers, improving the coverage and consistency of the discovered patterns. We demonstrate these improvements on the ZeroSpeech Challenge, achieving state-of-the-art results on the spoken-term discovery track. Finally, we analyze the duration distribution of the patterns, showing that our method finds longer word- or phrase-like terms.
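
To make the pattern-search step concrete, the following is a minimal sketch of local alignment between two sequences of discrete unit IDs, using the textbook Smith-Waterman algorithm from bioinformatics. It is not taken from the paper or its repository: the function name `local_align`, the scoring values (`match`, `mismatch`, `gap`), and the unit IDs in the usage note are all assumptions chosen for illustration, and the scoring and search strategy DUSTED actually uses may differ.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # (start, end) indices, end-exclusive


def local_align(
    a: Sequence[int],
    b: Sequence[int],
    match: int = 1,      # assumed reward for identical units
    mismatch: int = -1,  # assumed penalty for differing units
    gap: int = -1,       # assumed penalty for skipping a unit
) -> Tuple[int, Span, Span]:
    """Smith-Waterman local alignment over two unit sequences.

    Returns the best score and the matched spans in `a` and `b`.
    """
    n, m = len(a), len(b)
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1].
    H: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                      # start a new alignment here
                H[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,      # skip a unit in a
                H[i][j - 1] + gap,      # skip a unit in b
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    end_i, end_j = i, j
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1

    return best, (i, end_i), (j, end_j)
```

For two made-up unit sequences such as `[7, 7, 3, 12, 5, 9, 40, 2]` and `[1, 3, 12, 5, 9, 8]`, the function returns the spans covering the shared run `3, 12, 5, 9`. Because the comparison operates on unit IDs rather than raw acoustic features, two utterances of the same term by different speakers can align whenever they map to similar unit sequences, which is the intuition the abstract gives for better cross-speaker matches.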

Code & Models

Repositories

bshall/dusted
PyTorch · Official

Taxonomy

Topics: Speech and dialogue systems · Natural Language Processing Techniques