Unsupervised Speech Segmentation: A General Approach Using Speech Language Models

Avishai Elmakies; Omri Abend; Yossi Adi

arXiv:2501.03711·cs.CL·January 8, 2025

TL;DR

This paper presents a novel unsupervised speech segmentation method leveraging speech language models to identify multiple acoustic-semantic style changes in speech, outperforming traditional spectral change-based methods.

Contribution

It introduces a general unsupervised approach for speech segmentation that captures diverse acoustic-semantic distinctions using speech language models, extending beyond single-style change detection.

Findings

01. Superior boundary detection compared to baselines
02. Higher segment purity
03. Reduced over-segmentation

Abstract

In this paper, we introduce an unsupervised approach for Speech Segmentation, which builds on previously researched approaches, e.g., Speaker Diarization, while being applicable to an inclusive set of acoustic-semantic distinctions, paving a path towards a general Unsupervised Speech Segmentation approach. Unlike traditional speech and audio segmentation, which mainly focuses on spectral changes in the input signal, e.g., phone segmentation, our approach tries to segment the spoken utterance into chunks with differing acoustic-semantic styles, focusing on acoustic-semantic information that does not translate well into text, e.g., emotion or speaker. While most Speech Segmentation tasks only handle one style change, e.g., emotion diarization, our approach tries to handle multiple acoustic-semantic style changes. Leveraging recent advances in Speech Language Models (SLMs), we propose a…

Code & Models

Repositories

avishaielmakies/unsupervised_speech_segmentation_using_slm
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis

Methods: Sparse Evolutionary Training