Stuttering Detection Using Speaker Representations and Self-supervised Contextual Embeddings

Shakeel A. Sheikh; Md Sahidullah; Fabrice Hirsch; Slim Ouni

arXiv:2306.00689 · cs.SD · June 2, 2023 · 1 citation

TL;DR

This paper uses speaker and speech embeddings from models pre-trained on large audio datasets to improve stuttering detection, reporting substantial gains over standard systems trained only on the limited SEP-28k dataset.

Contribution

It introduces the use of pre-trained deep speech embeddings for stuttering detection and shows how combining multiple embeddings enhances performance.

Findings

01 Relative improvements in unweighted average recall (UAR) of up to 37.9% over baselines

02 Combining embeddings increases UAR by up to 6.32%

03 Pre-trained embeddings paired with traditional classifiers outperform standard systems trained only on the limited SEP-28k dataset
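The findings are stated in terms of UAR (unweighted average recall) and relative improvement. As a minimal sketch of how these two quantities are usually computed, the block below averages per-class recall so that rare classes count as much as common ones, and expresses a gain as a percentage of the baseline score. The labels and scores are made-up toy values, not data from the paper, and the function names are hypothetical.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class counts equally regardless of size."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def relative_improvement(baseline, new):
    """Gain of `new` over `baseline`, as a percentage of `baseline`."""
    return 100.0 * (new - baseline) / baseline


# Toy, imbalanced example (assumed values): 8 fluent clips, 2 stuttered clips.
y_true = ["fluent"] * 8 + ["stutter"] * 2
y_pred = ["fluent"] * 8 + ["fluent", "stutter"]

uar = unweighted_average_recall(y_true, y_pred)  # (8/8 + 1/2) / 2 = 0.75
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)  # 9/10

print(f"accuracy={accuracy:.2f}  UAR={uar:.2f}")
print(f"relative gain from 0.40 to 0.50: {relative_improvement(0.40, 0.50):.1f}%")
```

On imbalanced data such as this toy set, accuracy looks high even though half of the stuttered clips are missed, while UAR reflects the weak minority-class recall.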

Abstract

The adoption of advanced deep learning architectures in stuttering detection (SD) tasks is challenging due to the limited size of the available datasets. To this end, this work introduces the application of speech embeddings extracted from pre-trained deep learning models trained on large audio datasets for different tasks. In particular, we explore audio representations obtained using emphasized channel attention, propagation, and aggregation time delay neural network (ECAPA-TDNN) and Wav2Vec2.0 models trained on VoxCeleb and LibriSpeech datasets respectively. After extracting the embeddings, we benchmark with several traditional classifiers, such as the K-nearest neighbour (KNN), Gaussian naive Bayes, and neural network, for the SD tasks. In comparison to the standard SD systems trained only on the limited SEP-28k dataset, we obtain a relative improvement of 12.08%, 28.71%, 37.9% in…
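The pipeline described in the abstract (extract embeddings from pre-trained models, then train a traditional classifier on them) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: random vectors stand in for ECAPA-TDNN and Wav2Vec2.0 outputs, the dimensions are toy values, and the two embeddings are combined by simple concatenation, which the abstract excerpt does not confirm. All function names are hypothetical.

```python
import math
import random
from collections import Counter

random.seed(0)


def fake_embedding(dim, center):
    """Random stand-in for a pre-trained model's output vector (assumption)."""
    return [random.gauss(center, 0.1) for _ in range(dim)]


def fuse(speaker_emb, contextual_emb):
    """Combine two embeddings by concatenation (assumed fusion strategy)."""
    return speaker_emb + contextual_emb


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


SPK_DIM, CTX_DIM = 4, 6  # toy sizes, not the real model dimensions

# Build a tiny synthetic training set: two well-separated classes.
train_x, train_y = [], []
for label, center in (("fluent", 0.0), ("stutter", 1.0)):
    for _ in range(5):
        train_x.append(fuse(fake_embedding(SPK_DIM, center),
                            fake_embedding(CTX_DIM, center)))
        train_y.append(label)

# Classify a new clip whose embeddings resemble the "stutter" class.
query = fuse(fake_embedding(SPK_DIM, 1.0), fake_embedding(CTX_DIM, 1.0))
prediction = knn_predict(train_x, train_y, query)
print(len(query), prediction)
```

Because the pre-trained models stay frozen and only a lightweight classifier is fitted, this kind of setup needs far fewer labelled stuttering clips than training a deep network end to end.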

Taxonomy

Topics: Stuttering Research and Treatment · Phonetics and Phonology Research · Speech Recognition and Synthesis