Stuttering Detection Using Speaker Representations and Self-supervised Contextual Embeddings
Shakeel A. Sheikh, Md Sahidullah, Fabrice Hirsch, Slim Ouni

TL;DR
This paper uses speaker and speech embeddings from models pre-trained on large audio datasets (ECAPA-TDNN on VoxCeleb, Wav2Vec2.0 on LibriSpeech) to improve stuttering detection, reporting relative UAR gains of up to 37.9% over standard systems trained only on the limited SEP-28k dataset.
Contribution
It introduces the use of pre-trained deep speech embeddings for stuttering detection and shows how combining multiple embeddings enhances performance.
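As a rough illustration of what combining embeddings can look like, the sketch below assumes the simplest option: L2-normalising each embedding and concatenating them into one feature vector. The summary does not state how the paper combines embeddings, so the concatenation, the function names, and the tiny toy vectors are all assumptions made for this example.

```python
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec] if norm > 0 else list(vec)


def combine_embeddings(speaker_emb: List[float], contextual_emb: List[float]) -> List[float]:
    """Hypothetical combiner: normalise each embedding, then concatenate.

    Normalising first keeps one embedding from dominating distance-based
    classifiers just because its values are on a larger scale.
    """
    return l2_normalize(speaker_emb) + l2_normalize(contextual_emb)


# Toy stand-ins for a speaker embedding and a contextual embedding.
# Real embeddings would have hundreds of dimensions; these are illustrative only.
speaker_emb = [3.0, 4.0]
contextual_emb = [0.0, 2.0, 0.0]

combined = combine_embeddings(speaker_emb, contextual_emb)
print(len(combined))  # dimensionality is the sum of the two inputs
```

The combined vector can then be passed to any classifier that accepts fixed-length features.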
Findings
Relative UAR improvements of up to 37.9% over baselines
Combining embeddings increases UAR by up to 6.32%
Traditional classifiers trained on pre-trained embeddings outperform standard SD systems trained only on the limited SEP-28k dataset
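For readers unfamiliar with the metric, UAR (unweighted average recall) is the mean of the per-class recalls, so every class counts equally regardless of how many samples it has. The sketch below computes UAR and a relative improvement on toy data. The class labels and all numbers are invented for illustration and are not results from the paper.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def unweighted_average_recall(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean of per-class recall, computed over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def relative_improvement(baseline: float, new: float) -> float:
    """Relative gain of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


# Imbalanced toy example with hypothetical labels: 8 fluent clips, 2 block clips.
y_true = ["fluent"] * 8 + ["block"] * 2
y_pred = ["fluent"] * 8 + ["fluent", "block"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"accuracy = {accuracy:.2f}, UAR = {uar:.2f}")
print(f"relative improvement from 0.40 to 0.50: {relative_improvement(0.40, 0.50):.1f}%")
```

In this toy case accuracy is 0.90 while UAR is 0.75, because the minority class is recalled only half the time. That sensitivity to rare classes is why UAR is a common choice for imbalanced tasks.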
Abstract
The adoption of advanced deep learning architectures in stuttering detection (SD) tasks is challenging due to the limited size of the available datasets. To this end, this work introduces the application of speech embeddings extracted from pre-trained deep learning models trained on large audio datasets for different tasks. In particular, we explore audio representations obtained using emphasized channel attention, propagation, and aggregation time delay neural network (ECAPA-TDNN) and Wav2Vec2.0 models trained on VoxCeleb and LibriSpeech datasets respectively. After extracting the embeddings, we benchmark with several traditional classifiers, such as the K-nearest neighbour (KNN), Gaussian naive Bayes, and neural network, for the SD tasks. In comparison to the standard SD systems trained only on the limited SEP-28k dataset, we obtain a relative improvement of 12.08%, 28.71%, 37.9% in…
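The abstract describes a two-stage setup: embeddings are extracted from pre-trained models, then a traditional classifier is trained on them. The sketch below illustrates only the second stage, using a minimal k-nearest-neighbour classifier written in plain Python. The embedding extraction is replaced by hard-coded 2-D toy vectors, and the labels, the value of k, and the function name are assumptions for this example, not the paper's configuration.

```python
from collections import Counter
from math import dist
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def knn_predict(train_x: Sequence[Vector], train_y: Sequence[str],
                query: Vector, k: int = 3) -> str:
    """Return the majority label among the k training vectors closest to `query`."""
    neighbours = sorted(zip(train_x, train_y),
                        key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy stand-ins for pre-extracted embeddings (real ones come from a frozen
# pre-trained model and have far more dimensions).
train_x: List[Vector] = [
    (0.0, 0.1), (0.1, 0.0), (0.2, 0.1),   # hypothetical "fluent" clips
    (1.0, 0.9), (0.9, 1.0), (1.1, 1.0),   # hypothetical "disfluent" clips
]
train_y = ["fluent"] * 3 + ["disfluent"] * 3

pred_a = knn_predict(train_x, train_y, (0.05, 0.05))
pred_b = knn_predict(train_x, train_y, (1.0, 1.0))
print(pred_a, pred_b)
```

Because the pre-trained model stays fixed and only the lightweight classifier is fit, the number of parameters learned from the small labelled dataset stays low, which is the motivation the abstract gives for this approach.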
Taxonomy
Topics: Stuttering Research and Treatment · Phonetics and Phonology Research · Speech Recognition and Synthesis
