End-to-End and Self-Supervised Learning for ComParE 2022 Stuttering Sub-Challenge
Shakeel Ahmad Sheikh, Md Sahidullah, Fabrice Hirsch, Slim Ouni

TL;DR
This paper builds end-to-end and speech-embedding systems on the pre-trained, self-supervised Wav2Vec2.0 model for stuttering detection, outperforming the best ComParE 2022 challenge baseline on the KSoF dataset with a test-set UAR of 41.0%.
Contribution
It applies embeddings from the pre-trained, self-supervised Wav2Vec2.0 model to stuttering detection and benchmarks several methods on top of them, improving unweighted average recall (UAR) over the best challenge baseline (DeepSpectrum).
Findings
Self-supervised Wav2Vec2.0 embeddings raise UAR over the best challenge baseline (36.9% on validation, 41.0% on test).
Concatenating layer embeddings with MFCC features improves UAR further over the baseline.
Summing across all Wav2Vec2.0 layers surpasses baseline results.
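The two fusion strategies named in the findings can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each Wav2Vec2.0 layer has already been pooled into one fixed-length utterance vector, and the helper names (`sum_layers`, `concat_with_mfcc`) and the toy dimensions are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def sum_layers(layer_embeddings: Sequence[Sequence[float]]) -> Vector:
    """Element-wise sum of per-layer utterance embeddings (all the same length)."""
    if not layer_embeddings:
        raise ValueError("need at least one layer embedding")
    dim = len(layer_embeddings[0])
    if any(len(layer) != dim for layer in layer_embeddings):
        raise ValueError("all layer embeddings must share one dimension")
    # zip(*...) groups the i-th component of every layer together.
    return [sum(values) for values in zip(*layer_embeddings)]


def concat_with_mfcc(layer_embedding: Sequence[float], mfcc: Sequence[float]) -> Vector:
    """Append an utterance-level MFCC vector to a single layer embedding."""
    return list(layer_embedding) + list(mfcc)


# Toy data (assumed sizes): 3 layers of 4-dim embeddings, 2-dim MFCC summary.
layers = [
    [1.0, 0.0, 2.0, 1.0],
    [0.5, 0.5, 0.5, 0.5],
    [0.0, 1.0, 1.0, 0.0],
]
mfcc = [0.25, -0.75]

summed = sum_layers(layers)                 # one 4-dim vector for the classifier
fused = concat_with_mfcc(layers[-1], mfcc)  # one 6-dim vector for the classifier
```

Summing keeps the feature dimension fixed however many layers are combined, whereas concatenation grows the input by the length of the MFCC vector. Either result would then be passed to a downstream classifier, which is outside the scope of this sketch.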
Abstract
In this paper, we present end-to-end and speech-embedding-based systems trained in a self-supervised fashion to participate in the ACM Multimedia 2022 ComParE Challenge, specifically the stuttering sub-challenge. In particular, we exploit the embeddings from the pre-trained Wav2Vec2.0 model for stuttering detection (SD) on the KSoF dataset. After embedding extraction, we benchmark several methods for SD. Our proposed self-supervised SD system achieves a UAR of 36.9% and 41.0% on the validation and test sets respectively, a relative improvement of 31.32% (validation set) and 1.49% (test set) over the best (DeepSpectrum) challenge baseline (CBL). Moreover, we show that concatenating layer embeddings with Mel-frequency cepstral coefficient (MFCC) features further improves the UAR, by 33.81% and 5.45% on the validation and test sets respectively over the CBL. Finally, we demonstrate that the…
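As a worked illustration of the metric and the margins quoted in the abstract, the sketch below computes unweighted average recall (the mean of per-class recalls) on toy labels and back-computes the baseline UAR implied by a relative margin. It assumes the percentage margins are relative rather than absolute; the function names and toy labels are hypothetical, and the implied baselines are derived here, not figures reported in this summary.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def unweighted_average_recall(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean of per-class recalls, so every class counts equally regardless of size."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def relative_gain_percent(system: float, baseline: float) -> float:
    """Relative improvement of `system` over `baseline`, in percent."""
    return 100.0 * (system - baseline) / baseline


def implied_baseline(system: float, gain_percent: float) -> float:
    """Baseline score implied by a system score and its relative gain."""
    return system / (1.0 + gain_percent / 100.0)


# Toy two-class example: class "A" has 4 samples, class "B" has 2.
y_true = ["A", "A", "A", "A", "B", "B"]
y_pred = ["A", "A", "A", "B", "B", "A"]
uar = unweighted_average_recall(y_true, y_pred)  # (3/4 + 1/2) / 2

# Baselines implied by the abstract's UARs and margins (relative reading assumed).
val_baseline = implied_baseline(36.9, 31.32)
test_baseline = implied_baseline(41.0, 1.49)
```

Under the relative reading, the implied baselines come out near 28% (validation) and 40% (test). An absolute reading of the validation margin would put the baseline below 6%, which is why the relative interpretation is assumed here.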
Taxonomy
Topics: Stuttering Research and Treatment · Phonetics and Phonology Research · Speech Recognition and Synthesis
Methods: Test
