End-to-End and Self-Supervised Learning for ComParE 2022 Stuttering Sub-Challenge

Shakeel Ahmad Sheikh; Md Sahidullah; Fabrice Hirsch; Slim Ouni

arXiv:2207.10817 · cs.SD · July 25, 2022 · 1 cites

TL;DR

This paper builds end-to-end and speech-embedding systems on the self-supervised, pre-trained Wav2Vec2.0 model for stuttering detection, and outperforms the best challenge baseline on the ComParE 2022 stuttering sub-challenge (KSoF dataset).

Contribution

It uses embeddings from the pre-trained, self-supervised Wav2Vec2.0 model for stuttering detection, improving the unweighted average recall (UAR) over the challenge baselines.

Findings

01

Self-supervised Wav2Vec2.0 embeddings improve UAR over the best challenge baseline.

02

Concatenating layer embeddings with MFCC features further improves UAR.

03

Summing across all Wav2Vec2.0 layers surpasses baseline results.
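
The two ways of combining features named in the findings (summing embeddings across layers, and concatenating a layer embedding with MFCC features) can be illustrated with a minimal sketch. It assumes each layer has already been reduced to one fixed-length vector per utterance; the function names, the toy values, and the dimensions are hypothetical and not taken from the paper, which may combine the features differently.

```python
def sum_layers(layer_embeddings):
    """Element-wise sum of per-layer embedding vectors of equal length."""
    lengths = {len(vec) for vec in layer_embeddings}
    if len(lengths) != 1:
        raise ValueError("all layer embeddings must have the same dimension")
    return [sum(values) for values in zip(*layer_embeddings)]


def concat_with_mfcc(layer_embedding, mfcc_features):
    """Append an MFCC feature vector to a layer embedding."""
    return list(layer_embedding) + list(mfcc_features)


# Toy stand-ins (assumed values): three "layers" of dimension 4
# and a 2-dimensional "MFCC" vector.
toy_layers = [
    [1.0, 0.0, 2.0, 1.0],
    [0.5, 1.5, 0.0, 1.0],
    [0.5, 0.5, 1.0, 2.0],
]
toy_mfcc = [0.25, 0.75]

summed = sum_layers(toy_layers)                      # one vector, dimension 4
fused = concat_with_mfcc(toy_layers[-1], toy_mfcc)   # dimension 4 + 2 = 6
```

Summing keeps the feature dimension fixed regardless of how many layers are used, whereas concatenation grows the input size of the downstream classifier.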

Abstract

In this paper, we present end-to-end and speech-embedding-based systems trained in a self-supervised fashion to participate in the ACM Multimedia 2022 ComParE Challenge, specifically the stuttering sub-challenge. In particular, we exploit the embeddings from the pre-trained Wav2Vec2.0 model for stuttering detection (SD) on the KSoF dataset. After embedding extraction, we benchmark several methods for SD. Our proposed self-supervised SD system achieves an unweighted average recall (UAR) of 36.9% and 41.0% on the validation and test sets respectively, which is 31.32% (validation set) and 1.49% (test set) higher than the best (DeepSpectrum) challenge baseline (CBL). Moreover, we show that concatenating layer embeddings with Mel-frequency cepstral coefficient (MFCC) features further improves the UAR, by 33.81% and 5.45% on the validation and test sets respectively over the CBL. Finally, we demonstrate that the…
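
The results in the abstract are reported as UAR, the mean of the per-class recalls, which weights every class equally regardless of how many samples it has. The sketch below shows how UAR can be computed and how a percentage gain over a baseline would be derived, assuming the abstract's percentage gains are relative to the baseline UAR. The class labels, predictions, and baseline numbers are hypothetical and are not values from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall, computed over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, prediction in zip(y_true, y_pred):
        total[truth] += 1
        if truth == prediction:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


def relative_gain_percent(system, baseline):
    """Gain of `system` over `baseline`, as a percentage of the baseline."""
    return 100.0 * (system - baseline) / baseline


# Hypothetical, imbalanced toy labels: four "block" samples, two "filler" samples.
y_true = ["block", "block", "block", "block", "filler", "filler"]
y_pred = ["block", "block", "block", "block", "filler", "block"]

uar = unweighted_average_recall(y_true, y_pred)   # (1.0 + 0.5) / 2
gain = relative_gain_percent(50.0, 40.0)          # hypothetical UAR values
```

On this toy data plain accuracy would be 5/6, while UAR is lower because the missed sample belongs to the smaller class.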

Taxonomy

Topics: Stuttering Research and Treatment · Phonetics and Phonology Research · Speech Recognition and Synthesis
