FGCL: Fine-grained Contrastive Learning For Mandarin Stuttering Event Detection

Han Jiang, Wenyu Wang, Yiquan Zhou, Hongwu Ding, Jiacheng Xu, Jihua Zhu

arXiv:2410.05647 · cs.SD · October 10, 2024

TL;DR

This paper introduces FGCL, a novel contrastive learning framework that enhances Mandarin stuttering event detection by capturing subtle acoustic nuances, leading to significant improvements in detection accuracy.

Contribution

The paper proposes a fine-grained contrastive learning approach with a new mining algorithm and stutter contrast loss for improved Mandarin stuttering detection.

Findings

01 Over 5.0% F1 score improvement on Mandarin data

02 Effective discrimination between stuttered and fluent speech

03 Enhanced detection accuracy through detailed acoustic analysis

Abstract

This paper presents the T031 team's approach to the StutteringSpeech Challenge in SLT2024. Mandarin Stuttering Event Detection (MSED) aims to detect instances of stuttering events in Mandarin speech. We propose a detailed acoustic analysis method to improve the accuracy of stutter detection by capturing subtle nuances that previous Stuttering Event Detection (SED) techniques have overlooked. To this end, we introduce the Fine-Grained Contrastive Learning (FGCL) framework for MSED. Specifically, we model the frame-level probabilities of stuttering events and introduce a mining algorithm to identify both easy and confusing frames. Then, we propose a stutter contrast loss to enhance the distinction between stuttered and fluent speech frames, thereby improving the discriminative capability of stuttered feature embeddings. Extensive evaluations on English and Mandarin datasets demonstrate…
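
The abstract names two steps, mining easy and confusing frames from frame-level probabilities and then applying a contrastive loss between stuttered and fluent frames, without giving their exact form. The sketch below is only an illustration of that general idea, not the authors' algorithm: the margin-based mining rule, the function names `mine_frames` and `contrastive_loss`, the parameters `easy_margin` and `temperature`, and the use of a generic supervised contrastive loss in place of the paper's stutter contrast loss are all assumptions made for this example.

```python
import numpy as np


def mine_frames(probs, labels, easy_margin=0.3):
    """Split frames into easy and confusing sets.

    Hypothetical rule (not from the paper): a frame is "easy" when the
    probability assigned to its true class is at least 0.5 + easy_margin,
    and "confusing" otherwise.
    """
    probs = np.asarray(probs, dtype=float)    # P(stutter) per frame, shape (T,)
    labels = np.asarray(labels, dtype=int)    # 1 = stuttered, 0 = fluent
    # Confidence the model gives to the true class of each frame.
    true_conf = np.where(labels == 1, probs, 1.0 - probs)
    easy = true_conf >= 0.5 + easy_margin
    return easy, ~easy


def contrastive_loss(emb, labels, temperature=0.1):
    """Generic supervised contrastive loss over frame embeddings.

    Stand-in for the paper's stutter contrast loss (assumption): frames
    with the same label are pulled together, frames with different labels
    are pushed apart. Expects at least two frames.
    """
    emb = np.asarray(emb, dtype=float)        # shape (T, D)
    labels = np.asarray(labels, dtype=int)    # shape (T,)
    # Cosine similarity, scaled by the temperature.
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = emb @ emb.T / temperature
    not_self = ~np.eye(len(labels), dtype=bool)
    # Exclude each frame's similarity with itself from the softmax.
    sim = np.where(not_self, sim, -np.inf)
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(
        np.exp(sim - row_max).sum(axis=1, keepdims=True)
    )
    # Positives: other frames that share the anchor's label.
    pos = (labels[:, None] == labels[None, :]) & not_self
    n_pos = pos.sum(axis=1)
    valid = n_pos > 0                         # anchors with at least one positive
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1)[valid] / n_pos[valid]
    return per_anchor.mean()


if __name__ == "__main__":
    probs = [0.95, 0.90, 0.55, 0.10, 0.05, 0.60]
    labels = [1, 1, 1, 0, 0, 0]
    easy, confusing = mine_frames(probs, labels)
    print("easy frames:     ", np.flatnonzero(easy))
    print("confusing frames:", np.flatnonzero(confusing))

    rng = np.random.default_rng(0)
    emb = rng.normal(size=(6, 8))             # toy frame embeddings
    print("contrastive loss:", contrastive_loss(emb, labels))
```

In a training setup along these lines, the mined frame sets could be used to choose which frames enter the contrastive term, for example by restricting anchors to confusing frames. How the paper actually combines the two steps is not stated in the text above.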

Taxonomy

Topics: Speech Recognition and Synthesis · Stuttering Research and Treatment