Adversarial Training For Low-Resource Disfluency Correction
Vineet Bhat, Preethi Jyothi, Pushpak Bhattacharyya

TL;DR
This paper introduces an adversarial training approach to disfluency correction for low-resource languages and for transcripts of impaired (stuttered) speech. It relies on synthetically generated disfluent data and improves F1 by an average of 6.15 points over competitive baselines.
Contribution
It presents the first application of adversarial training to disfluency correction. The method combines a small amount of labeled real data with synthetic data and is evaluated across multiple languages and on speech-impairment transcripts.
Findings
Achieves an average F1 improvement of 6.15 points over competitive baselines across Bengali, Hindi, and Marathi
Effective in three Indian languages and English
Establishes a new benchmark for disfluency correction
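F1 for disfluency correction is often computed at the token level. Each token is tagged disfluent (`D`) or fluent (`F`), and the disfluent class is scored as the positive class. The sketch below shows that computation as an illustration only. The tag names, the helper `disfluency_f1`, and the token-level definition are assumptions, and the paper's exact evaluation protocol may differ.

```python
def disfluency_f1(gold_tags, pred_tags, positive="D"):
    """Token-level precision, recall and F1 for the disfluent class.

    Hypothetical helper: assumes gold and predicted tags are aligned
    one-to-one. For a corpus, concatenate per-sentence tag sequences
    first (micro-averaging).
    """
    if len(gold_tags) != len(pred_tags):
        raise ValueError("tag sequences must be aligned")
    pairs = list(zip(gold_tags, pred_tags))
    # True positives: disfluent tokens correctly flagged as disfluent.
    tp = sum(g == positive and p == positive for g, p in pairs)
    # False positives: fluent tokens wrongly flagged (they would be deleted).
    fp = sum(g != positive and p == positive for g, p in pairs)
    # False negatives: disfluent tokens the tagger missed (they would be kept).
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, with gold tags `D F F D F` and predicted tags `D F D F F`, there is one true positive, one false positive, and one false negative. Precision, recall, and F1 are therefore all 0.5.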
Abstract
Disfluencies commonly occur in conversational speech. Speech with disfluencies can result in noisy Automatic Speech Recognition (ASR) transcripts, which in turn affect downstream tasks such as machine translation. In this paper, we propose an adversarially-trained sequence-tagging model for Disfluency Correction (DC) that uses a small amount of labeled real disfluent data in conjunction with a large amount of unlabeled data. We show the benefit of our proposed technique, which crucially depends on synthetically generated disfluent data, by evaluating it for DC in three Indian languages: Bengali, Hindi, and Marathi (all from the Indo-Aryan family). Our technique also performs well in removing stuttering disfluencies introduced into ASR transcripts by speech impairments. We achieve an average improvement of 6.15 points in F1-score over competitive baselines across all three languages. To…
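In a sequence-tagging formulation of DC, each token of a transcript receives a label, and the corrected text keeps only the tokens labeled fluent. Synthetic data fits this framing naturally because any token injected into a fluent sentence is disfluent by construction, so labels come for free. The sketch below illustrates both ideas under stated assumptions. The two-label tagset (`F` keep, `D` drop), the English filler list, and the helpers `make_synthetic` and `correct` are hypothetical. This is not the authors' implementation, and the paper's synthetic-data generation and tagset may differ.

```python
import random

# Hypothetical two-label tagset: "F" = fluent token (keep), "D" = disfluent (drop).
FLUENT, DISFLUENT = "F", "D"

# Hypothetical English fillers for illustration; the paper targets Bengali,
# Hindi and Marathi, whose fillers would differ.
FILLERS = ["um", "uh", "like"]


def make_synthetic(fluent_tokens, rng, p_filler=0.2, p_repeat=0.2):
    """Inject fillers and stutter-like repetitions into a fluent sentence.

    Returns aligned (tokens, tags), usable as labeled tagging data.
    """
    tokens, tags = [], []
    for tok in fluent_tokens:
        if rng.random() < p_filler:
            # Filler inserted before the word is disfluent.
            tokens.append(rng.choice(FILLERS))
            tags.append(DISFLUENT)
        if rng.random() < p_repeat:
            # Repetition: the first copy is tagged disfluent.
            tokens.append(tok)
            tags.append(DISFLUENT)
        tokens.append(tok)
        tags.append(FLUENT)
    return tokens, tags


def correct(tokens, tags):
    """Apply (predicted) tags: keep only tokens tagged fluent."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return [tok for tok, tag in zip(tokens, tags) if tag == FLUENT]


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "i want to book a ticket".split()
    noisy, gold = make_synthetic(sentence, rng)
    print(" ".join(noisy))
    # Gold tags stand in for a trained tagger's predictions here, so
    # correction recovers the original fluent sentence.
    print(" ".join(correct(noisy, gold)))
```

In a real system, a trained tagger would predict the tags passed to `correct`, and synthetic pairs like these would supplement the small labeled real set.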
Taxonomy
Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Stuttering Research and Treatment
