Mind the Pause: Disfluency-Aware Objective Tuning for Multilingual Speech Correction with LLMs

Deepak Kumar; Baban Gain; Asif Ekbal

arXiv:2605.12242 · cs.CL · May 13, 2026

TL;DR

This paper introduces a multilingual speech correction method that combines disfluency detection, instruction fine-tuning of LLMs, and contrastive learning to improve transcript fluency across Indian languages.

Contribution

It presents a novel multilingual correction pipeline that integrates token-level disfluency signals with instruction tuning and contrastive learning, and it outperforms strong baselines on Hindi, Bengali, and Marathi.
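The excerpt names contrastive learning as part of the objective but does not specify its form. As a hedged illustration only, the sketch below implements a generic InfoNCE-style, sequence-level contrastive term that pushes the model's score for a fluent target above its scores for negative candidates, added to the usual token-level negative log-likelihood. The choice of negatives (the raw disfluent transcript and a deletion-only output), the weight `lam`, the temperature `tau`, the helper names, and all numeric values are assumptions, not the paper's settings.

```python
import math
from typing import Sequence


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x))) over a non-empty sequence."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def contrastive_loss(pos_score: float, neg_scores: Sequence[float], tau: float = 1.0) -> float:
    """InfoNCE-style term: -log softmax probability of the fluent target among all candidates.

    Each score is assumed to be a sequence-level model score, e.g. the
    length-normalised log-likelihood log p(y | x) / |y| of a candidate y.
    """
    if not neg_scores:
        raise ValueError("need at least one negative candidate")
    logits = [pos_score / tau] + [s / tau for s in neg_scores]
    return logsumexp(logits) - logits[0]


def total_loss(nll: float, pos_score: float, neg_scores: Sequence[float],
               lam: float = 0.5, tau: float = 1.0) -> float:
    """Hypothetical combined objective: token-level NLL plus a weighted contrastive term."""
    return nll + lam * contrastive_loss(pos_score, neg_scores, tau)


# Hypothetical scores: fluent reference vs. two assumed negatives
# (the raw disfluent transcript and a deletion-only output).
pos = -0.4
negs = [-0.9, -1.2]
print(round(contrastive_loss(pos, negs), 4))
print(round(total_loss(1.3, pos, negs), 4))
```

The contrastive term shrinks toward zero as the fluent target's score rises above every negative, so it rewards the model for preferring a full correction over near-miss outputs rather than only for matching the reference token by token.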

Findings

1. Consistent improvements over strong baselines in Hindi, Bengali, and Marathi.

2. Detection-only strategies are insufficient for effective disfluency correction.

3. Combining token-level disfluency cues with instruction tuning and contrastive learning improves speech transcript quality.
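Finding 2 contrasts the proposed pipeline with detection-only correction, where a tagger's labels are applied by simply deleting the flagged tokens. The sketch below illustrates that baseline under assumed conventions (whitespace-tokenized English text for readability, and a hypothetical binary tag set with "D" for disfluent and "O" for fluent); it is not code from the paper or its repository.

```python
from typing import List


def delete_tagged_tokens(tokens: List[str], tags: List[str]) -> str:
    """Detection-only correction: drop every token tagged "D" (disfluent), keep the rest verbatim."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return " ".join(tok for tok, tag in zip(tokens, tags) if tag != "D")


tokens = ["i", "want", "uh", "i", "need", "a", "ticket"]
good_tags = ["D", "D", "D", "O", "O", "O", "O"]
# One tagging error: "need" is wrongly marked as disfluent.
bad_tags = ["D", "D", "D", "O", "D", "O", "O"]

print(delete_tagged_tokens(tokens, good_tags))  # i need a ticket
print(delete_tagged_tokens(tokens, bad_tags))   # i a ticket
```

Because deletion can only remove words, it cannot insert, re-inflect, or reorder anything, so a single tagging error leaves an ungrammatical sentence such as "i a ticket" that nothing downstream repairs. This is one concrete way the grammatical breakage described in the abstract can arise.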

Abstract

Automatic Speech Recognition (ASR) transcripts often contain disfluencies, such as fillers, repetitions, and false starts, which reduce readability and hinder downstream applications such as chatbots and voice assistants. If left unaddressed, these disfluencies can significantly degrade the reliability of downstream systems. Most existing approaches rely on classical models that identify disfluent tokens for removal. Although this strategy is effective to some extent, it often disrupts grammatical structure and semantic coherence, producing incomplete or unnatural sentences. Recent work has explored large language models (LLMs), but these efforts have focused primarily on disfluency detection or data augmentation rather than on comprehensive correction. We propose a multilingual correction pipeline where a sequence tagger first marks disfluent tokens,…
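The abstract is truncated before it describes what happens after the sequence tagger marks disfluent tokens, so the following is only one plausible way to pass token-level signals to an instruction-tuned LLM: wrap each run of flagged tokens in inline markers and ask the model to rewrite the whole transcript. The marker strings `<dis>`/`</dis>`, the "D"/"O" tag set, the prompt wording, and the helper names `mark_disfluencies` and `build_instruction` are all hypothetical, not taken from the paper or the Mind-the-Pause repository. English tokens are used for readability, although the paper targets Hindi, Bengali, and Marathi.

```python
from typing import List


def mark_disfluencies(tokens: List[str], tags: List[str],
                      open_tag: str = "<dis>", close_tag: str = "</dis>") -> str:
    """Wrap each maximal run of tokens tagged "D" (hypothetical disfluent label) in inline markers."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    out: List[str] = []
    in_span = False
    for tok, tag in zip(tokens, tags):
        if tag == "D" and not in_span:
            out.append(open_tag)   # a disfluent run starts here
            in_span = True
        elif tag != "D" and in_span:
            out.append(close_tag)  # the run ended on the previous token
            in_span = False
        out.append(tok)
    if in_span:
        out.append(close_tag)      # close a run that reaches the end of the transcript
    return " ".join(out)


def build_instruction(tokens: List[str], tags: List[str], language: str) -> str:
    """Hypothetical instruction prompt that exposes the tagger's output as hints to the LLM."""
    marked = mark_disfluencies(tokens, tags)
    return (
        f"Rewrite the following {language} ASR transcript as a fluent sentence. "
        "Spans inside <dis> ... </dis> were flagged as likely disfluencies; "
        "remove or repair them while keeping the speaker's meaning.\n"
        f"Transcript: {marked}\n"
        "Corrected:"
    )


tokens = ["i", "want", "uh", "i", "need", "a", "ticket"]
tags = ["D", "D", "D", "O", "O", "O", "O"]
print(mark_disfluencies(tokens, tags))  # <dis> i want uh </dis> i need a ticket
print(build_instruction(tokens, tags, "Hindi"))
```

Unlike removing tagged tokens outright, the tags here act as hints: the model may keep a wrongly flagged word or rephrase around a repair, which is the kind of flexibility the abstract argues pure removal lacks.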

Code & Models

Repositories

deepak-kumar-98/Mind-the-Pause (GitHub)