Stuttering-Aware Automatic Speech Recognition for Indonesian Language

Fadhil Muhammad; Alwin Djuliansah; Adrian Aryaputra Hamzah; Kurniawati Azizah

arXiv:2601.03727 · cs.CL · January 15, 2026

TL;DR

This paper introduces a synthetic data augmentation method for Indonesian speech recognition systems to better handle stuttered speech, leveraging rule-based transformations, large language models, and transfer learning.

Contribution

It presents a novel synthetic data generation framework combined with transfer learning to improve recognition of stuttered speech in low-resource languages.

Findings

01. Reduced recognition errors on stuttered speech

02. Maintained performance on fluent speech segments

03. Validated synthetic data effectiveness for inclusive speech tech

Abstract

Automatic speech recognition systems have achieved remarkable performance on fluent speech but continue to degrade significantly when processing stuttered speech, a limitation that is particularly acute for low-resource languages like Indonesian where specialized datasets are virtually non-existent. To overcome this scarcity, we propose a data augmentation framework that generates synthetic stuttered audio by injecting repetitions and prolongations into fluent text through a combination of rule-based transformations and large language models followed by text-to-speech synthesis. We apply this synthetic data to fine-tune a pre-trained Indonesian Whisper model using transfer learning, enabling the architecture to adapt to dysfluent acoustic patterns without requiring large-scale real-world recordings. Our experiments demonstrate that this targeted synthetic exposure consistently reduces…
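
The rule-based part of the augmentation described in the abstract (injecting repetitions and prolongations into fluent text) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the vowel-based syllable guess, and the `rate` parameter are all assumptions made for illustration, and the LLM-based transformation and text-to-speech stages are left out entirely.

```python
import random

# Assumption: Indonesian orthographic vowels are used to guess the first syllable.
VOWELS = set("aiueo")

def repeat_syllable(word, times=2):
    """Repeat the word's opening fragment, e.g. 'saya' -> 'sa-sa-saya'."""
    # Fragment = leading consonants plus the first vowel (a crude syllable guess).
    for i, ch in enumerate(word):
        if ch.lower() in VOWELS:
            frag = word[: i + 1]
            return "-".join([frag] * times + [word])
    return word  # no vowel found: leave the word unchanged

def prolong(word, length=3):
    """Stretch the first vowel, e.g. 'makan' -> 'maaakan'."""
    for i, ch in enumerate(word):
        if ch.lower() in VOWELS:
            return word[:i] + ch * length + word[i + 1:]
    return word

def inject_stutter(sentence, rate=0.3, seed=None):
    """Apply a randomly chosen rule to roughly `rate` of the words."""
    rng = random.Random(seed)  # seeded generator keeps the output reproducible
    out = []
    for word in sentence.split():
        if rng.random() < rate:
            rule = rng.choice([repeat_syllable, prolong])
            out.append(rule(word))
        else:
            out.append(word)
    return " ".join(out)
```

In a pipeline like the one the abstract outlines, text produced this way would then be passed to a text-to-speech system to obtain synthetic stuttered audio, with the original fluent sentence presumably serving as the transcription target for fine-tuning.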

Taxonomy

Topics: Speech Recognition and Synthesis · Stuttering Research and Treatment · Speech and Audio Processing