STEAMROLLER: A Multi-Agent System for Inclusive Automatic Speech Recognition for People who Stutter
Ziqi Xu, Yi Liu, Yuekang Li, Ling Shi, Kailong Wang, Yongxin Zhao

TL;DR
STEAMROLLER is a multi-agent system that converts disfluent speech of people who stutter into fluent speech in real time, improving accessibility and inclusivity in voice technology.
Contribution
It introduces a novel multi-stage, multi-agent AI pipeline that effectively repairs disfluent speech and enhances ASR performance for people who stutter.
Findings
Significant reduction in word error rate (WER) on the FluencyBank dataset.
High user satisfaction in a user study.
Fine-tuning ASR on repaired speech further improves accuracy.
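For readers unfamiliar with the metric in the findings above: WER is conventionally the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the ASR output, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the paper's evaluation code; the function name `word_error_rate` and the lowercase, whitespace-split normalization are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Assumption: simple lowercase + whitespace tokenization; real
    evaluations usually apply richer text normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative strings only (not taken from FluencyBank): word repetitions
# that survive into a transcript count as insertions.
raw = word_error_rate("i want to go home", "i i i want to to go home")
repaired = word_error_rate("i want to go home", "i want to go home")
```

Because the denominator is the reference length, a transcript with many repeated words can exceed a WER of 1.0, which is one reason disfluent input is penalized so heavily by this metric.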
Abstract
People who stutter (PWS) face systemic exclusion in today's voice-driven society, where access to voice assistants, authentication systems, and remote work tools increasingly depends on fluent speech. Current automatic speech recognition (ASR) systems, trained predominantly on fluent speech, fail to serve millions of PWS worldwide. We present STEAMROLLER, a real-time system that transforms stuttered speech into fluent output through a novel multi-stage, multi-agent AI pipeline. Our approach addresses three critical technical challenges: (1) the difficulty of direct speech-to-speech conversion for disfluent input, (2) semantic distortions introduced during ASR transcription of stuttered speech, and (3) latency constraints for real-time communication. STEAMROLLER employs a three-stage architecture comprising ASR transcription, multi-agent text repair, and speech synthesis, where our core…
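The three-stage shape described in the abstract (ASR transcription, then multi-agent text repair, then speech synthesis) can be sketched as a chain of pluggable stages. Everything in this sketch is hypothetical: the class `RepairPipeline`, the two rule-based agents, and the stubbed ASR and synthesis callables are stand-ins chosen for illustration. The abstract is truncated before it describes the actual agents, so this does not reproduce the authors' method.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# A repair agent is modeled here as any function from text to text (assumption).
Agent = Callable[[str], str]


def drop_fillers(text: str) -> str:
    """Hypothetical agent: remove a small, fixed set of filler words."""
    fillers = {"um", "uh", "er"}
    return " ".join(w for w in text.split() if w.lower().strip(",.") not in fillers)


def collapse_repetitions(text: str) -> str:
    """Hypothetical agent: collapse immediate whole-word repetitions."""
    return re.sub(r"\b(\w+)(?:\s+\1\b)+", r"\1", text, flags=re.IGNORECASE)


@dataclass
class RepairPipeline:
    transcribe: Callable[[bytes], str]   # stage 1: ASR (stubbed below)
    agents: List[Agent]                  # stage 2: text repair agents, applied in order
    synthesize: Callable[[str], bytes]   # stage 3: speech synthesis (stubbed below)

    def run(self, audio: bytes) -> bytes:
        text = self.transcribe(audio)
        for agent in self.agents:
            text = agent(text)
        return self.synthesize(text)


# Stubs: treat the "audio" bytes as already-transcribed text so the sketch
# runs without any speech model installed.
pipeline = RepairPipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    agents=[drop_fillers, collapse_repetitions],
    synthesize=lambda text: text.encode("utf-8"),
)

fluent = pipeline.run(b"I um I I want to to go home")
```

Keeping each stage behind a plain callable makes it possible to swap in a real ASR model, different repair agents, or a synthesis backend independently, and to measure the latency of each stage in isolation.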
Taxonomy
Topics: Stuttering Research and Treatment · Speech Recognition and Synthesis · Phonetics and Phonology Research
