Stutter-Solver: End-to-end Multi-lingual Dysfluency Detection
Xuanru Zhou, Cheol Jun Cho, Ayati Sharma, Brittany Morin, David Baquirin, Jet Vonk, Zoe Ezzes, Zachary Miller, Boon Lead Tee, Maria Luisa Gorno Tempini, Jiachen Lian, Gopala Anumanchipalli

TL;DR
Stutter-Solver is an end-to-end multi-lingual framework inspired by YOLO that detects dysfluencies with high accuracy, handling various types and languages, and is supported by new large-scale dysfluency datasets.
Contribution
The paper introduces a novel YOLO-inspired end-to-end dysfluency detection model and three large-scale simulated dysfluency corpora covering multiple languages: VCTK-Pro, VCTK-Art, and AISHELL3-Pro.
Findings
Achieves state-of-the-art performance on dysfluency detection datasets.
Effectively handles co-dysfluencies and multiple languages.
Provides open-source code and datasets for further research.
Abstract
Current de-facto dysfluency modeling methods utilize template matching algorithms which are not generalizable to out-of-domain real-world dysfluencies across languages, and are not scalable with increasing amounts of training data. To handle these problems, we propose Stutter-Solver: an end-to-end framework that detects dysfluency with accurate type and time transcription, inspired by the YOLO object detection algorithm. Stutter-Solver can handle co-dysfluencies and is a natural multi-lingual dysfluency detector. To leverage scalability and boost performance, we also introduce three novel dysfluency corpora: VCTK-Pro, VCTK-Art, and AISHELL3-Pro, simulating natural spoken dysfluencies including repetition, block, missing, replacement, and prolongation through articulatory-encodec and TTS-based methods. Our approach achieves state-of-the-art performance on all available dysfluency…
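To make "type and time transcription" concrete, here is a minimal Python sketch of how YOLO-style region predictions along the time axis could be post-processed into a final list of dysfluencies. It is an illustration under stated assumptions, not the authors' implementation: the `Region` record, the `decode` function, the thresholds, and the per-type non-maximum suppression step are all hypothetical, and the candidate scores are made up. Only the five type names come from the abstract.

```python
from dataclasses import dataclass

# The five dysfluency types named in the abstract.
TYPES = ("repetition", "block", "missing", "replacement", "prolongation")


@dataclass(frozen=True)
class Region:
    """One candidate detection (hypothetical output format)."""
    start: float  # region start, in seconds
    end: float    # region end, in seconds
    kind: str     # one of TYPES
    score: float  # detector confidence in [0, 1] (assumed)


def temporal_iou(a: Region, b: Region) -> float:
    """Intersection-over-union of two time intervals."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


def decode(candidates, score_thresh=0.5, iou_thresh=0.5):
    """Drop low-confidence candidates, then suppress duplicates per type.

    Suppression is applied only between regions of the same type, so
    overlapping regions of different types (co-dysfluencies) survive.
    """
    kept = []
    for cand in sorted(candidates, key=lambda r: r.score, reverse=True):
        if cand.score < score_thresh:
            break  # remaining candidates score even lower
        is_duplicate = any(
            cand.kind == k.kind and temporal_iou(cand, k) >= iou_thresh
            for k in kept
        )
        if not is_duplicate:
            kept.append(cand)
    return sorted(kept, key=lambda r: r.start)


# Made-up candidates for illustration only.
candidates = [
    Region(1.00, 1.60, "repetition", 0.92),
    Region(1.05, 1.55, "repetition", 0.70),    # near-duplicate of the first
    Region(1.10, 1.50, "prolongation", 0.81),  # overlaps, but a different type
    Region(3.20, 3.90, "block", 0.30),         # below the score threshold
]
detections = decode(candidates)
for d in detections:
    print(f"{d.start:.2f}-{d.end:.2f}s  {d.kind}  ({d.score:.2f})")
```

Restricting suppression to same-type regions is one simple way to let co-dysfluencies coexist in the output; whether Stutter-Solver handles them this way is not stated in the abstract.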
Taxonomy
Topics: Stuttering Research and Treatment · Phonetics and Phonology Research
