Automatic Disfluency Detection from Untranscribed Speech
Amrit Romana, Kazuhito Koishida, Emily Mower Provost

TL;DR
This paper explores various language, acoustic, and multimodal methods for automatic frame-level disfluency detection from audio, highlighting the superiority of acoustic and multimodal approaches over transcript-based methods, with implications for clinical and NLP applications.
Contribution
It introduces novel acoustic and multimodal approaches for disfluency detection that outperform traditional transcript-based methods, emphasizing the importance of transcript quality.
Findings
Acoustic-based disfluency detection outperforms language-based methods.
Transcript quality significantly impacts detection performance.
Multimodal architectures improve disfluency detection accuracy.
Abstract
Speech disfluencies, such as filled pauses or repetitions, are disruptions in the typical flow of speech. Stuttering is a speech disorder characterized by a high rate of disfluencies, but all individuals speak with some disfluencies and the rates of disfluencies may be increased by factors such as cognitive load. Clinically, automatic disfluency detection may help in treatment planning for individuals who stutter. Outside of the clinic, automatic disfluency detection may serve as a pre-processing step to improve natural language understanding in downstream applications. With this wide range of applications in mind, we investigate language, acoustic, and multimodal methods for frame-level automatic disfluency detection and categorization. Each of these methods relies on audio as an input. First, we evaluate several automatic speech recognition (ASR) systems in terms of their ability to…
Taxonomy
Topics: Stuttering Research and Treatment · Phonetics and Phonology Research · Speech Recognition and Synthesis
