Detecting Multiple Speech Disfluencies using a Deep Residual Network with Bidirectional Long Short-Term Memory

Tedd Kourkounakis; Amirhossein Hajavi; Ali Etemad

arXiv:1910.12590·eess.AS·October 29, 2019

TL;DR

This paper introduces a deep learning model combining residual networks and bidirectional LSTMs to detect and classify various speech disfluencies directly from acoustic features, improving accuracy over previous methods.

Contribution

The paper presents a novel model for stutter detection that works from acoustic features alone, outperforms existing approaches, and removes the need for speech recognition.

Findings

01

Achieved an average miss rate of 10.03%.

02

Outperformed the previous state of the art by nearly 27%.

03

Effectively classifies multiple types of stutter disfluencies.

Abstract

Stuttering is a speech impediment that affects tens of millions of people every day. Despite its prevalence, there is little data and research on the identification and classification of stuttered speech. This paper tackles the problem of detecting and classifying different forms of stutter. As opposed to most existing works, which identify stutters with language models, our work proposes a model that relies solely on acoustic features, allowing for identification of several variations of stutter disfluencies without the need for speech recognition. Our model uses a deep residual network and bidirectional long short-term memory layers to classify different types of stutters and achieves an average miss rate of 10.03%, outperforming the state of the art by almost 27%.
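
The headline metric is an average miss rate across disfluency types. The sketch below shows one common way such a figure can be computed, assuming the usual definition of miss rate as false negatives divided by all true positives in the reference (FN / (FN + TP)) and a simple unweighted mean over classes. The function names, class names, and toy labels are hypothetical illustrations; the abstract does not state the exact evaluation protocol, so this should not be read as the authors' procedure.

```python
def miss_rate(y_true, y_pred):
    """Fraction of reference positives the detector failed to flag: FN / (FN + TP).

    y_true and y_pred are equal-length sequences of 0/1 labels for one
    disfluency type (1 = disfluency present in the clip).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    false_negatives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = false_negatives + true_positives
    if positives == 0:
        raise ValueError("miss rate is undefined without positive examples")
    return false_negatives / positives


def average_miss_rate(results_by_class):
    """Unweighted mean of per-class miss rates (an assumed averaging scheme).

    results_by_class maps a class name to a (y_true, y_pred) pair.
    """
    rates = [miss_rate(y_true, y_pred) for y_true, y_pred in results_by_class.values()]
    return sum(rates) / len(rates)


# Toy, made-up labels for two hypothetical disfluency classes.
toy_results = {
    "repetition": ([1, 1, 0, 0], [1, 0, 0, 0]),    # one of two positives missed
    "prolongation": ([1, 0, 1, 1], [1, 0, 1, 1]),  # no positives missed
}

if __name__ == "__main__":
    for name, (y_true, y_pred) in toy_results.items():
        print(f"{name}: miss rate = {miss_rate(y_true, y_pred):.2%}")
    print(f"average miss rate = {average_miss_rate(toy_results):.2%}")
```

Averaging per class rather than pooling all clips gives each disfluency type equal weight, which matters when some types are much rarer than others; a pooled or weighted average would be an equally plausible reading of "average miss rate".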
