Learning to detect dysarthria from raw speech

Juliette Millet; Neil Zeghidour

arXiv:1811.11101 · cs.CL · January 9, 2019 · 1 citation

TL;DR

This paper introduces a neural network that learns feature extraction, normalization, and compression directly from raw speech, jointly with the classifier, to improve dysarthria detection accuracy. It surpasses fixed mel-filterbank features and OpenSmile features.

Contribution

It extends waveform-based learning from speech recognition to paralinguistic classification, proposing a network that learns a filterbank, a normalization factor, and a compression power from raw speech jointly with the rest of the architecture.

Findings

01. 10% absolute accuracy improvement over fixed mel-filterbank features

02. Outperforms OpenSmile features when feature extraction, normalization, and compression are learned jointly from raw speech

03. Effective joint learning of multiple preprocessing steps from raw audio

Abstract

Speech classifiers of paralinguistic traits traditionally learn from diverse hand-crafted low-level features, by selecting the relevant information for the task at hand. We explore an alternative to this selection, by learning jointly the classifier, and the feature extraction. Recent work on speech recognition has shown improved performance over speech features by learning from the waveform. We extend this approach to paralinguistic classification and propose a neural network that can learn a filterbank, a normalization factor and a compression power from the raw speech, jointly with the rest of the architecture. We apply this model to dysarthria detection from sentence-level audio recordings. Starting from a strong attention-based baseline on which mel-filterbanks outperform standard low-level descriptors, we show that learning the filters or the normalization and compression improves…
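The normalization factor and compression power mentioned in the abstract can be pictured with a small sketch. The excerpt does not give the exact formula, so the code below assumes a PCEN-style (per-channel energy normalization) form as a stand-in. The function name `pcen_like`, the parameters `alpha`, `r`, `s`, `delta`, `eps`, and their default values are all illustrative assumptions. In a trained model, `alpha` and `r` would be learned by gradient descent rather than set by hand.

```python
def pcen_like(energies, alpha, r, s=0.5, delta=2.0, eps=1e-6):
    """Normalize and compress one channel of filterbank energies over time.

    ASSUMPTION: a PCEN-style formula stands in for the paper's front end;
    the abstract excerpt does not specify the exact form.
    alpha: normalization strength (0 = none, 1 = full gain normalization)
    r:     compression power (smaller = stronger compression)
    s:     smoothing coefficient of the running energy average
    """
    if not energies:
        return []
    smoothed = energies[0]  # start the running average at the first frame
    out = []
    for e in energies:
        # First-order IIR smoother: tracks the recent energy of the channel.
        smoothed = (1.0 - s) * smoothed + s * e
        # Divide by the smoothed energy raised to alpha (normalization),
        # then apply a power law with offset delta (compression).
        normalized = e / (eps + smoothed) ** alpha
        out.append((normalized + delta) ** r - delta ** r)
    return out


if __name__ == "__main__":
    # Hypothetical energies: the same contour at two recording levels.
    quiet = [0.1, 0.4, 0.2, 0.3]
    loud = [100.0 * e for e in quiet]
    print(pcen_like(quiet, alpha=1.0, r=0.5))
    print(pcen_like(loud, alpha=1.0, r=0.5))
```

With `alpha=1.0` the two printed lists should be nearly identical, because dividing by the smoothed energy cancels the overall recording level. With `alpha=0.0` and `r=1.0` the function returns its input unchanged, which is why learning these two values lets a model choose how much normalization and compression the task needs.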

Taxonomy

Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Music and Audio Processing