Temporal envelope and fine structure cues for dysarthric speech detection using CNNs
Ina Kodrasi

TL;DR
This paper introduces an approach to dysarthric speech detection that, inspired by human auditory processing, analyzes temporal envelope and fine structure cues separately, leading to improved detection accuracy.
Contribution
The paper factors speech signals into the product of a slowly varying envelope and a rapidly varying fine structure for CNN-based dysarthric speech detection, outperforming traditional methods that operate on the STFT magnitude spectrum.
Findings
Processing both envelope and fine structure improves detection accuracy.
Separately learned representations outperform single-cue approaches.
The decomposition is inspired by the temporal processing mechanisms of the human auditory system.
Abstract
Deep learning-based techniques for automatic dysarthric speech detection have recently attracted interest in the research community. State-of-the-art techniques typically learn neurotypical and dysarthric discriminative representations by processing time-frequency input representations such as the magnitude spectrum of the short-time Fourier transform (STFT). Although these techniques are expected to leverage perceptual dysarthric cues, representations such as the magnitude spectrum of the STFT do not necessarily convey perceptual aspects of complex sounds. Inspired by the temporal processing mechanisms of the human auditory system, in this paper we factor signals into the product of a slowly varying envelope and a rapidly varying fine structure. Separately exploiting the different perceptual cues present in the envelope (i.e., phonetic information, stress, and voicing) and fine…
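Example
To make the factorization in the abstract concrete, the sketch below splits a signal into an envelope and a fine structure whose product recovers the original signal. It assumes a Hilbert-transform (analytic signal) decomposition, which is one common way to obtain the two factors; the abstract does not say which decomposition the paper uses, and the function names and the toy amplitude-modulated tone are hypothetical illustrations rather than the author's implementation.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal of a real 1-D signal, computed with the FFT.

    Assumption: a Hilbert-transform construction; the paper may differ.
    """
    n = len(x)
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist when n is even), double the positive
    # frequencies, and zero out the negative frequencies.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def envelope_and_fine_structure(x):
    """Return (envelope, fine_structure) with x == envelope * fine_structure."""
    z = analytic_signal(np.asarray(x, dtype=float))
    envelope = np.abs(z)                  # slowly varying magnitude
    fine_structure = np.cos(np.angle(z))  # rapidly varying, bounded in [-1, 1]
    return envelope, fine_structure


# Toy signal (hypothetical): a 500 Hz carrier with a 4 Hz amplitude modulation.
fs = 8000
t = np.arange(fs) / fs
modulator = 1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)
carrier = np.cos(2 * np.pi * 500 * t)
x = modulator * carrier

envelope, fine_structure = envelope_and_fine_structure(x)
reconstruction = envelope * fine_structure
```

In a detection pipeline along the lines the abstract describes, each factor would be turned into its own input representation and passed to a CNN; that step is omitted here because the abstract does not specify it.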
