On using the UA-Speech and TORGO databases to validate automatic dysarthric speech classification approaches

Guilherme Schu; Parvaneh Janbakhshi; Ina Kodrasi

arXiv:2211.08833 · eess.AS · November 17, 2022 · ICASSP · 1 citation

TL;DR

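This paper shows that the UA-Speech and TORGO databases may not be reliable benchmarks for validating automatic dysarthric speech classification approaches. Control and dysarthric recordings were not collected in the same noiseless environment with the same setup, so most state-of-the-art classifiers perform as well or considerably better when given only the non-speech segments of the recordings.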

Contribution

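The study shows that classification results reported on UA-Speech and TORGO may reflect differences in recording environment and setup between control and dysarthric speakers rather than characteristics of dysarthric speech, so results validated on these databases should be interpreted with caution.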

Findings

01

Most state-of-the-art dysarthria classification approaches perform well even when given only the non-speech segments of these databases.

02

Recording environment influences classification performance.

03

Databases may contain environmental artifacts affecting results.

Abstract

Although the UA-Speech and TORGO databases of control and dysarthric speech are invaluable resources made available to the research community with the objective of developing robust automatic speech recognition systems, they have also been used to validate a considerable number of automatic dysarthric speech classification approaches. Such approaches typically rely on the underlying assumption that recordings from control and dysarthric speakers are collected in the same noiseless environment using the same recording setup. In this paper, we show that this assumption is violated for the UA-Speech and TORGO databases. Using voice activity detection to extract speech and non-speech segments, we show that the majority of state-of-the-art dysarthria classification approaches achieve the same or a considerably better performance when using the non-speech segments of these databases than when…
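The segment-extraction step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which voice activity detector the authors used, so the energy threshold below is only an assumed stand-in, and the function names, frame length, and margin are hypothetical choices made for illustration. The sketch splits a signal into speech and non-speech samples. In an experiment of the kind described, the non-speech part would then be passed to a classifier to check whether it can still separate the two speaker groups.

```python
import math
import random


def frame_energies_db(samples, frame_len):
    """Log energy (dB) of each non-overlapping frame; a trailing partial frame is dropped."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-12))  # small floor avoids log(0)
    return energies


def split_speech_nonspeech(samples, frame_len=400, margin_db=30.0):
    """Toy energy-based VAD (an assumption, not the detector used in the paper).

    A frame counts as speech when its energy is within margin_db of the
    loudest frame in the recording; every other frame counts as non-speech.
    """
    energies = frame_energies_db(samples, frame_len)
    if not energies:
        return [], []
    threshold = max(energies) - margin_db
    speech, nonspeech = [], []
    for i, energy in enumerate(energies):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        if energy > threshold:
            speech.extend(frame)
        else:
            nonspeech.extend(frame)
    return speech, nonspeech


def make_toy_recording(sample_rate=16000, seed=0):
    """Synthetic stand-in for a recording: quiet noise, a loud 200 Hz tone, quiet noise."""
    rng = random.Random(seed)

    def noise(n):
        return [rng.uniform(-0.001, 0.001) for _ in range(n)]

    tone = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(4000)]
    return noise(4000) + tone + noise(4000)


signal = make_toy_recording()
speech, nonspeech = split_speech_nonspeech(signal)
print(len(speech), len(nonspeech))
```

Because the threshold is relative to the loudest frame, this sketch always labels at least one frame as speech, even in a recording that contains none. A real study would use a dedicated voice activity detector.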

Taxonomy

Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Music and Audio Processing