Can We Trust Machine Learning? The Reliability of Features from Open-Source Speech Analysis Tools for Speech Modeling

Tahiya Chowdhury; Veronica Romero

arXiv:2506.11072 · eess.AS · June 16, 2025

Open Access

TL;DR

This paper evaluates how reliably two open-source speech analysis tools, OpenSMILE and Praat, extract features for machine learning models of speech from adolescents with autism, and it highlights cross-tool variability and potential bias.

Contribution

It provides an empirical assessment of the reliability of popular speech features across different tools and populations, emphasizing the need for domain-specific validation.

Findings

01. Features vary significantly across tools, and this variation affects model performance.

02. Tool reliability varies across demographic groups and contexts.

03. Verifying extracted features is essential for clinical and fair machine learning applications.

Abstract

Machine learning-based behavioral models rely on features extracted from audio-visual recordings. The recordings are processed using open-source tools to extract speech features for classification models. These tools often lack validation to ensure reliability in capturing behaviorally relevant information. This gap raises concerns about reproducibility and fairness across diverse populations and contexts. Speech processing tools, when used outside of their design context, can fail to capture behavioral variations equitably and can then contribute to bias. We evaluate speech features extracted from two widely used speech analysis tools, OpenSMILE and Praat, to assess their reliability when considering adolescents with autism. We observed considerable variation in features across tools, which influenced model performance across context and demographic groups. We encourage domain-relevant…
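As a rough illustration of what "variation in features across tools" can mean in practice, the sketch below compares one feature (mean pitch per recording) as two tools might report it for the same recordings. It is a minimal sketch and not the authors' evaluation procedure: the pitch values are hypothetical, the helper names `pearson` and `mean_abs_diff` are invented for this example, and the choice of Pearson correlation plus mean absolute difference as agreement measures is an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def mean_abs_diff(x, y):
    """Average absolute gap between paired values, in the feature's own units."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Hypothetical mean-pitch values (Hz) for the same five recordings, as two
# different tools might report them. These are NOT data from the paper.
tool_a_pitch = [182.0, 210.5, 195.2, 240.1, 175.8]
tool_b_pitch = [185.5, 204.0, 199.9, 228.7, 181.3]

agreement = {
    "pearson_r": pearson(tool_a_pitch, tool_b_pitch),
    "mean_abs_diff_hz": mean_abs_diff(tool_a_pitch, tool_b_pitch),
}
print(agreement)
```

Reporting both numbers is a deliberate choice in this sketch: two tools can rank recordings almost identically (high correlation) while still disagreeing by several hertz on each one, and a downstream classifier trained on one tool's output may be sensitive to that offset.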

Taxonomy

Topics: Speech Recognition and Synthesis