Systematic FAIRness Assessment of Open Voice Biomarker Datasets for Mental Health and Neurodegenerative Diseases

Ishaan Mahapatra; Nihar R. Mahapatra

arXiv:2508.14089 · cs.SD · August 21, 2025

Open Access

TL;DR

This study systematically evaluates the FAIRness of 27 open voice biomarker datasets for mental health and neurodegenerative diseases, identifying strengths and weaknesses to guide improvements for clinical adoption.

Contribution

First comprehensive FAIR assessment of voice biomarker datasets, providing actionable recommendations to enhance data quality and usability for clinical research.

Findings

01. High findability across datasets
02. Significant variability in accessibility and interoperability
03. Repository choice impacts FAIRness scores

Abstract

Voice biomarkers (human-generated acoustic signals such as speech, coughing, and breathing) are promising tools for scalable, non-invasive detection and monitoring of mental health and neurodegenerative diseases. Yet their clinical adoption remains constrained by inconsistent quality and limited usability of publicly available datasets. To address this gap, we present the first systematic FAIR (Findable, Accessible, Interoperable, Reusable) evaluation of 27 publicly available voice biomarker datasets focused on these disease areas. Using the FAIR Data Maturity Model and a structured, priority-weighted scoring method, we assessed FAIRness at subprinciple, principle, and composite levels. Our analysis revealed consistently high Findability but substantial variability and weaknesses in Accessibility, Interoperability, and Reusability. Mental health datasets exhibited greater variability…
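
The three scoring levels mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual weights or aggregation rules, so everything here is an assumption: the weights in `PRIORITY_WEIGHTS`, the unweighted averaging from subprinciple to principle to composite, the function names, and the example indicator values are all hypothetical. Only the priority labels (Essential, Important, Useful) come from the FAIR Data Maturity Model.

```python
from collections import defaultdict

# Hypothetical weights per indicator priority; the paper's real weights
# are not stated in the abstract.
PRIORITY_WEIGHTS = {"essential": 3.0, "important": 2.0, "useful": 1.0}


def weighted_mean(pairs):
    """Return the weighted mean of (score, weight) pairs."""
    pairs = list(pairs)
    total_weight = sum(weight for _, weight in pairs)
    if total_weight == 0:
        raise ValueError("at least one indicator with non-zero weight is required")
    return sum(score * weight for score, weight in pairs) / total_weight


def fair_scores(indicators):
    """Aggregate indicator scores to subprinciple, principle, and composite levels.

    indicators: list of (subprinciple, priority, score) tuples, where
    subprinciple is a label such as "F1" or "A1" (first letter = principle),
    priority is a key of PRIORITY_WEIGHTS, and score lies in [0, 1].
    """
    # Level 1: priority-weighted mean of indicators within each subprinciple.
    by_subprinciple = defaultdict(list)
    for subprinciple, priority, score in indicators:
        by_subprinciple[subprinciple].append((score, PRIORITY_WEIGHTS[priority]))
    sub_scores = {sub: weighted_mean(p) for sub, p in by_subprinciple.items()}

    # Level 2: plain mean of subprinciple scores within each principle (assumed).
    by_principle = defaultdict(list)
    for subprinciple, score in sub_scores.items():
        by_principle[subprinciple[0]].append(score)
    principle_scores = {p: sum(v) / len(v) for p, v in by_principle.items()}

    # Level 3: plain mean across principles (assumed).
    composite = sum(principle_scores.values()) / len(principle_scores)
    return sub_scores, principle_scores, composite


# Made-up indicator results for one hypothetical dataset.
example = [
    ("F1", "essential", 1.0),
    ("F1", "important", 0.0),
    ("F2", "essential", 1.0),
    ("A1", "essential", 0.5),
    ("A1", "useful", 0.5),
]
sub, principle, composite = fair_scores(example)
print(sub, principle, composite)
```

Weighting by priority means a failed Essential indicator lowers a subprinciple score more than a failed Useful one, which is the general idea behind a priority-weighted scheme; the specific numbers above are illustrative only.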

Taxonomy

Topics: Radiomics and Machine Learning in Medical Imaging