# Unified Acoustic Representations for Screening Neurological and Respiratory Pathologies from Voice

**Authors:** Ran Piao, Yuan Lu, Hareld Kemps, Tong Xia, Aaqib Saeed

arXiv: 2508.20717 · 2025-12-22

## TL;DR

This paper introduces MARVEL, a multitask learning framework that uses only derived acoustic features to screen for nine neurological, respiratory, and voice disorders simultaneously. On the Bridge2AI-Voice v2.0 dataset it reaches an overall AUROC of 0.78 and surpasses state-of-the-art self-supervised models on 7 of 9 tasks.

## Contribution

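The paper presents a unified multitask acoustic representation model for multi-condition voice-based health screening. Task-specific heads share a common acoustic backbone, which enables cross-condition knowledge transfer. The model outperforms single-modal baselines by 5-19% and surpasses state-of-the-art self-supervised models on 7 of 9 tasks.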

## Key findings

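- Achieves an overall AUROC of 0.78 across nine disorders on the Bridge2AI-Voice v2.0 dataset.
- Performs best on neurological disorders (AUROC = 0.89), peaking at AUROC = 0.97 for Alzheimer's disease/mild cognitive impairment.
- Outperforms single-modal baselines by 5-19% and surpasses state-of-the-art self-supervised models on 7 of 9 tasks.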
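
For readers unfamiliar with the metric, the sketch below shows one way a per-task AUROC and an overall score could be computed. It assumes the overall figure is an unweighted (macro) average over tasks, which this summary does not confirm. The task names and toy labels are hypothetical and are not data from the paper.

```python
def auroc(labels, scores):
    """Rank-based AUROC: the probability that a random positive example
    scores higher than a random negative one (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auroc(per_task):
    """Per-task AUROC plus the unweighted mean across tasks.
    ASSUMPTION: macro averaging; the paper may aggregate differently."""
    values = {task: auroc(y, s) for task, (y, s) in per_task.items()}
    return values, sum(values.values()) / len(values)


# Hypothetical toy predictions for two screening tasks.
toy = {
    "task_a": ([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80]),
    "task_b": ([0, 1, 0, 1], [0.20, 0.90, 0.30, 0.70]),
}
per_task_scores, overall = macro_auroc(toy)
print(per_task_scores, overall)
```

The pairwise comparison is quadratic in the number of examples, which is fine for illustration; a sort-based ranking would be preferable on large evaluation sets.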

## Abstract

Voice-based health assessment offers unprecedented opportunities for scalable, non-invasive disease screening, yet existing approaches typically focus on single conditions and fail to leverage the rich, multi-faceted information embedded in speech. We present MARVEL (Multi-task Acoustic Representations for Voice-based Health Analysis), a privacy-conscious multitask learning framework that simultaneously detects nine distinct neurological, respiratory, and voice disorders using only derived acoustic features, eliminating the need for raw audio transmission. Our dual-branch architecture employs specialized encoders with task-specific heads sharing a common acoustic backbone, enabling effective cross-condition knowledge transfer. Evaluated on the large-scale Bridge2AI-Voice v2.0 dataset, MARVEL achieves an overall AUROC of 0.78, with exceptional performance on neurological disorders (AUROC = 0.89), particularly for Alzheimer's disease/mild cognitive impairment (AUROC = 0.97). Our framework consistently outperforms single-modal baselines by 5-19% and surpasses state-of-the-art self-supervised models on 7 of 9 tasks, while correlation analysis reveals that the learned representations exhibit meaningful similarities with established acoustic features, indicating that the model's internal representations are consistent with clinically recognized acoustic patterns. By demonstrating that a single unified model can effectively screen for diverse conditions, this work establishes a foundation for deployable voice-based diagnostics in resource-constrained and remote healthcare settings.
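
The architecture described in the abstract (two specialized encoders, a shared backbone, one head per condition) can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the authors' implementation: the layer sizes, the ordering of encoders and shared trunk, the concatenation step, the feature groups fed to each branch, and all names (`MultiTaskSketch`, `task_1`, and so on) are assumptions. A real model would use a deep learning framework and trained weights.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (untrained, illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiTaskSketch:
    """Two branch encoders -> shared trunk -> one sigmoid head per task.
    ASSUMPTION: this wiring is a guess at the general pattern."""

    def __init__(self, dim_a, dim_b, hidden, tasks, seed=0):
        rng = random.Random(seed)
        self.enc_a = init_layer(rng, dim_a, hidden)        # branch for feature group A
        self.enc_b = init_layer(rng, dim_b, hidden)        # branch for feature group B
        self.shared = init_layer(rng, 2 * hidden, hidden)  # trunk shared by all tasks
        self.heads = {t: init_layer(rng, hidden, 1) for t in tasks}

    def forward(self, feats_a, feats_b):
        h_a = relu(linear(feats_a, *self.enc_a))
        h_b = relu(linear(feats_b, *self.enc_b))
        # List concatenation joins the two branch embeddings.
        trunk = relu(linear(h_a + h_b, *self.shared))
        # Each head returns an independent probability for its condition.
        return {t: sigmoid(linear(trunk, *head)[0]) for t, head in self.heads.items()}


tasks = [f"task_{i}" for i in range(1, 10)]  # nine hypothetical screening tasks
model = MultiTaskSketch(dim_a=4, dim_b=3, hidden=5, tasks=tasks)
probs = model.forward([0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7])
print(probs)
```

Because every head reads the same trunk output, gradients from all nine tasks would update the shared layers during training, which is the usual mechanism behind the cross-condition transfer the abstract refers to.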

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20717/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20717/full.md

---
Source: https://tomesphere.com/paper/2508.20717