Depression diagnosis from patient interviews using multimodal machine learning
Jana Weber, Marcel Weber, Juan Miguel Lopez Alcaraz

TL;DR
This study presents a multimodal machine learning approach that combines speech, language, and clinical data from patient interviews to improve the accuracy and utility of depression diagnosis, potentially aiding clinical decision-making.
Contribution
The paper introduces a multimodal framework that integrates speech, language, and structured clinical data from patient interviews for depression diagnosis, and reports improved accuracy and clinical utility over single-modality models.
Findings
The multimodal model achieved an AUROC of 0.88.
The fused model showed good calibration and a higher net clinical benefit than single-modality models.
The approach may enhance early detection and support clinical decision-making.
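To make "net clinical benefit" concrete, the sketch below computes the standard decision-curve quantity, net benefit = TP/N - (FP/N) * pt / (1 - pt), at a chosen risk threshold pt. This summary does not state the thresholds or the exact procedure the authors used, so the function names, labels, and probabilities here are hypothetical and only illustrate the calculation.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of flagging every patient whose predicted risk >= threshold.

    Standard decision-curve formula: TP/N - (FP/N) * pt / (1 - pt).
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def treat_all_net_benefit(y_true, threshold):
    """Reference strategy: flag every patient regardless of the model."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Made-up labels (1 = depressed) and predicted risks, for illustration only.
y_true = [1, 1, 0, 0]
y_prob = [0.9, 0.6, 0.7, 0.2]

model_nb = net_benefit(y_true, y_prob, threshold=0.5)
treat_all_nb = treat_all_net_benefit(y_true, threshold=0.5)
print(model_nb, treat_all_nb)
```

A model is considered useful at a given threshold when its net benefit exceeds both the treat-all reference and the treat-none reference, which is zero by definition.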
Abstract
Background: Depression is a major public health concern, affecting an estimated five percent of the global population. Early and accurate diagnosis is essential to initiate effective treatment, yet recognition remains challenging in many clinical contexts. Speech, language, and behavioral cues collected during patient interviews may provide objective markers that support clinical assessment. Methods: We developed a diagnostic approach that integrates features derived from patient interviews, including speech patterns, linguistic characteristics, and structured clinical information. Separate models were trained for each modality and subsequently combined through multimodal fusion to reflect the complexity of real-world psychiatric assessment. Model validity was assessed with established performance metrics, and further evaluated using calibration and decision-analytic approaches to…
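The Methods describe training one model per modality and then combining them through multimodal fusion. The summary does not specify the fusion rule, so the following is only a minimal sketch that assumes late fusion by weighted averaging of per-modality probabilities. The names late_fusion and auroc, the modality keys, and all numbers are hypothetical and are not taken from the paper.

```python
def late_fusion(prob_by_modality, weights=None):
    """Weighted average of per-modality predicted probabilities.

    Assumed fusion rule for illustration; the paper may use a different one.
    """
    names = sorted(prob_by_modality)
    if weights is None:
        weights = {name: 1.0 for name in names}  # equal weights by default
    total = sum(weights[name] for name in names)
    n_patients = len(prob_by_modality[names[0]])
    return [
        sum(weights[m] * prob_by_modality[m][i] for m in names) / total
        for i in range(n_patients)
    ]


def auroc(y_true, scores):
    """AUROC as the probability that a positive case outranks a negative one."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Made-up per-modality probabilities for four patients (1 = depressed).
y_true = [1, 1, 0, 0]
probs = {
    "speech": [0.8, 0.4, 0.6, 0.2],
    "language": [0.6, 0.7, 0.3, 0.1],
    "clinical": [0.7, 0.5, 0.4, 0.3],
}

fused = late_fusion(probs)
speech_auc = auroc(y_true, probs["speech"])
fused_auc = auroc(y_true, fused)
print(speech_auc, fused_auc)
```

In this toy data the speech model misranks one positive/negative pair while the fused scores rank every pair correctly, which is the kind of gain fusion is meant to deliver. It says nothing about the size of the effect in the actual study.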
