Multimodal Machine Learning in Mental Health: A Survey of Data, Algorithms, and Challenges
Zahraa Al Sahili, Ioannis Patras, Matthew Purver

TL;DR
This survey reviews recent advances in multimodal machine learning for mental health, highlighting datasets, models, and challenges to guide future research and clinical applications.
Contribution
This survey provides the first comprehensive, clinically grounded synthesis of multimodal ML approaches, datasets, and challenges in mental health.
Findings
Cataloged 26 public multimodal datasets for mental health.
Compared 28 models using transformer, graph, and hybrid fusion strategies.
Identified key challenges such as data privacy, fairness, and explainability.
Abstract
Multimodal machine learning (MML) is rapidly reshaping the way mental-health disorders are detected, characterized, and longitudinally monitored. Whereas early studies relied on isolated data streams (such as speech, text, or wearable signals), recent research has converged on architectures that integrate heterogeneous modalities to capture the rich, complex signatures of psychiatric conditions. This survey provides the first comprehensive, clinically grounded synthesis of MML for mental health. We (i) catalog 26 public datasets spanning audio, visual, physiological-signal, and text modalities; and (ii) systematically compare transformer-, graph-, and hybrid-based fusion strategies across 28 models, highlighting trends in representation learning and cross-modal alignment. Beyond summarizing current capabilities, we interrogate open challenges: data governance and privacy, demographic and…
