Speech as a Multimodal Digital Phenotype for Multi-Task LLM-based Mental Health Prediction
Mai Ali, Christopher Lucasius, Tanmay P. Patel, Madison Aitken, Jacob Vorstman, Peter Szatmari, Marco Battaglia, Deepa Kundur

TL;DR
This paper introduces a multimodal, multi-task, longitudinal approach using large language models to predict depression, suicidal ideation, and sleep disturbances from speech data, outperforming unimodal and single-task methods.
Contribution
It presents a novel trimodal, longitudinal multi-task learning framework for mental health prediction from speech, integrating text, acoustic, and vocal biomarkers.
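As a rough illustration of what "trimodal" and "multi-task" mean in combination, the sketch below concatenates three per-recording feature vectors, passes them through one shared layer, and attaches one output head per condition. It is not the authors' LLM-based architecture. The feature sizes, the zero-initialised weights, the summed cross-entropy loss, and every function name are assumptions made for this example.

```python
import math

# Task names follow the three conditions named in the summary.
TASKS = ("depression", "suicidal_ideation", "sleep_disturbance")

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fuse(text_vec, landmark_vec, biomarker_vec):
    """Fuse the three modalities by simple concatenation (an assumption)."""
    return list(text_vec) + list(landmark_vec) + list(biomarker_vec)

def shared_layer(fused, weights, biases):
    """One dense tanh layer whose output is shared by all task heads."""
    return [math.tanh(sum(w * x for w, x in zip(row, fused)) + b)
            for row, b in zip(weights, biases)]

def predict(hidden, heads):
    """heads maps task name -> (weight vector, bias); returns a probability per task."""
    return {task: sigmoid(sum(w * h for w, h in zip(ws, hidden)) + b)
            for task, (ws, b) in heads.items()}

def multitask_loss(probs, labels):
    """Sum of per-task binary cross-entropy losses (equal task weights assumed)."""
    eps = 1e-12
    return sum(-(labels[t] * math.log(probs[t] + eps)
                 + (1 - labels[t]) * math.log(1 - probs[t] + eps))
               for t in probs)

# Hypothetical two-dimensional features for each modality of one recording.
text_vec, landmark_vec, biomarker_vec = [0.2, -0.1], [0.5, 0.3], [-0.4, 0.1]
fused = fuse(text_vec, landmark_vec, biomarker_vec)

# Untrained (all-zero) parameters: 4 shared hidden units, one head per task.
shared_w = [[0.0] * len(fused) for _ in range(4)]
shared_b = [0.0] * 4
heads = {task: ([0.0] * 4, 0.0) for task in TASKS}

hidden = shared_layer(fused, shared_w, shared_b)
probs = predict(hidden, heads)
loss = multitask_loss(probs, {"depression": 1, "suicidal_ideation": 0, "sleep_disturbance": 1})
```

Because the shared layer receives gradient from all three heads, training on one condition can shape the representation used for the others, which is the usual motivation for multi-task learning over three separate single-task models.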
Findings
Achieves 70.8% balanced accuracy on depression prediction.
Outperforms unimodal and single-task baselines.
Demonstrates the effectiveness of multimodal, multi-task, longitudinal modeling.
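Balanced accuracy, the metric behind the 70.8% figure, is the mean of per-class recall, so it is not inflated when one class dominates the data. The sketch below computes it for a binary task. The labels are invented for illustration and are not data from the paper, and treating depression prediction as binary is an assumption of this example.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must appear in y_true")
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    return (sensitivity + specificity) / 2

# Hypothetical imbalanced sample: 8 negative cases, 2 positive cases.
y_true = [0] * 8 + [1] * 2
y_majority = [0] * 10  # a classifier that always predicts the majority class

plain_accuracy = sum(t == p for t, p in zip(y_true, y_majority)) / len(y_true)
balanced = balanced_accuracy(y_true, y_majority)
```

In this invented sample the majority-class predictor scores 0.8 on plain accuracy but only 0.5 on balanced accuracy, which is why the balanced metric is the more informative one for imbalanced clinical data.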
Abstract
Speech is a noninvasive digital phenotype that can offer valuable insights into mental health conditions, but it is often treated as a single modality. In contrast, we propose the treatment of patient speech data as a trimodal multimedia data source for depression detection. This study explores the potential of large language model-based architectures for speech-based depression prediction in a multimodal regime that integrates speech-derived text, acoustic landmarks, and vocal biomarkers. Adolescent depression presents a significant challenge and is often comorbid with multiple disorders, such as suicidal ideation and sleep disturbances. This presents an additional opportunity to integrate multi-task learning (MTL) into our study by simultaneously predicting depression, suicidal ideation, and sleep disturbances using the multimodal formulation. We also propose a longitudinal analysis…
Taxonomy
Topics: Mental Health via Writing · Digital Mental Health Interventions · Emotion and Mood Recognition
