# Exploring Machine Learning and Language Models for Multimodal Depression Detection

**Authors:** Javier Si Zhao Hong, Timothy Zoe Delaya, Sherwyn Chan Yin Kit, Pai Chet Ng, Xiaoxiao Miao

arXiv: 2508.20805 · 2025-08-29

## TL;DR

This paper evaluates XGBoost, transformer-based architectures, and large language models (LLMs) for multimodal depression detection on audio, video, and text features, and reports the strengths and limitations of each model type for mental health prediction.

## Contribution

The paper compares XGBoost, transformer-based architectures, and LLMs for multimodal depression detection, highlighting the strengths and limitations of each across audio, video, and text.

## Key findings

- Transformers outperform XGBoost in certain modalities.
- LLMs show promise in text-based depression detection.
- Multimodal approaches improve detection accuracy.
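
This summary does not say how the modalities are combined. As a generic illustration only, and not the authors' method, the sketch below shows late fusion, one common way to merge per-modality predictions: each modality's model outputs a probability, and the fused score is their weighted average. The function name `late_fusion`, the modality keys, the 0.5 threshold, and all scores are hypothetical.

```python
from typing import Dict, Optional


def late_fusion(probs: Dict[str, float],
                weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of per-modality probabilities (generic late fusion)."""
    if not probs:
        raise ValueError("need at least one modality score")
    if weights is None:
        # Default assumption: every modality counts equally.
        weights = {name: 1.0 for name in probs}
    total = sum(weights[name] for name in probs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * p for name, p in probs.items()) / total


# Made-up scores for one participant; not taken from the paper.
scores = {"audio": 0.40, "video": 0.55, "text": 0.70}
fused = late_fusion(scores)
label = int(fused >= 0.5)  # hypothetical decision threshold
```

Passing explicit `weights` lets a stronger modality dominate the fused score; setting a modality's weight to zero drops it entirely.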

## Abstract

This paper presents our approach to the first Multimodal Personality-Aware Depression Detection Challenge, focusing on multimodal depression detection using machine learning and deep learning models. We explore and compare the performance of XGBoost, transformer-based architectures, and large language models (LLMs) on audio, video, and text features. Our results highlight the strengths and limitations of each type of model in capturing depression-related signals across modalities, offering insights into effective multimodal representation strategies for mental health prediction.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20805/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/2508.20805/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/2508.20805/full.md

---
Source: https://tomesphere.com/paper/2508.20805