MMFformer: Multimodal Fusion Transformer Network for Depression Detection
Md Rezwanul Haque, Md. Milon Islam, S M Taslim Uddin Raju, Hamdi Altaheri, Lobna Nassar, Fakhri Karray

TL;DR
MMFformer is a multimodal transformer network that detects depression from social media video and audio by capturing spatial and temporal features across modalities. It reports higher F1-Scores than prior methods on the D-Vlog and LMVD datasets.
Contribution
The paper introduces MMFformer, a new multimodal fusion transformer architecture that enhances depression detection accuracy by integrating spatial and temporal features from videos and audio.
Findings
Improves F1-Score by 13.92% over the previous state of the art on the D-Vlog dataset.
Improves F1-Score by 7.74% on the LMVD dataset.
Effectively fuses multimodal social media data for depression analysis.
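The findings above are stated in terms of F1-Score. As a reminder of what that metric measures, here is a minimal sketch of the standard binary F1 computation (the harmonic mean of precision and recall). The function name `f1_score` and the toy labels are illustrative assumptions and are not taken from the paper; the source also does not say whether the reported gains are absolute or relative, so this sketch makes no claim about that.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # No correct positive predictions: define F1 as 0 to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical, 1 = depressed, 0 = not depressed).
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
score = f1_score(y_true, y_pred)
```

Because F1 balances precision against recall, it is commonly preferred over plain accuracy when the two classes are not equally represented.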
Abstract
Depression is a serious mental health illness that significantly affects an individual's well-being and quality of life, making early detection crucial for adequate care and treatment. Detecting depression is often difficult, as it is based primarily on subjective evaluations during clinical interviews. Hence, the early diagnosis of depression from social network content has become a prominent research area. The extensive and diverse nature of user-generated information poses a significant challenge, limiting the accurate extraction of relevant temporal information and the effective fusion of data across multiple modalities. This paper introduces MMFformer, a multimodal depression detection network designed to retrieve depressive spatio-temporal high-level patterns from multimodal social media information. The transformer network with residual connections captures spatial…
Taxonomy
Topics: Mental Health via Writing · Emotion and Mood Recognition · Digital Mental Health Interventions
