Predicting Mood Disorder Symptoms with Remotely Collected Videos Using an Interpretable Multimodal Dynamic Attention Fusion Network
Tathagata Banerjee, Matthew Kollada, Pablo Gersberg, Oscar Rodriguez, Jane Tiller, Andrew E Jaffe, John Reynders

TL;DR
This paper presents an interpretable multimodal deep learning approach using audio, video, and text data from smartphones to accurately identify mood disorder symptoms, with an emphasis on explainability and digital markers.
Contribution
The study introduces a novel multimodal classification framework combining CNNs and transformers, applied to a large smartphone-collected dataset, with interpretability via SHAP for digital marker identification.
Findings
Outperforms existing methods that rely on static embeddings in multimodal classification performance.
Successfully identifies important features as potential digital markers.
Demonstrates the feasibility of remote, multimodal mood disorder assessment.
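To make the feature-attribution idea behind the second finding concrete, here is a minimal sketch of an exact Shapley value computation on a toy scoring function. It does not use the SHAP library and is not the authors' model: `toy_score`, its three inputs (one hypothetical feature per modality), and the all-zeros baseline are assumptions chosen only for illustration. Exact enumeration like this is practical only for a handful of features; SHAP exists to approximate the same quantity at scale.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    A feature that is "absent" from a coalition is replaced by its
    baseline value (an illustrative choice of value function).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def toy_score(features):
    # Hypothetical symptom score from one audio, one video, one text feature.
    audio, video, text = features
    return 2.0 * audio + 1.0 * video + 0.5 * audio * text


x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_score, x, baseline)
print(dict(zip(["audio", "video", "text"], phi)))
```

The attributions sum to `toy_score(x) - toy_score(baseline)`, and the audio-text interaction term is split evenly between the two features involved. Ranking features by the magnitude of such attributions is the sense in which they can be prioritized as candidate digital markers.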
Abstract
We developed a novel, interpretable multimodal classification method to identify symptoms of mood disorders, viz. depression, anxiety, and anhedonia, using audio, video, and text collected from a smartphone application. We used CNN-based unimodal encoders to learn dynamic embeddings for each modality and then combined these through a transformer encoder. We applied these methods to a novel dataset - collected by a smartphone application - on 3002 participants across up to three recording sessions. Our method demonstrated better multimodal classification performance compared to existing methods that employed static embeddings. Lastly, we used SHapley Additive exPlanations (SHAP) to prioritize important features in our model that could serve as potential digital markers.
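The pipeline described in the abstract (per-modality CNN encoders whose outputs are combined by a transformer encoder) might be sketched in PyTorch roughly as follows. This is an illustrative sketch, not the authors' implementation: the class names, feature dimensions, layer counts, kernel sizes, and the mean-pooling classification head are all assumptions, and positional or modality-type encodings are omitted for brevity.

```python
import torch
import torch.nn as nn


class UnimodalEncoder(nn.Module):
    """1D CNN mapping a feature sequence to per-timestep embeddings."""

    def __init__(self, in_features, d_model):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_features, d_model, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=3, padding=1),
            nn.ReLU(),
        )

    def forward(self, x):
        # x: (batch, time, in_features); Conv1d expects (batch, channels, time).
        return self.conv(x.transpose(1, 2)).transpose(1, 2)


class FusionClassifier(nn.Module):
    """Concatenates unimodal embeddings along time and fuses them
    with a transformer encoder (all hyperparameters are assumed)."""

    def __init__(self, feature_dims, d_model=32, n_heads=4, n_layers=1, n_classes=2):
        super().__init__()
        self.encoders = nn.ModuleList(
            [UnimodalEncoder(d, d_model) for d in feature_dims]
        )
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=64, batch_first=True
        )
        self.fusion = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, inputs):
        # One embedded sequence per modality, joined into a single token sequence.
        tokens = torch.cat(
            [enc(x) for enc, x in zip(self.encoders, inputs)], dim=1
        )
        fused = self.fusion(tokens)
        # Mean-pool over all tokens, then classify.
        return self.head(fused.mean(dim=1))


# Hypothetical feature sizes and sequence lengths for audio, video, and text.
model = FusionClassifier(feature_dims=[40, 17, 300])
audio = torch.randn(2, 50, 40)
video = torch.randn(2, 30, 17)
text = torch.randn(2, 12, 300)
logits = model([audio, video, text])
print(logits.shape)
```

Because each encoder emits one embedding per timestep rather than a single vector per recording, the transformer can attend across time and across modalities, which is one plausible reading of "dynamic" as opposed to static embeddings.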
Taxonomy
Topics: Digital Mental Health Interventions · Mental Health via Writing · Emotion and Mood Recognition
