Hierarchical attention interpretation: an interpretable speech-level transformer for bi-modal depression detection
Qingkun Deng, Saturnino Luz, Sofia de la Fuente Garcia

TL;DR
This paper introduces a hierarchical, attention-based transformer for bi-modal depression detection from speech. It improves interpretability and clinical relevance by providing explanations at both the speech level and the sentence level.
Contribution
It proposes a novel bi-modal speech-level transformer that avoids segment-level labelling and offers hierarchical interpretability through gradient-weighted attention maps.
Findings
The model outperforms a segment-level model, with higher precision, recall, and F1 score.
Provides detailed sentence and token-level explanations for depression detection.
Enables clinicians to verify model predictions with interpretable insights.
Abstract
Depression is a common mental disorder. Automatic depression detection tools using speech, enabled by machine learning, help early screening of depression. This paper addresses two limitations that may hinder the clinical implementations of such tools: noise resulting from segment-level labelling and a lack of model interpretability. We propose a bi-modal speech-level transformer to avoid segment-level labelling and introduce a hierarchical interpretation approach to provide both speech-level and sentence-level interpretations, based on gradient-weighted attention maps derived from all attention layers to track interactions between input features. We show that the proposed model outperforms a model that learns at a segment level (P=0.854, R=0.947, F1=0.897 compared to P=0.732, R=0.808, F1=0.768). For model interpretation, using one true positive sample, we show which…
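The abstract's idea of gradient-weighted attention maps combined across all attention layers can be illustrated with a minimal sketch. It assumes one widely used formulation (multiply each attention map by its gradient, keep the positive part, average over heads, then accumulate layer by layer) and is not necessarily the authors' exact procedure. The function name `gradient_weighted_rollout` and the toy inputs are hypothetical; in practice the attention maps and their gradients would come from a trained model.

```python
import numpy as np


def gradient_weighted_rollout(attentions, gradients):
    """Combine per-layer attention maps into one token-relevance matrix.

    attentions, gradients: lists with one array per layer, each of shape
    (heads, tokens, tokens). gradients[l] is assumed to hold the gradient
    of the output score with respect to attentions[l].
    """
    num_tokens = attentions[0].shape[-1]
    # Before any layer, each token is relevant only to itself.
    relevance = np.eye(num_tokens)
    for attn, grad in zip(attentions, gradients):
        # Weight attention by its gradient, keep positive contributions,
        # and average over heads.
        weighted = np.clip(grad * attn, 0.0, None).mean(axis=0)
        # Keep the existing relevance (residual path) and add the
        # relevance propagated through this layer's attention.
        relevance = relevance + weighted @ relevance
    return relevance


if __name__ == "__main__":
    # Toy stand-ins for real attention maps and gradients (assumption).
    rng = np.random.default_rng(0)
    layers, heads, tokens = 2, 4, 5
    attn = [rng.random((heads, tokens, tokens)) for _ in range(layers)]
    grad = [rng.standard_normal((heads, tokens, tokens)) for _ in range(layers)]
    print(gradient_weighted_rollout(attn, grad).shape)
```

Row i of the returned matrix scores how much each input position contributes to position i. A hierarchical reading could apply the same routine once over sentence-level inputs and once over the tokens within a sentence, although that split is an assumption here.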
Taxonomy
Topics: Mental Health via Writing · Emotion and Mood Recognition
