SpeechT-RAG: Reliable Depression Detection in LLMs with Retrieval-Augmented Generation Using Speech Timing Information
Xiangyu Zhang, Hexin Liu, Qiquan Zhang, Beena Ahmed, Julien Epps

TL;DR
SpeechT-RAG introduces a novel method that leverages speech timing features with retrieval-augmented generation to improve depression detection accuracy and confidence estimation without additional training.
Contribution
The paper presents SpeechT-RAG, a system that uses speech timing information for depression detection, addressing limitations of text-only approaches and enhancing trustworthiness in mental health assessment.
Findings
Outperforms traditional text-based RAG in depression detection accuracy
Provides reliable confidence scores based on speech timing features
Achieves results comparable to fine-tuned LLMs without extra training
Abstract
Large Language Models (LLMs) have been increasingly adopted for health-related tasks, yet their performance in depression detection remains limited when relying solely on text input. While Retrieval-Augmented Generation (RAG) typically enhances LLM capabilities, our experiments indicate that traditional text-based RAG systems struggle to significantly improve depression detection accuracy. This challenge stems partly from the rich depression-relevant information encoded in acoustic speech patterns, information that current text-only approaches fail to capture effectively. To address this limitation, we conduct a systematic analysis of temporal speech patterns, comparing healthy individuals with those experiencing depression. Based on our findings, we introduce Speech Timing-based Retrieval-Augmented Generation, SpeechT-RAG, a novel system that leverages speech timing features for both…
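The abstract is cut off before it says how the timing features are used, so the following is only an illustrative sketch of the general idea (retrieve similar speakers by timing features, then use neighbour agreement as a confidence signal). It is not the authors' implementation. The feature choices (mean pause, speaking rate), the function names, the labels, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter

def timing_features(words):
    """words: ordered list of (start_sec, end_sec) per spoken word.
    Returns (mean pause between words in seconds, words per second).
    These two features are hypothetical stand-ins for 'speech timing'."""
    pauses = [max(0.0, nxt[0] - cur[1]) for cur, nxt in zip(words, words[1:])]
    mean_pause = sum(pauses) / len(pauses) if pauses else 0.0
    total = words[-1][1] - words[0][0]
    rate = len(words) / total if total > 0 else 0.0
    return (mean_pause, rate)

def retrieve(query, store, k=3):
    """store: list of (feature_vector, label).
    Returns the k entries closest to query by Euclidean distance."""
    return sorted(store, key=lambda item: math.dist(query, item[0]))[:k]

def vote_with_confidence(neighbours):
    """Majority label among retrieved neighbours, plus the fraction
    of neighbours that agree (a simple, assumed confidence proxy)."""
    counts = Counter(label for _, label in neighbours)
    label, n = counts.most_common(1)[0]
    return label, n / len(neighbours)

# Toy reference store (fabricated numbers, for illustration only).
store = [
    ((0.5, 1.2), "depressed"),
    ((0.6, 1.1), "depressed"),
    ((0.1, 3.0), "control"),
    ((0.12, 2.8), "control"),
]

query = timing_features([(0.0, 0.5), (1.0, 1.5), (2.0, 2.5)])
label, confidence = vote_with_confidence(retrieve(query, store, k=3))
```

In a full RAG pipeline, the retrieved examples would presumably be placed in the LLM prompt as context rather than voted on directly; the vote here only shows how retrieval agreement could yield a confidence estimate without any model training.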
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
Methods: Attention Is All You Need · Byte Pair Encoding · Adam · Softmax · Dropout · Weight Decay · BART · WordPiece · Layer Normalization
