When LLMs Meets Acoustic Landmarks: An Efficient Approach to Integrate Speech into Large Language Models for Depression Detection
Xiangyu Zhang, Hexin Liu, Kaishuai Xu, Qiquan Zhang, Daijiao Liu, Beena Ahmed, Julien Epps

TL;DR
This paper introduces a novel method that integrates acoustic landmarks into Large Language Models to improve depression detection from speech, achieving state-of-the-art results on the DAIC-WOZ dataset.
Contribution
The paper presents an innovative approach to incorporate acoustic landmarks into LLMs, enhancing multimodal depression detection from speech and text.
Findings
Achieved state-of-the-art results on the DAIC-WOZ dataset.
Demonstrated the effectiveness of acoustic landmarks in depression detection.
Enhanced LLMs' ability to process speech signals for mental health analysis.
Abstract
Depression is a critical concern in global mental health, prompting extensive research into AI-based detection methods. Among various AI technologies, Large Language Models (LLMs) stand out for their versatility in mental healthcare applications. However, their primary limitation arises from their exclusive dependence on textual input, which constrains their overall capabilities. Furthermore, the utilization of LLMs in identifying and analyzing depressive states is still relatively untapped. In this paper, we present an innovative approach to integrating acoustic speech information into the LLMs framework for multimodal depression detection. We investigate an efficient method for depression detection by integrating speech signals into LLMs utilizing Acoustic Landmarks. By incorporating acoustic landmarks, which are specific to the pronunciation of spoken words, our method adds critical…
Taxonomy
Topics: Mental Health via Writing · Topic Modeling
