Distinctive Feature Codec: An Adaptive Efficient Speech Representation for Depression Detection

Xiangyu Zhang; Fuming Fang; Peng Gao; Bin Qin; Beena Ahmed; Julien Epps

arXiv:2505.18516·eess.AS·December 30, 2025

Open Access

TL;DR

This paper introduces the Distinctive Feature Codec (DFC), an adaptive speech representation method that preserves temporal dynamics crucial for depression detection, integrating linguistic features into deep learning models for improved interpretability.

Contribution

The work is the first to incorporate distinctive features, a concept from traditional linguistic theory, into a deep learning speech codec for depression detection, addressing the limitations of fixed-interval processing.

Findings

01 DFC effectively captures temporal dynamics relevant to depression.

02 The proposed GSQ method stabilizes quantization of variable-length segments.

03 Results show improved interpretability and detection performance.

Abstract

Large Language Models (LLMs) have demonstrated remarkable success across diverse fields, establishing a powerful paradigm for complex information processing. This has inspired the integration of speech into LLM frameworks, often by tokenizing continuous audio via neural speech codecs, enabling powerful speech language models. However, this dominant tokenization strategy relies on uniform frame-based processing at fixed time intervals. This fixed-rate approach, while effective for linguistic content, destroys the temporal dynamics. These dynamics are not noise but are established as primary biomarkers in clinical applications such as depression detection. To address this gap, we introduce the Distinctive Feature Codec (DFC), an adaptive framework engineered to preserve this vital timing information. Drawing from linguistic theory, DFC abandons fixed-interval processing and instead learns…
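The contrast the abstract draws between fixed-interval and adaptive processing can be illustrated with a toy sketch. It does not reproduce DFC: the function names, the threshold-based boundary rule, and the synthetic signal are all assumptions made for illustration, standing in for whatever boundary mechanism the paper learns.

```python
import numpy as np

def fixed_rate_segments(n_frames, hop):
    """Uniform tokenization: one segment every `hop` frames."""
    return [(s, min(s + hop, n_frames)) for s in range(0, n_frames, hop)]

def adaptive_segments(features, threshold):
    """Hypothetical rule: open a new segment wherever two consecutive
    frames differ by more than `threshold` (a stand-in, not DFC)."""
    diffs = np.linalg.norm(np.diff(features, axis=0), axis=1)
    boundaries = [0] + [i + 1 for i, d in enumerate(diffs) if d > threshold]
    boundaries.append(len(features))
    return list(zip(boundaries[:-1], boundaries[1:]))

# Synthetic 1-D "speech" with three steady regions lasting 2, 9 and 4 frames.
features = np.concatenate([
    np.full((2, 1), 0.0),
    np.full((9, 1), 1.0),
    np.full((4, 1), 0.0),
])

fixed = fixed_rate_segments(len(features), hop=5)
adaptive = adaptive_segments(features, threshold=0.5)

# Segment lengths in frames.
fixed_durations = [end - start for start, end in fixed]
adaptive_durations = [end - start for start, end in adaptive]

print("fixed:   ", fixed_durations)
print("adaptive:", adaptive_durations)
```

In this toy case the fixed-rate segments all have the same length and cut across the region boundaries, so the durations of the underlying regions are not visible in the token sequence. The adaptive segments align with the changes, and their lengths carry the timing information directly.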

Taxonomy

Topics: Advanced Data Compression Techniques · Speech Recognition and Synthesis · Speech and Audio Processing