Dynamic Fusion Multimodal Network for SpeechWellness Detection

Wenqiang Sun; Han Yin; Jisheng Bai; Jianfeng Chen

arXiv:2508.18057·cs.SD·September 3, 2025

Open Access

TL;DR

This paper presents a lightweight multimodal system with dynamic fusion for speechwellness detection, integrating acoustic and semantic features to improve accuracy while reducing model complexity.

Contribution

It introduces a dynamic fusion mechanism and combines time-domain, time-frequency, and semantic features in a lightweight model for better speechwellness detection.

Findings

01

Achieved a 78% reduction in model parameters relative to the baseline.

02

Improved detection accuracy by 5% over the baseline.

03

Outperformed baseline models in experiments.

Abstract

Suicide is one of the leading causes of death among adolescents. Previous suicide risk prediction studies have primarily focused on either textual or acoustic information in isolation, yet the integration of multimodal signals, such as speech and text, offers a more comprehensive understanding of an individual's mental state. Motivated by this, and in the context of the 1st SpeechWellness detection challenge, we explore a lightweight multi-branch multimodal system based on a dynamic fusion mechanism for speechwellness detection. To address the limitation of prior approaches that rely on time-domain waveforms for acoustic analysis, our system incorporates both time-domain and time-frequency (TF) domain acoustic features, as well as semantic representations. In addition, we introduce a dynamic fusion block to adaptively integrate information from different modalities. Specifically, it applies…
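
The abstract is cut off before it describes how the dynamic fusion block works, so the following is only an illustrative sketch of one common way to fuse modalities adaptively, not the authors' method. It assumes each branch (time-domain, time-frequency, semantic) yields an embedding of the same size and that a scalar score per modality is turned into a mixing weight with a softmax. The function names, embedding values, and scores are all hypothetical; in a trained model the scores would be learned rather than fixed by hand.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_fusion(features, logits):
    """Fuse equal-length modality embeddings as a weighted sum.

    features: dict of modality name -> embedding (list of floats)
    logits:   dict of modality name -> scalar score (assumed learnable
              in a real model; fixed by hand in this sketch)
    Returns the fused embedding and the per-modality weights.
    """
    names = sorted(features)
    dim = len(features[names[0]])
    if any(len(features[n]) != dim for n in names):
        raise ValueError("all modality embeddings must share one dimension")

    # Softmax keeps the weights positive and summing to one.
    weights = softmax([logits[n] for n in names])

    fused = [0.0] * dim
    for name, w in zip(names, weights):
        for i, value in enumerate(features[name]):
            fused[i] += w * value
    return fused, dict(zip(names, weights))


# Hypothetical 4-dimensional embeddings for the three feature types.
features = {
    "time_domain": [0.2, -0.1, 0.4, 0.0],
    "time_frequency": [0.1, 0.3, -0.2, 0.5],
    "semantic": [-0.3, 0.2, 0.1, 0.4],
}
# Equal scores give each modality the same weight.
logits = {"time_domain": 0.0, "time_frequency": 0.0, "semantic": 0.0}

fused, weights = dynamic_fusion(features, logits)
```

Raising one modality's score shifts the mix toward that branch while the weights still sum to one, which is the sense in which such a fusion step can adjust each modality's contribution.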

Taxonomy

Topics: Mental Health via Writing · Emotion and Mood Recognition · Speech Recognition and Synthesis