DashFusion: Dual-stream Alignment with Hierarchical Bottleneck Fusion for Multimodal Sentiment Analysis

Yuhua Wen; Qifei Li; Yingying Zhou; Yingming Gao; Zhengqi Wen; Jianhua Tao; Ya Li

arXiv:2512.05515·cs.CV·December 8, 2025

Open Access

TL;DR

DashFusion is a multimodal sentiment analysis framework that combines dual-stream alignment with hierarchical bottleneck fusion to synchronize and integrate text, image, and audio modalities, reporting state-of-the-art results on CMU-MOSI, CMU-MOSEI, and CH-SIMS.

Contribution

The paper presents a novel framework that combines dual-stream alignment with hierarchical bottleneck fusion, addressing the alignment and fusion challenges of multimodal sentiment analysis jointly rather than in isolation.

Findings

01

Achieves state-of-the-art performance on CMU-MOSI, CMU-MOSEI, and CH-SIMS datasets.

02

Effective synchronization of modalities through temporal and semantic alignment.

03

Hierarchical bottleneck fusion balances performance and computational efficiency.

Abstract

Multimodal sentiment analysis (MSA) integrates various modalities, such as text, image, and audio, to provide a more comprehensive understanding of sentiment. However, effective MSA is challenged by alignment and fusion issues. Alignment requires synchronizing both temporal and semantic information across modalities, while fusion involves integrating these aligned features into a unified representation. Existing methods often address alignment or fusion in isolation, which limits both performance and efficiency. To tackle these issues, we propose a novel framework called Dual-stream Alignment with Hierarchical Bottleneck Fusion (DashFusion). First, the dual-stream alignment module synchronizes multimodal features through temporal and semantic alignment. Temporal alignment employs cross-modal attention to establish frame-level correspondences among multimodal sequences. Semantic…
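To make the temporal alignment idea concrete, the following is a minimal sketch of generic scaled dot-product cross-modal attention, in which each frame of one modality attends over the frames of another. It is not the authors' implementation: the function names, the toy feature values, and the omission of learned query/key/value projections are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention from one modality onto another.

    queries: target-modality frames, shape [T_q][d]
    keys:    source-modality frames, shape [T_k][d]
    values:  source-modality frames, shape [T_k][d_v]
    Returns (aligned, weights) with shapes [T_q][d_v] and [T_q][T_k].
    Learned linear projections are omitted here (an assumption for brevity).
    """
    scale = math.sqrt(len(queries[0]))
    value_dim = len(values[0])
    aligned, weights = [], []
    for q in queries:
        # Similarity of this query frame to every source frame.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of source frames: the source sequence re-expressed
        # on the query modality's time axis.
        aligned.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(value_dim)]
        )
    return aligned, weights


# Hypothetical toy features: 2 text tokens and 3 audio frames, dimension 2.
text_feats = [[1.0, 0.0], [0.0, 1.0]]
audio_feats = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
aligned_audio, attn = cross_modal_attention(text_feats, audio_feats, audio_feats)
```

Each row of `attn` is a distribution over audio frames for one text token, which is the frame-level correspondence the abstract refers to; `aligned_audio` has one row per text token regardless of how many audio frames were supplied.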

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Emotion and Mood Recognition · Multimodal Machine Learning Applications