DashFusion: Dual-stream Alignment with Hierarchical Bottleneck Fusion for Multimodal Sentiment Analysis
Yuhua Wen, Qifei Li, Yingying Zhou, Yingming Gao, Zhengqi Wen, Jianhua Tao, Ya Li

TL;DR
DashFusion introduces a dual-stream alignment and hierarchical bottleneck fusion framework for multimodal sentiment analysis, effectively synchronizing and integrating text, image, and audio modalities to achieve state-of-the-art results.
Contribution
The paper presents a novel framework combining dual-stream alignment with hierarchical bottleneck fusion, addressing alignment and fusion challenges in multimodal sentiment analysis.
Findings
Achieves state-of-the-art performance on CMU-MOSI, CMU-MOSEI, and CH-SIMS datasets.
Dual-stream alignment synchronizes modalities through temporal and semantic alignment.
Hierarchical bottleneck fusion balances performance and computational efficiency.
Abstract
Multimodal sentiment analysis (MSA) integrates various modalities, such as text, image, and audio, to provide a more comprehensive understanding of sentiment. However, effective MSA is challenged by alignment and fusion issues. Alignment requires synchronizing both temporal and semantic information across modalities, while fusion involves integrating these aligned features into a unified representation. Existing methods often address alignment or fusion in isolation, leading to limitations in performance and efficiency. To tackle these issues, we propose a novel framework called Dual-stream Alignment with Hierarchical Bottleneck Fusion (DashFusion). First, the dual-stream alignment module synchronizes multimodal features through temporal and semantic alignment. Temporal alignment employs cross-modal attention to establish frame-level correspondences among multimodal sequences. Semantic…
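Illustration
The abstract says temporal alignment uses cross-modal attention to establish frame-level correspondences between sequences. The following is a minimal sketch of generic scaled dot-product cross-attention, not the authors' implementation. The function name cross_modal_attention, the toy text and audio features, and the absence of learned projection matrices are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention from one modality to another.

    queries: T_q frames of dimension d (e.g. text tokens).
    keys, values: T_k frames of another modality (e.g. audio frames).
    Returns (aligned, weights): aligned holds one re-weighted value
    vector per query frame; weights is a T_q x T_k matrix whose rows
    sum to 1 and act as soft frame-level correspondences.
    Hypothetical helper: a real model would apply learned projections first.
    """
    d = len(queries[0])
    d_v = len(values[0])
    scale = 1.0 / math.sqrt(d)
    aligned, weights = [], []
    for q in queries:
        # Similarity of this query frame to every frame of the other modality.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted average of the other modality's frames.
        aligned.append(
            [sum(w[j] * values[j][c] for j in range(len(values))) for c in range(d_v)]
        )
    return aligned, weights


# Toy features (made up): 2 text frames attend over 3 audio frames.
text = [[1.0, 0.0], [0.0, 1.0]]
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
aligned, weights = cross_modal_attention(text, audio, audio)
```

Each row of weights shows how strongly one text frame attends to each audio frame, and aligned has the same length as the text sequence, so the audio stream is resampled onto the text timeline.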
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Emotion and Mood Recognition · Multimodal Machine Learning Applications
