GSIFN: A Graph-Structured and Interlaced-Masked Multimodal Transformer-based Fusion Network for Multimodal Sentiment Analysis

Yijie Jin

arXiv:2408.14809·cs.CL·December 4, 2024

TL;DR

GSIFN introduces a graph-structured, interlaced-masked multimodal Transformer with a self-supervised framework, achieving superior multimodal sentiment analysis performance while reducing computational overhead.

Contribution

The paper proposes GSIFN, a novel multimodal fusion network that effectively balances representation capability and efficiency using graph-structured and interlaced-masked Transformer components.

Findings

01. Outperforms previous state-of-the-art models on the CMU-MOSI, CMU-MOSEI, and CH-SIMS datasets.
02. Achieves higher accuracy with significantly lower computational overhead.
03. Demonstrates robustness and efficiency in multimodal sentiment analysis.

Abstract

Multimodal Sentiment Analysis (MSA) leverages multiple data modalities to analyze human sentiment. Existing MSA models generally employ cutting-edge multimodal fusion and representation learning-based methods to improve MSA capability. However, two key challenges remain: (i) in existing multimodal fusion methods, the decoupling of modal combinations and tremendous parameter redundancy lead to insufficient fusion performance and efficiency; (ii) unimodal feature extractors and encoders face a difficult trade-off between representation capability and computational overhead. Our proposed GSIFN incorporates two main components to address these problems: (i) a graph-structured and interlaced-masked multimodal Transformer, which adopts the Interlaced Mask mechanism to construct robust multimodal graph embeddings, achieve all-modal-in-one Transformer-based fusion, and greatly reduce the…
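The abstract is cut off before it explains how the Interlaced Mask is built, so what follows is only a minimal sketch of the general idea behind "all-modal-in-one" fusion: concatenate every modality's tokens into one sequence and let an attention mask, rather than separate per-pair modules, decide which modality blocks may attend to which. It assumes (hypothetically) a per-modality `allowed` table and a cross-modal-only rule in the example; `build_block_mask`, `masked_softmax`, and that rule are illustrative choices, not GSIFN's actual code from drewjin/GSIFN.

```python
import math
from typing import Dict, List, Sequence, Tuple


def build_block_mask(
    segments: Sequence[Tuple[str, int]],
    allowed: Dict[str, Sequence[str]],
) -> List[List[bool]]:
    """Build a boolean attention mask over one concatenated all-modal sequence.

    Hypothetical helper, not from the GSIFN repository.
    segments: (modality name, number of tokens) in concatenation order.
    allowed:  for each modality, the modalities its tokens may attend to.
    mask[i][j] is True when position i may attend to position j.
    """
    # Tag every position of the concatenated sequence with its modality.
    labels: List[str] = []
    for name, length in segments:
        labels.extend([name] * length)
    return [[labels[j] in allowed[labels[i]] for j in range(len(labels))]
            for i in range(len(labels))]


def masked_softmax(scores: List[List[float]],
                   mask: List[List[bool]]) -> List[List[float]]:
    """Row-wise softmax that gives zero weight to masked-out (False) positions."""
    weights: List[List[float]] = []
    for row_scores, row_mask in zip(scores, mask):
        kept = [s for s, keep in zip(row_scores, row_mask) if keep]
        if not kept:
            # A fully masked row attends to nothing.
            weights.append([0.0] * len(row_scores))
            continue
        top = max(kept)  # subtract the max for numerical stability
        exps = [math.exp(s - top) if keep else 0.0
                for s, keep in zip(row_scores, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


if __name__ == "__main__":
    # Hypothetical example: 2 text, 1 audio and 1 visual token, with an
    # assumed cross-modal-only rule in which no modality attends to itself.
    segments = [("t", 2), ("a", 1), ("v", 1)]
    allowed = {"t": ["a", "v"], "a": ["t", "v"], "v": ["t", "a"]}
    mask = build_block_mask(segments, allowed)
    scores = [[0.0] * 4 for _ in range(4)]  # uniform scores for illustration
    for row in masked_softmax(scores, mask):
        print([round(w, 3) for w in row])
```

Because every modality shares a single attention layer here, changing the `allowed` table changes which modality interactions occur without adding parameters; this is one plausible reading of how an all-modal-in-one design avoids duplicating fusion modules per modality pair, though the exact masking rule GSIFN uses is not stated in this excerpt.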

Code & Models

Repositories

drewjin/GSIFN
PyTorch · Official

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Advanced Text Analysis Techniques · Emotion and Mood Recognition

Methods: Attention Is All You Need · Linear Layer · Adam · Layer Normalization · Position-Wise Feed-Forward Layer · Dense Connections · Tanh Activation · Residual Connection · Multi-Head Attention · Byte Pair Encoding