Video Sentiment Analysis with Bimodal Information-augmented Multi-Head Attention

Ting Wu; Junjie Peng; Wenqiang Zhang; Huiran Zhang; Chuanshuai Ma; Yansong Huang

arXiv:2103.02362·cs.AI·November 17, 2021

TL;DR

This paper introduces a multi-head attention-based fusion network for multimodal sentiment analysis that combines textual, visual, and acoustic signals to improve both prediction accuracy and interpretability.

Contribution

It proposes a novel multi-head attention fusion network that models pairwise modality interactions with residual connections, enhancing sentiment analysis performance.
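To make "pairwise modality interactions with residual connections" more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that each modality arrives as a short sequence of equal-length feature vectors, that a bimodal feature is formed by letting one modality attend to the other with scaled dot-product attention (no learned projections), and that a residual connection adds the querying modality back before mean-pooling over time. The function names `cross_attention` and `bimodal_feature` and the toy inputs are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross_attention(queries, keys_values):
    """Scaled dot-product attention: each query vector attends over
    keys_values, which serve as both keys and values (an assumption:
    real models use learned query/key/value projections)."""
    dk = len(queries[0])
    dv = len(keys_values[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(dk) for k in keys_values])
        out.append([sum(w * kv[j] for w, kv in zip(weights, keys_values))
                    for j in range(dv)])
    return out

def bimodal_feature(x_seq, y_seq):
    """Hypothetical bimodal interaction: x attends to y, a residual
    connection adds x back, and the result is mean-pooled over time."""
    attended = cross_attention(x_seq, y_seq)
    fused = [[a + b for a, b in zip(xr, ar)] for xr, ar in zip(x_seq, attended)]
    steps = len(fused)
    return [sum(row[j] for row in fused) / steps for j in range(len(fused[0]))]

# Toy, made-up sequences (2 time steps, dimension 2) for each modality.
text = [[1.0, 0.0], [0.0, 1.0]]
visual = [[0.5, 0.5], [1.0, 0.0]]
acoustic = [[0.0, 1.0], [0.2, 0.2]]

# The three modality pairs named in the abstract.
pairs = {
    "acoustic-visual": bimodal_feature(acoustic, visual),
    "acoustic-textual": bimodal_feature(acoustic, text),
    "visual-textual": bimodal_feature(visual, text),
}
```

The residual term keeps the querying modality's own signal intact, so the attention output only has to add the cross-modal information rather than reconstruct everything from scratch.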

Findings

1. Outperforms existing methods on four public datasets.
2. Effectively models pairwise modality interactions.
3. Provides interpretability of bimodal contributions.

Abstract

Humans express feelings and emotions through different channels. Language, for example, conveys different sentiments under different visual-acoustic contexts. To understand human intentions precisely and to reduce misunderstandings caused by ambiguity and sarcasm, we should consider multimodal signals, including textual, visual and acoustic signals. The crucial challenge is fusing features from different modalities for sentiment analysis. To effectively fuse the information carried by different modalities and better predict sentiment, we design a novel multi-head attention-based fusion network, inspired by the observation that the interactions between different pairs of modalities differ and do not contribute equally to the final sentiment prediction. By assigning the acoustic-visual, acoustic-textual and visual-textual features with reasonable…
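The abstract describes weighting the acoustic-visual, acoustic-textual and visual-textual features unequally (the sentence is truncated in this summary). The sketch below shows one generic way a multi-head attention layer with a softmax and a final linear layer could do this. It is an illustration under stated assumptions, not the paper's exact formulation: each head scores every bimodal vector with its own query vector, softmax-normalizes the scores across the three pairs, and takes a weighted sum; head outputs are concatenated and mapped to a scalar sentiment score. The query vectors, regression weights, and feature values are all hypothetical; in a trained model they would be learned.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def multi_head_fusion(bimodal, head_queries):
    """bimodal: dict mapping pair name -> feature vector (all length d).
    head_queries: one hypothetical query vector per head.
    Returns the concatenated head outputs and each head's weight per pair."""
    names = list(bimodal)
    d = len(bimodal[names[0]])
    outputs, weights_per_head = [], []
    for q in head_queries:
        # Scaled dot-product score of this head's query against each pair.
        scores = [sum(a * b for a, b in zip(q, bimodal[n])) / math.sqrt(d)
                  for n in names]
        w = softmax(scores)
        weights_per_head.append(dict(zip(names, w)))
        # Weighted sum of the bimodal vectors, appended to the concatenation.
        outputs.extend(sum(wi * bimodal[n][j] for wi, n in zip(w, names))
                       for j in range(d))
    return outputs, weights_per_head

def linear(x, weights, bias):
    """Final linear layer mapping the fused vector to a scalar score."""
    return sum(a * b for a, b in zip(x, weights)) + bias

# Made-up bimodal features for the three modality pairs.
bimodal = {
    "acoustic-visual": [0.2, 0.1],
    "acoustic-textual": [0.9, -0.3],
    "visual-textual": [0.4, 0.8],
}
heads = [[1.0, 0.0], [0.0, 1.0]]  # two heads with fixed, hypothetical queries
fused, head_weights = multi_head_fusion(bimodal, heads)
score = linear(fused, [0.5, 0.5, 0.5, 0.5], 0.0)
for h, w in enumerate(head_weights):
    print(h, {k: round(v, 3) for k, v in w.items()})
```

Because each head's weights sum to one across the three pairs, they can be read directly as relative bimodal contributions, which is the kind of interpretability the findings refer to. Different heads can emphasize different pairs, as the two toy heads here do.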

Taxonomy

Methods: Attention Is All You Need · Softmax · Linear Layer · Multi-Head Attention