Multi-scale Cooperative Multimodal Transformers for Multimodal Sentiment Analysis in Videos

Lianyang Ma; Yu Yao; Tao Liang; Tongliang Liu

arXiv:2206.07981 · cs.CV · June 20, 2022 · 6 cites

TL;DR

This paper introduces a multi-scale cooperative transformer architecture for multimodal sentiment analysis in videos, leveraging multi-level semantic features for improved crossmodal interaction and robustness.

Contribution

It proposes a novel multi-scale cooperative transformer that exploits multi-level semantic features for better multimodal fusion in sentiment analysis.

Findings

01. Outperforms existing methods on unaligned multimodal sequences
02. Achieves strong performance on aligned multimodal sequences
03. Improves the robustness of multimodal sentiment analysis

Abstract

Multimodal sentiment analysis in videos is a key task in many real-world applications and usually requires integrating multimodal streams, including visual, verbal, and acoustic behaviors. To improve the robustness of multimodal fusion, some existing methods let different modalities communicate with each other and model the crossmodal interaction via transformers. However, these methods use only single-scale representations during the interaction and fail to exploit multi-scale representations, which contain different levels of semantic information. As a result, the representations learned by the transformers can be biased, especially for unaligned multimodal data. In this paper, we propose a multi-scale cooperative multimodal transformer (MCMulT) architecture for multimodal sentiment analysis. On the whole, the "multi-scale" mechanism is capable of exploiting the different…
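The abstract describes letting modalities "communicate" through transformers, where one modality's sequence attends to another's. Below is a minimal plain-Python sketch of single-head scaled dot-product crossmodal attention, assuming identity query/key/value projections and a shared feature dimension. The `multi_scale_crossmodal_attention` function is a hypothetical illustration of attending to several levels of source representations and averaging them; it is not the MCMulT mechanism, which the truncated abstract does not specify. All function names here are illustrative, not from the paper.

```python
import math
from typing import List

# A sequence is a list of feature vectors: shape (time_steps, feature_dim).
Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive matrix product: (n x k) @ (k x m) -> (n x m)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def crossmodal_attention(target: Matrix, source: Matrix) -> Matrix:
    """Target modality (queries) attends to source modality (keys/values).

    Learned projections W_Q, W_K, W_V are replaced by the identity to keep the
    sketch short. The two sequences may have different lengths (unaligned data)
    but must share the feature dimension d.
    """
    d = len(target[0])
    scores = matmul(target, transpose(source))            # (T_target x T_source)
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = softmax_rows(scaled)                        # each row sums to 1
    return matmul(weights, source)                        # (T_target x d)


def multi_scale_crossmodal_attention(target: Matrix, source_levels: List[Matrix]) -> Matrix:
    """HYPOTHETICAL multi-scale variant (not the MCMulT design): attend to the
    source modality at several levels, e.g. outputs of different encoder
    layers, and average the per-level results."""
    outputs = [crossmodal_attention(target, level) for level in source_levels]
    n = len(outputs)
    d = len(outputs[0][0])
    return [
        [sum(o[i][j] for o in outputs) / n for j in range(d)]
        for i in range(len(target))
    ]


if __name__ == "__main__":
    # Toy unaligned example: 2 text tokens attend to 3 audio frames (d = 2).
    text = [[1.0, 0.0], [0.0, 1.0]]
    audio_low = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]   # stand-in for a shallow layer
    audio_high = [[0.2, 0.8], [0.9, 0.1], [0.4, 0.6]]  # stand-in for a deeper layer
    print(crossmodal_attention(text, audio_low))
    print(multi_scale_crossmodal_attention(text, [audio_low, audio_high]))
```

Averaging is only one way to combine levels; a learned weighting or concatenation would be equally plausible, and the paper's actual choice cannot be inferred from the abstract shown here.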

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Emotion and Mood Recognition · Advanced Computing and Algorithms