TransModality: An End2End Fusion Method with Transformer for Multimodal Sentiment Analysis

Zilong Wang, Zhaohong Wan, and Xiaojun Wan

arXiv:2009.02902 · cs.CL · September 29, 2020 · 1 cites

Open Access

TL;DR

TransModality is an end-to-end, Transformer-based fusion method for multimodal sentiment analysis. It translates between modalities to capture subtle cross-modal correlations and reports state-of-the-art results on CMU-MOSI, MELD, and IEMOCAP.

Contribution

The paper proposes a novel end-to-end Transformer-based fusion method, TransModality, for multimodal sentiment analysis, leveraging translation between modalities to improve joint representations.

Findings

01. Achieves state-of-the-art performance on the CMU-MOSI, MELD, and IEMOCAP datasets.

02. Demonstrates the effectiveness of translation-based fusion in multimodal sentiment analysis.

03. Validates the model's superiority over existing fusion methods.

Abstract

Multimodal sentiment analysis is an important research area that predicts a speaker's sentiment tendency through features extracted from textual, visual and acoustic modalities. The central challenge is how to fuse the multimodal information. A variety of fusion methods have been proposed, but few of them adopt end-to-end translation models to mine the subtle correlation between modalities. Inspired by the recent success of the Transformer in machine translation, we propose a new fusion method, TransModality, to address the task of multimodal sentiment analysis. We assume that translation between modalities contributes to a better joint representation of a speaker's utterance. With the Transformer, the learned features embody information from both the source modality and the target modality. We validate our model on multiple multimodal datasets: CMU-MOSI, MELD, and IEMOCAP. The…
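
The idea that learned features carry information from both a source and a target modality can be illustrated with a small cross-modal attention sketch. This is not the authors' implementation: the function name `cross_modal_attention`, the toy `text` and `audio` features, the omission of learned projection matrices, and the residual addition are all assumptions made for illustration only.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(target, source):
    """Scaled dot-product attention from one modality to another.

    target: list of T_t feature vectors (queries), each of dimension d
    source: list of T_s feature vectors (keys and values), each of dimension d
    Returns (fused, weights): fused has one vector per target step,
    weights has one attention distribution over source steps per target step.
    Hypothetical simplification: no learned W_q, W_k, W_v projections.
    """
    d = len(target[0])
    scale = math.sqrt(d)
    fused, weights = [], []
    for q in target:
        # Similarity of this target step to every source step.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in source]
        w = softmax(scores)
        # Weighted sum of source vectors: the "translated" context.
        context = [sum(w[j] * source[j][i] for j in range(len(source)))
                   for i in range(d)]
        # Residual addition keeps target information alongside source context.
        fused.append([qi + ci for qi, ci in zip(q, context)])
        weights.append(w)
    return fused, weights


# Toy features (assumed values): 2 text steps and 3 audio steps, dimension 2.
text = [[1.0, 0.0], [0.0, 1.0]]
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, weights = cross_modal_attention(text, audio)
```

Each row of `weights` is a distribution over the audio steps, and each vector in `fused` mixes a text step with its audio context, which is the sense in which a fused feature reflects both modalities. A full Transformer would add learned projections, multiple heads, layer normalization, and feed-forward layers.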

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Music and Audio Processing · Emotion and Mood Recognition

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Dropout · Dense Connections · Attention Is All You Need · Byte Pair Encoding · Label Smoothing · Multi-Head Attention