Tensor Fusion Network for Multimodal Sentiment Analysis

Amir Zadeh; Minghai Chen; Soujanya Poria; Erik Cambria; Louis-Philippe Morency

arXiv:1707.07250·cs.CL·July 25, 2017

TL;DR

This paper introduces the Tensor Fusion Network, a model that learns intra-modality and inter-modality dynamics end-to-end for multimodal sentiment analysis. In the reported experiments it outperforms state-of-the-art approaches for both multimodal and unimodal sentiment analysis.

Contribution

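The paper frames multimodal sentiment analysis as modeling intra-modality and inter-modality dynamics, and presents the Tensor Fusion Network, which learns both end-to-end. The approach is tailored to the volatile nature of spoken language in online videos and to the accompanying gestures and voice.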

Findings

1. Outperforms state-of-the-art approaches for both multimodal and unimodal sentiment analysis

2. Learns intra-modality and inter-modality dynamics end-to-end

3. Tailored to spoken language in online videos and the accompanying gestures and voice

Abstract

Multimodal sentiment analysis is an increasingly popular research area, which extends the conventional language-based definition of sentiment analysis to a multimodal setup where other relevant modalities accompany language. In this paper, we pose the problem of multimodal sentiment analysis as modeling intra-modality and inter-modality dynamics. We introduce a novel model, termed Tensor Fusion Network, which learns both such dynamics end-to-end. The proposed approach is tailored for the volatile nature of spoken language in online videos as well as accompanying gestures and voice. In the experiments, our model outperforms state-of-the-art approaches for both multimodal and unimodal sentiment analysis.
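
To illustrate how a single representation can carry both kinds of dynamics, the paper's fusion layer takes an outer product of the language, visual, and acoustic embeddings after appending a constant 1 to each. The following is a minimal sketch in plain Python, not the authors' implementation: the function name `tensor_fusion`, the embedding sizes, and the example values are hypothetical, and the modality subnetworks that would produce the embeddings are omitted.

```python
def tensor_fusion(z_l, z_v, z_a):
    """Three-way outer product of modality embeddings, each extended with a 1.

    z_l, z_v, z_a: sequences of floats standing in for the language, visual,
    and acoustic embeddings (hypothetical inputs for illustration).
    Returns a nested list indexed as fused[i][j][k].
    """
    # Appending 1 keeps the unimodal and bimodal terms inside the product.
    l = list(z_l) + [1.0]
    v = list(z_v) + [1.0]
    a = list(z_a) + [1.0]
    return [[[x * y * w for w in a] for y in v] for x in l]


# Toy embeddings; real embeddings would come from learned subnetworks.
z_l = [0.5, -1.0, 2.0]
z_v = [1.0, 3.0]
z_a = [-2.0, 4.0]

fused = tensor_fusion(z_l, z_v, z_a)

# Unimodal slice: the appended 1 is selected for the visual and acoustic axes.
language_only = [fused[i][-1][-1] for i in range(len(z_l))]

# Bimodal slice: language-by-visual products with the acoustic axis at its 1.
language_visual = [[fused[i][j][-1] for j in range(len(z_v))]
                   for i in range(len(z_l))]

print(len(fused), len(fused[0]), len(fused[0][0]))  # 4 3 3
print(language_only)  # [0.5, -1.0, 2.0]
```

In this sketch, entries where the appended 1 is selected on two axes reproduce the third modality's embedding unchanged, entries where it is selected on one axis hold pairwise products, and the remaining entries hold three-way products. The fused tensor has (3+1) x (2+1) x (2+1) entries here, so its size grows multiplicatively with the embedding dimensions.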
