Video Affective Effects Prediction with Multi-modal Fusion and Shot-Long Temporal Context

Jie Zhang; Yin Zhao; Longjun Cai; Chaoping Tu; Wu Wei

arXiv:1909.01763 · cs.CV · September 5, 2019 · 1 cites

Open Access

TL;DR

This paper introduces a novel multi-modal fusion framework with shot-long temporal context modeling for predicting emotional impact in videos, significantly improving accuracy over existing methods.

Contribution

The paper proposes a comprehensive framework with modality-specific feature extraction, two-scale temporal structures, and a residual-based progressive fusion strategy for emotion prediction.

Findings

01 Achieved superior performance on the LIRIS-ACCEDE dataset.

02 Effectively models intra- and inter-clip temporal dependencies.

03 Enhances multi-modal fusion with residual-based training.

Abstract

Predicting the emotional impact of videos using machine learning is a challenging task, given the variety of modalities, the complicated temporal context of the video, and the time dependency of emotional states. Feature extraction, multi-modal fusion, and temporal context fusion are crucial stages for predicting the valence and arousal values of the emotional impact, but have not been successfully exploited. In this paper, we propose a comprehensive framework with novel designs of modal structure and multi-modal fusion strategy. We select the most suitable modalities for the valence and arousal tasks respectively, and each modal feature is extracted using a modality-specific deep model pre-trained on a large generic dataset. Two-time-scale structures, one for the intra-clip and the other for the inter-clip, are proposed to capture the temporal dependency of video content and…
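
The listing describes the fusion strategy only as "residual-based" and "progressive", and the abstract is cut off before the details. As a rough illustration of what such a scheme could look like, the sketch below uses stage-wise residual fitting: modalities are added one at a time, and each new regressor is trained only on the error left by the earlier ones. This is an assumption for illustration, not the authors' confirmed procedure. The scalar per-clip features, the linear regressors, and all names (`fit_linear`, `progressive_residual_fit`, `visual`, `audio`) are hypothetical.

```python
from statistics import mean


def fit_linear(xs, ys):
    """Ordinary least squares for y ~ a*x + b with a scalar feature x."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = cov / var if var else 0.0
    return a, my - a * mx


def progressive_residual_fit(modalities, targets):
    """Fit one regressor per modality, in order (illustrative assumption).

    Each regressor is trained on the residual left by the earlier ones,
    so later modalities only have to explain what is still unexplained.
    """
    residual = list(targets)
    models = []
    for xs in modalities:
        a, b = fit_linear(xs, residual)
        models.append((a, b))
        residual = [r - (a * x + b) for x, r in zip(xs, residual)]
    return models


def predict(models, sample):
    """Fused prediction: the sum of every stage's output for one clip."""
    return sum(a * x + b for (a, b), x in zip(models, sample))


def mse(models, modalities, targets):
    """Mean squared error of the fused prediction over all clips."""
    preds = [predict(models, sample) for sample in zip(*modalities)]
    return mean((p - t) ** 2 for p, t in zip(preds, targets))


# Toy per-clip features and targets (made up for this sketch).
visual = [0.0, 1.0, 2.0, 3.0]
audio = [1.0, 0.0, 1.0, 0.0]
valence = [2.0 * v + 0.5 * a for v, a in zip(visual, audio)]

models_one = progressive_residual_fit([visual], valence)
models_two = progressive_residual_fit([visual, audio], valence)

print("MSE, visual only:   ", mse(models_one, [visual], valence))
print("MSE, visual + audio:", mse(models_two, [visual, audio], valence))
```

Because each stage is a least-squares fit with an intercept, adding a modality in this sketch can never increase the training error. A real system would replace the scalar regressors with the deep modality-specific models that the abstract mentions.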

Taxonomy

Topics: Human Pose and Action Recognition · Emotion and Mood Recognition · Video Analysis and Summarization