Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos

Juncheng Li; Junlin Xie; Linchao Zhu; Long Qian; Siliang Tang; Wenqiao Zhang; Haochen Shi; Shengyu Zhang; Longhui Wei; Qi Tian; Yueting Zhuang

arXiv:2208.01954·cs.CV·August 4, 2022

Open Access · 1 Repo

TL;DR

This paper introduces a novel approach for localizing human emotions in untrimmed videos using a dilated context integrated network and cross-modal consensus learning, addressing challenges of varied temporal dynamics and complex cues.

Contribution

The paper proposes a new TEL task, a dual-stream network architecture, and a weakly-supervised learning paradigm leveraging video and subtitle consensus, along with a new annotated dataset.

Findings

01 Effective in localizing emotions with varied temporal dynamics
02 Achieves accurate emotion boundary detection in untrimmed videos
03 Demonstrates superiority over baseline methods

Abstract

Understanding human emotions is a crucial ability for intelligent robots to provide better human-robot interactions. The existing works are limited to trimmed video-level emotion classification, failing to locate the temporal window corresponding to the emotion. In this paper, we introduce a new task, named Temporal Emotion Localization in videos (TEL), which aims to detect human emotions and localize their corresponding temporal boundaries in untrimmed videos with aligned subtitles. TEL presents three unique challenges compared to temporal action localization: 1) The emotions have extremely varied temporal dynamics; 2) The emotion cues are embedded in both appearances and complex plots; 3) The fine-grained temporal annotations are complicated and labor-intensive. To address the first two challenges, we propose a novel dilated context integrated network with a coarse-fine two-stream…

Code & Models

Repositories

yyjmjc/temporal-emotion-localization-in-videos
none · Official

Taxonomy

Topics: Emotion and Mood Recognition · Human Pose and Action Recognition · Multimodal Machine Learning Applications