Sec2Sec Co-attention for Video-Based Apparent Affective Prediction

Mingwei Sun; Kunpeng Zhang

arXiv:2408.15209·cs.MM·August 28, 2024

TL;DR

This paper introduces a novel LSTM-Transformer co-attention model for video-based affect prediction, improving accuracy and interpretability by integrating vision, audio, and spatiotemporal cues.

Contribution

The paper presents the Sec2Sec Co-attention Transformer, an LSTM-based network with a Transformer co-attention mechanism that outperforms multiple state-of-the-art methods and makes its affect predictions interpretable over time.

Findings

01. Outperforms state-of-the-art on LIRIS-ACCEDE and First Impressions datasets
02. Provides interpretability of affective contributions over time
03. Effective integration of multi-modal video elements

Abstract

Video-based apparent affect detection plays a crucial role in video understanding, as it encompasses various elements such as vision, audio, audio-visual interactions, and spatiotemporal information, which are essential for accurate video predictions. However, existing approaches often focus on extracting only a subset of these elements, resulting in the limited predictive capacity of their models. To address this limitation, we propose a novel LSTM-based network augmented with a Transformer co-attention mechanism for predicting apparent affect in videos. We demonstrate that our proposed Sec2Sec Co-attention Transformer surpasses multiple state-of-the-art methods in predicting apparent affect on two widely used datasets: LIRIS-ACCEDE and First Impressions. Notably, our model offers interpretability, allowing us to examine the contributions of different time points to the overall…
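The abstract mentions a co-attention mechanism across modalities and the ability to inspect how much each time point contributes. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below applies plain scaled dot-product cross-attention in both directions between per-second visual and audio feature vectors. The function names (`cross_attention`, `co_attention`), the toy feature values, and the absence of learned projections or an LSTM are all assumptions made for brevity.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query attends over all keys.

    Returns the attended outputs and the attention weights (one row of
    weights per query, one weight per key time step).
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(wj * v[i] for wj, v in zip(w, values)) for i in range(len(values[0]))]
        )
    return outputs, weights


def co_attention(visual, audio):
    """Hypothetical co-attention: each modality queries the other."""
    vis_ctx, vis_to_aud = cross_attention(visual, audio, audio)
    aud_ctx, aud_to_vis = cross_attention(audio, visual, visual)
    return vis_ctx, aud_ctx, vis_to_aud, aud_to_vis


# Toy per-second features (made-up values: 3 seconds, 2 dimensions each).
visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio = [[0.5, 0.5], [2.0, 0.0], [0.0, 2.0]]

vis_ctx, aud_ctx, vis_to_aud, aud_to_vis = co_attention(visual, audio)

# Each row of vis_to_aud shows how strongly one visual second attends to
# each audio second; rows like this are what an interpretability analysis
# over time points could inspect.
for t, row in enumerate(vis_to_aud):
    print(t, [round(w, 3) for w in row])
```

In a real model the queries, keys, and values would come from learned linear projections, and the attended features would feed a downstream sequence model; those parts are omitted here.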

Code & Models

Repositories

nestor-sun/sec2sec
PyTorch, official

Taxonomy

Topics: Emotion and Mood Recognition