AffectSeek: Agentic Affective Understanding in Long Videos under Vague User Queries

Zhen Zhang, Yuhang Yang, Yunxiang Jiang, Yuhuan Lu, Haifeng Lu, Zheng Lian, Runhao Zeng, and Xiping Hu

arXiv:2605.05640·cs.CV·May 8, 2026

TL;DR

This paper introduces VQAU, a new task and benchmark for affective understanding in long videos driven by vague user queries, and proposes AffectSeek, an agentic framework for multi-step reasoning and evidence grounding.

Contribution

The paper defines the VQAU task, constructs VQAU-Bench as a benchmark for it, and proposes AffectSeek, an agentic framework for affective understanding in long videos.

Findings

01

Existing models struggle with VQAU challenges.

02

AffectSeek outperforms baseline models on VQAU-Bench.

03

VQAU remains a difficult task for current affective recognition methods.

Abstract

Existing affective understanding studies have mainly focused on recognizing emotions from images, audio signals, or pre-clipped video clips, where the affective evidence is already given. This passive and clip-centered setting does not fully reflect real-world scenarios, in which users often interact with long videos and express their needs through natural-language queries. In this paper, we study Vague-Query-driven video Affective Understanding (VQAU), a new task that requires models to localize affective moments in long videos, predict their emotion categories, and generate evidence-grounded rationales under vague user queries. To support this task, we construct VQAU-Bench, a benchmark that integrates long videos, vague affective queries, temporal clip annotations, emotion labels, and rationale explanations into a unified evaluation framework. VQAU-Bench enables…
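To make the three outputs named in the abstract concrete (a temporal clip, an emotion category, and a rationale), the following is a minimal illustrative sketch in Python. It is not the paper's schema or evaluation protocol: the `AffectiveMoment` fields, the `temporal_iou` and `is_hit` helpers, and the 0.5 overlap threshold are all assumptions made for illustration. Temporal intersection-over-union is a common way to score temporal localization in general; whether VQAU-Bench uses it is not stated in the text above.

```python
from dataclasses import dataclass


@dataclass
class AffectiveMoment:
    """Hypothetical record for one localized moment (field names are assumed)."""
    start_s: float   # clip start time, in seconds
    end_s: float     # clip end time, in seconds
    emotion: str     # predicted or annotated emotion category
    rationale: str   # free-text explanation grounded in the clip


def temporal_iou(pred: AffectiveMoment, gold: AffectiveMoment) -> float:
    """Intersection-over-union of two time spans, a value in [0, 1]."""
    inter = max(0.0, min(pred.end_s, gold.end_s) - max(pred.start_s, gold.start_s))
    union = (pred.end_s - pred.start_s) + (gold.end_s - gold.start_s) - inter
    return inter / union if union > 0 else 0.0


def is_hit(pred: AffectiveMoment, gold: AffectiveMoment,
           iou_threshold: float = 0.5) -> bool:
    """Assumed criterion: enough temporal overlap AND a matching emotion label."""
    return temporal_iou(pred, gold) >= iou_threshold and pred.emotion == gold.emotion


gold = AffectiveMoment(15.0, 25.0, "sadness", "The speaker's voice breaks.")
pred = AffectiveMoment(10.0, 20.0, "sadness", "She looks away and pauses.")
print(round(temporal_iou(pred, gold), 3), is_hit(pred, gold))
```

In this sketch a prediction counts only when it both overlaps the annotated clip sufficiently and carries the same emotion label; rationale quality would need a separate measure that is not modeled here.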
