AffectSeek: Agentic Affective Understanding in Long Videos under Vague User Queries
Zhen Zhang, Yuhang Yang, Yunxiang Jiang, Yuhuan Lu, Haifeng Lu, Zheng Lian, Runhao Zeng, and Xiping Hu

TL;DR
This paper introduces VQAU, a new task and benchmark for affective understanding in long videos driven by vague user queries, and proposes AffectSeek, an agentic framework for multi-step reasoning and evidence grounding.
Contribution
The paper defines the VQAU task, constructs VQAU-Bench as a benchmark for evaluating it, and proposes AffectSeek, an agentic framework for affective understanding in long videos.
Findings
Existing models struggle with VQAU challenges.
AffectSeek outperforms baseline models on VQAU-Bench.
VQAU remains a difficult task for current affective recognition methods.
Abstract
Existing affective understanding studies have mainly focused on recognizing emotions from images, audio signals, or pre-clipped video clips, where the affective evidence is already given. This passive and clip-centered setting does not fully reflect real-world scenarios, in which users often interact with long videos and express their needs through natural-language queries. In this paper, we study Vague-Query-driven video Affective Understanding (VQAU), a new task that requires models to localize affective moments in long videos, predict their emotion categories, and generate evidence-grounded rationales under vague user queries. To support this task, we construct VQAU-Bench, a benchmark that integrates long videos, vague affective queries, temporal clip annotations, emotion labels, and rationale explanations into a unified evaluation framework. VQAU-Bench enables…
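To make the three outputs named in the abstract concrete (a localized moment, an emotion category, and a rationale), here is a minimal Python sketch of what one prediction might look like. The class name `AffectiveMoment`, its field names, and the example values are hypothetical and do not come from the paper. The `temporal_iou` helper is a generic overlap measure commonly used for temporal localization; it is an assumption here, not a metric the paper is known to use.

```python
from dataclasses import dataclass


@dataclass
class AffectiveMoment:
    """Hypothetical record for one localized affective moment (not the paper's schema)."""
    start_s: float   # moment start, in seconds from the beginning of the video
    end_s: float     # moment end, in seconds
    emotion: str     # predicted or annotated emotion category
    rationale: str   # free-text explanation grounded in the video evidence


def temporal_iou(a: AffectiveMoment, b: AffectiveMoment) -> float:
    """Intersection-over-union of two time spans; 0.0 when they do not overlap."""
    inter = max(0.0, min(a.end_s, b.end_s) - max(a.start_s, b.start_s))
    union = (a.end_s - a.start_s) + (b.end_s - b.start_s) - inter
    return inter / union if union > 0 else 0.0


# Illustrative values only.
predicted = AffectiveMoment(10.0, 20.0, "sadness", "The speaker's voice breaks while looking away.")
annotated = AffectiveMoment(15.0, 25.0, "sadness", "The speaker pauses and wipes their eyes.")
overlap = temporal_iou(predicted, annotated)
```

Under this sketch, a prediction could be scored separately on span overlap, on whether `emotion` matches the annotation, and on rationale quality; how VQAU-Bench actually combines these is not stated in the text above.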
