Referring Atomic Video Action Recognition
Kunyu Peng, Jia Fu, Kailun Yang, Di Wen, Yufan Chen, Ruiping Liu, Junwei Zheng, Jiaming Zhang, M. Saquib Sarfraz, Rainer Stiefelhagen, Alina Roitberg

TL;DR
This paper introduces RAVAR, a new task for recognizing atomic actions of a specific person in video guided by text, along with a new dataset and a specialized model, RefAtomNet, to address this challenge.
Contribution
The paper presents the RAVAR task, a new dataset, and a novel cross-stream attention model, RefAtomNet, tailored for referring atomic video action recognition.
Findings
RefAtomNet outperforms baseline methods on RAVAR.
The RefAVA dataset contains 36,630 instances with manually annotated textual descriptions of the individuals.
Cross-stream attention improves localization and action recognition.
Abstract
We introduce a new task called Referring Atomic Video Action Recognition (RAVAR), aimed at identifying atomic actions of a particular person based on a textual description and the video data of this person. This task differs from traditional action recognition and localization, where predictions are delivered for all present individuals. In contrast, we focus on recognizing the correct atomic action of a specific individual, guided by text. To explore this task, we present the RefAVA dataset, containing 36,630 instances with manually annotated textual descriptions of the individuals. To establish a strong initial benchmark, we implement and validate baselines from various domains, e.g., atomic action localization, video question answering, and text-video retrieval. Since these existing methods underperform on RAVAR, we introduce RefAtomNet -- a novel cross-stream attention-driven method…
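The abstract names cross-stream attention but this excerpt does not describe it. As general background only, the sketch below shows plain scaled dot-product cross-attention, in which tokens from one stream (here, text) attend over tokens from another (here, visual). It is not RefAtomNet's architecture; the function names, toy features, and dimensions are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention.

    Each query row (one stream) attends over all key/value rows
    (the other stream). Returns the attended outputs and the weights.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Weighted sum of the value rows.
        out = [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


# Hypothetical toy features: 2 text tokens, 3 visual tokens, dimension 2.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
visual_tokens = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

attended, attn = cross_attention(text_tokens, visual_tokens, visual_tokens)
```

Each row of `attn` sums to 1, so every text token receives a convex combination of the visual tokens, weighted toward the ones it matches best.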
Taxonomy
Topics: Neural Networks and Applications · Advanced Memory and Neural Computing · Anomaly Detection Techniques and Applications
Methods: Softmax · Attention Is All You Need · Focus
