Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition
Xunsong Li, Pengzhan Sun, Yangcen Liu, Lixin Duan, Wen Li

TL;DR
This paper introduces an end-to-end object-centric action recognition framework that simultaneously detects objects and reasons about their interactions, improving performance over traditional multi-stage methods.
Contribution
The proposed method jointly performs object detection and interaction reasoning in a single stage, reducing reliance on external detectors and multi-stage training, and enhancing action recognition accuracy.
Findings
Outperforms state-of-the-art on Something-Else and Ikea-Assembly datasets.
Effectively captures interactive objects crucial for action recognition.
Demonstrates robustness in few-shot and compositional action tasks.
Abstract
The interactions between humans and objects are important for recognizing object-centric actions. Existing methods usually adopt a two-stage pipeline: object proposals are first detected using a pretrained detector, and are then fed to an action recognition model that extracts video features and learns the object relations for action recognition. However, since the action prior is unknown in the object detection stage, important objects can easily be overlooked, leading to inferior action recognition performance. In this paper, we propose an end-to-end object-centric action recognition framework that simultaneously performs Detection And Interaction Reasoning in one stage. Specifically, after extracting video features with a base network, we create three modules for concurrent object detection and interaction reasoning. First, a Patch-based Object Decoder generates proposals…
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications
Methods: INFO: An Efficient Optimization Algorithm based on Weighted Mean of Vectors · Balanced Selection
