Object Level Visual Reasoning in Videos
Fabien Baradel, Natalia Neverova, Christian Wolf, Julien Mille, Greg Mori

TL;DR
This paper introduces an object-level reasoning model for video activity recognition that captures detailed semantic interactions between objects and actors, achieving state-of-the-art results on three standard datasets.
Contribution
The paper presents a model that performs semantic spatiotemporal reasoning at the object level by integrating state-of-the-art object detection networks, which lets it learn detailed spatial interactions between actors and objects.
Findings
Achieves state-of-the-art performance on three standard datasets.
Effectively models detailed object interactions in videos.
Provides visualizations of learned semantic interactions.
Abstract
Human activity recognition is typically addressed by detecting key concepts like global and local motion, features related to object classes present in the scene, as well as features related to the global context. The next open challenges in activity recognition require a level of understanding that pushes beyond this and call for models with capabilities for fine distinction and detailed comprehension of interactions between actors and objects in a scene. We propose a model capable of learning to reason about semantically meaningful spatiotemporal interactions in videos. The key to our approach is a choice of performing this reasoning at the object level through the integration of state of the art object detection networks. This allows the model to learn detailed spatial interactions that exist at a semantic, object-interaction relevant level. We evaluate our method on three standard…
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Analysis and Summarization
