TL;DR
This paper introduces an RGB-based human activity recognition method in which a visual attention module selects interest points without using pose information at test time, and reports state-of-the-art results on the NTU RGB+D dataset.
Contribution
The approach pairs unstructured glimpse sequences with a set of recurrent workers that jointly perform motion tracking and activity recognition. The summary reports that it outperforms existing methods.
Findings
Outperforms the state of the art on the NTU RGB+D dataset
Uses unstructured interest points (glimpses) effectively for activity recognition
Requires no pose information at test time and computes none internally
Abstract
We propose a method for human activity recognition from RGB data that does not rely on any pose information during test time and does not explicitly calculate pose information internally. Instead, a visual attention module learns to predict glimpse sequences in each frame. These glimpses correspond to interest points in the scene that are relevant to the classified activities. No spatial coherence is forced on the glimpse locations, which gives the module liberty to explore different points at each frame and better optimize the process of scrutinizing visual information. Tracking and sequentially integrating this kind of unstructured data is a challenge, which we address by separating the set of glimpses from a set of recurrent tracking/recognition workers. These workers receive glimpses, jointly performing subsequent motion tracking and activity prediction. The glimpses are…
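The data flow described in the abstract (glimpses taken at freely chosen points in each frame, then handed to a set of recurrent workers) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the glimpse locations are supplied by hand rather than predicted by a learned attention module, the glimpse feature is a plain mean intensity, each worker is a scalar tanh recurrence, and the glimpses are simply averaged before reaching the workers, because the abstract is cut off before it says how glimpses are routed. None of the names below come from the paper.

```python
import math


def extract_glimpse(frame, cy, cx, size):
    """Crop a size x size patch around (cy, cx), clamped to stay inside the frame."""
    h, w = len(frame), len(frame[0])
    top = min(max(cy - size // 2, 0), h - size)
    left = min(max(cx - size // 2, 0), w - size)
    return [row[left:left + size] for row in frame[top:top + size]]


def glimpse_feature(patch):
    """Toy feature: mean intensity (a stand-in for a learned encoder)."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)


class Worker:
    """Hypothetical recurrent worker with scalar state: h <- tanh(a*h + b*x)."""

    def __init__(self, a, b):
        self.a, self.b, self.h = a, b, 0.0

    def step(self, x):
        self.h = math.tanh(self.a * self.h + self.b * x)
        return self.h


def run_sequence(frames, glimpse_locations, workers, size=2):
    """Feed each frame's glimpses to every worker and return the final states."""
    for frame, locations in zip(frames, glimpse_locations):
        # Locations may differ freely from frame to frame (no spatial coherence).
        feats = [glimpse_feature(extract_glimpse(frame, y, x, size))
                 for y, x in locations]
        # Placeholder routing (assumption): average all glimpses of the frame.
        pooled = sum(feats) / len(feats)
        for worker in workers:
            worker.step(pooled)
    return [worker.h for worker in workers]


# Two 4x4 "frames" with hand-picked glimpse locations per frame.
frame = [[r * 4 + c for c in range(4)] for r in range(4)]
states = run_sequence(
    frames=[frame, frame],
    glimpse_locations=[[(0, 0), (3, 3)], [(1, 1)]],
    workers=[Worker(0.5, 0.1), Worker(0.9, 0.01)],
)
print(states)
```

In a real system the final worker states would feed a classifier over activity labels, and the locations would come from the attention module. The sketch only shows that the number and position of glimpses can change per frame while the workers keep a persistent state across frames.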
