GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction
Samrudhdhi B Rangrej, Kevin J Liang, Tal Hassner, James J Clark

TL;DR
GliTr introduces a glimpse-based transformer for online action prediction that observes only narrow glimpses of each frame and is trained with a spatiotemporal consistency objective, achieving high accuracy while observing only a fraction of each frame.
Contribution
This work proposes GliTr, a glimpse transformer with a spatiotemporal consistency training objective, enabling effective action prediction using only partial frame glimpses.
Findings
Achieves 53.02% accuracy on Something-Something v2 (SSv2) with only 33% of the frame area observed.
The proposed consistency objective improves accuracy by approximately 10% over baseline methods.
Achieves 93.91% accuracy on the Jester dataset with limited visual input.
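The "33% of the frame area" figure is the fraction of each frame that a single glimpse covers. The helpers below are a minimal sketch of that arithmetic for a rectangular glimpse. The frame and glimpse sizes used in the example (224×224 frames, 128×128 or 129×129 glimpses) are hypothetical illustrations, not values taken from the paper.

```python
import math


def glimpse_area_fraction(glimpse_h: int, glimpse_w: int, frame_h: int, frame_w: int) -> float:
    """Fraction of the frame area covered by one rectangular glimpse."""
    if not (0 < glimpse_h <= frame_h and 0 < glimpse_w <= frame_w):
        raise ValueError("glimpse must fit inside the frame")
    return (glimpse_h * glimpse_w) / (frame_h * frame_w)


def glimpse_side_for_fraction(fraction: float, frame_h: int, frame_w: int) -> int:
    """Side of the largest square glimpse whose area is at most `fraction` of the frame."""
    max_area = int(fraction * frame_h * frame_w)  # floor of the allowed pixel count
    side = math.isqrt(max_area)                   # largest integer side with side**2 <= max_area
    return min(side, frame_h, frame_w)


# Hypothetical sizes: a 128x128 glimpse on a 224x224 frame covers about 32.7% of it.
print(round(glimpse_area_fraction(128, 128, 224, 224), 4))
print(glimpse_side_for_fraction(1 / 3, 224, 224))
```

Working from a target fraction to a side length (rather than the reverse) is a design choice for illustration only; it rounds down so the glimpse never exceeds the stated area budget.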
Abstract
Many online action prediction models observe complete frames to locate and attend to informative subregions in the frames called glimpses and recognize an ongoing action based on global and local information. However, in applications with constrained resources, an agent may not be able to observe the complete frame, yet must still locate useful glimpses to predict an incomplete action based on local information only. In this paper, we develop Glimpse Transformers (GliTr), which observe only narrow glimpses at all times, thus predicting an ongoing action and the following most informative glimpse location based on the partial spatiotemporal information collected so far. In the absence of a ground truth for the optimal glimpse locations for action recognition, we train GliTr using a novel spatiotemporal consistency objective: We require GliTr to attend to the glimpses with features…
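The online loop described in the abstract (observe only a glimpse, then predict the ongoing action and the next glimpse location from everything collected so far) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `predict_fn` and `toy_predict` are hypothetical placeholders for the transformer, the (row, col) location convention in [0, 1] is assumed, the starting location at the frame center is assumed, and the consistency training objective is not modeled at all.

```python
import numpy as np


def extract_glimpse(frame: np.ndarray, location: tuple, size: int) -> np.ndarray:
    """Crop a size x size glimpse from a 2-D frame.

    `location` is (row, col) in [0, 1] (assumed convention): 0 places the glimpse
    at the top/left edge and 1 at the bottom/right edge, so the crop always fits.
    """
    h, w = frame.shape[:2]
    r, c = (float(np.clip(v, 0.0, 1.0)) for v in location)
    top = int(round(r * (h - size)))
    left = int(round(c * (w - size)))
    return frame[top:top + size, left:left + size]


def run_online(video, size, predict_fn, first_location=(0.5, 0.5)):
    """Process frames one at a time, observing only one glimpse per frame.

    `predict_fn` (hypothetical) maps all glimpses seen so far to
    (class_logits, next_location).
    """
    location = first_location  # assumed starting point, not from the paper
    glimpses, all_logits, locations = [], [], []
    for frame in video:
        locations.append(location)
        glimpses.append(extract_glimpse(frame, location, size))  # partial observation only
        logits, location = predict_fn(glimpses)  # action prediction + next glimpse location
        all_logits.append(logits)
    return all_logits, locations


def toy_predict(glimpses, num_classes=3):
    """Placeholder model (not GliTr): logits from mean glimpse intensity,
    next glimpse location drifts rightward as more glimpses arrive."""
    means = np.array([g.mean() for g in glimpses])
    logits = np.array([means.mean() * k for k in range(num_classes)])
    next_location = (0.5, min(1.0, 0.25 * len(glimpses)))
    return logits, next_location


video = np.linspace(0.0, 1.0, 4 * 32 * 32).reshape(4, 32, 32)  # 4 toy frames
logits, locations = run_online(video, size=16, predict_fn=toy_predict)
print(len(logits), locations)
```

The key property the sketch preserves is that the model never sees a full frame: each prediction depends only on the glimpses gathered up to the current time step.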
Videos
GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction· youtube
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications
