GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction

Samrudhdhi B Rangrej; Kevin J Liang; Tal Hassner; James J Clark

arXiv:2210.13605 · cs.CV · April 20, 2023

TL;DR

GliTr is a glimpse-based transformer for online action prediction that never observes a complete frame. Trained with a spatiotemporal consistency objective, it remains accurate while seeing only about a third of each frame's area.

Contribution

This work proposes GliTr, a glimpse transformer with a spatiotemporal consistency training objective, enabling effective action prediction using only partial frame glimpses.

Findings

01

Achieves 53.02% accuracy on Something-Something-v2 (SSv2) while observing only about 33% of each frame's area.

02

Training with the proposed consistency objective yields approximately 10% higher accuracy on SSv2 than the baseline cross-entropy objective.

03

Reaches 93.91% accuracy on the Jester dataset, again from narrow glimpses rather than complete frames.
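
To make the "about 33% of frame area" figure concrete, the short sketch below computes the fraction of a square frame covered by one square glimpse. The sizes used (a 128-pixel glimpse on a 224-pixel frame) are assumptions chosen for illustration; this summary does not state the actual dimensions.

```python
def observed_area_fraction(glimpse_size: int, frame_size: int) -> float:
    """Fraction of a square frame's area covered by a single square glimpse."""
    if not 0 < glimpse_size <= frame_size:
        raise ValueError("glimpse_size must be in (0, frame_size]")
    return (glimpse_size / frame_size) ** 2


# Assumed, illustrative sizes (not taken from this summary).
fraction = observed_area_fraction(glimpse_size=128, frame_size=224)
print(f"{fraction:.1%}")  # prints 32.7%
```

Because the fraction scales with the square of the side ratio, a glimpse a little over half as wide as the frame already covers only about a third of its area.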

Abstract

Many online action prediction models observe complete frames to locate and attend to informative subregions in the frames called glimpses and recognize an ongoing action based on global and local information. However, in applications with constrained resources, an agent may not be able to observe the complete frame, yet must still locate useful glimpses to predict an incomplete action based on local information only. In this paper, we develop Glimpse Transformers (GliTr), which observe only narrow glimpses at all times, thus predicting an ongoing action and the following most informative glimpse location based on the partial spatiotemporal information collected so far. In the absence of a ground truth for the optimal glimpse locations for action recognition, we train GliTr using a novel spatiotemporal consistency objective: We require GliTr to attend to the glimpses with features…
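
The observe-then-relocate loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `extract_glimpse`, `toy_next_location`, and `run_online` are hypothetical names, frames are plain nested lists, and the location rule (move to the brightest pixel seen so far) is a toy stand-in for GliTr's learned transformer. The one property it shares with the described setting is that only pixels inside the current glimpse are ever read.

```python
def extract_glimpse(frame, center, size):
    """Crop a size x size window around center (row, col), clamped inside the frame."""
    height, width = len(frame), len(frame[0])
    top = min(max(center[0] - size // 2, 0), height - size)
    left = min(max(center[1] - size // 2, 0), width - size)
    glimpse = [row[left:left + size] for row in frame[top:top + size]]
    return glimpse, (top, left)


def toy_next_location(glimpse, origin):
    """Toy stand-in for a learned location head: go to the brightest pixel seen."""
    _, r, c = max(
        ((value, r, c) for r, row in enumerate(glimpse) for c, value in enumerate(row)),
        key=lambda item: item[0],
    )
    # Convert glimpse-local coordinates back to frame coordinates.
    return (origin[0] + r, origin[1] + c)


def run_online(frames, size, start):
    """Observe one glimpse per frame; each glimpse decides where to look next."""
    center = start
    visited = []
    for frame in frames:
        glimpse, origin = extract_glimpse(frame, center, size)
        visited.append(origin)
        center = toy_next_location(glimpse, origin)  # used for the next frame
    return visited


# Two 14x14 frames; the first has one bright pixel inside the initial glimpse.
frames = [[[0.0] * 14 for _ in range(14)] for _ in range(2)]
frames[0][9][10] = 1.0
print(run_online(frames, size=8, start=(7, 7)))  # prints [(3, 3), (5, 6)]
```

In a real model the location head would be trained rather than hand-written, and an action prediction would be emitted at every step alongside the next location.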

Code & Models

Repositories

facebookresearch/glitr (PyTorch, official)

Videos

GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction (YouTube)

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications