Consistency driven Sequential Transformers Attention Model for Partially Observable Scenes
Samrudhdhi B. Rangrej, Chetan L. Srinidhi, James J. Clark

TL;DR
This paper introduces a sequential transformer-based attention model that predicts informative glimpses in partially observable scenes, improving classification accuracy and efficiency by enforcing consistency between partial and full-image predictions.
Contribution
The proposed model is the first to use consistency-driven training for partially observable scenes with sequential transformers, reducing pixel observations while maintaining high accuracy.
Findings
With only 4% of the image area observed, adding the consistency loss yields 3% and 8% higher accuracy on ImageNet and fMoW, respectively.
Outperforms the previous state of the art while observing 27% and 42% fewer pixels on ImageNet and fMoW, respectively.
Uses a novel consistency loss to align partial and full-image class distributions.
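To make the consistency objective concrete, here is a minimal sketch that measures how far the agent's class distribution (from glimpses) is from a teacher's class distribution (from the full image). It assumes KL divergence as the measure, which this summary does not confirm; the function names `softmax` and `consistency_loss` and the logit inputs are hypothetical, chosen only for illustration.

```python
import math

def softmax(logits):
    """Convert raw class scores to a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def consistency_loss(teacher_logits, agent_logits, eps=1e-12):
    """KL(teacher || agent) over class distributions.

    Assumption: KL divergence stands in for the paper's consistency
    measure; the summary only says the two distributions are aligned.
    """
    p = softmax(teacher_logits)  # teacher sees the complete image
    q = softmax(agent_logits)    # agent sees only the glimpses
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))
```

The loss is zero when the two distributions match and grows as the agent's prediction drifts away from the teacher's, so minimizing it pulls the partial-observation prediction toward the full-image one.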
Abstract
Most hard attention models initially observe a complete scene to locate and sense informative glimpses, and predict class-label of a scene based on glimpses. However, in many applications (e.g., aerial imaging), observing an entire scene is not always feasible due to the limited time and resources available for acquisition. In this paper, we develop a Sequential Transformers Attention Model (STAM) that only partially observes a complete image and predicts informative glimpse locations solely based on past glimpses. We design our agent using DeiT-distilled and train it with a one-step actor-critic algorithm. Furthermore, to improve classification performance, we introduce a novel training objective, which enforces consistency between the class distribution predicted by a teacher model from a complete image and the class distribution predicted by our agent using glimpses. When the agent…
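The abstract mentions training with a one-step actor-critic algorithm. The sketch below shows the textbook one-step form of that update for a single transition, not the authors' implementation. The function name, its arguments, the discount `gamma`, and the squared-error critic loss are all assumptions made for illustration.

```python
def one_step_actor_critic_losses(log_prob, value, next_value, reward,
                                 gamma=0.99, done=False):
    """Losses for one transition under a generic one-step actor-critic.

    log_prob   : log-probability of the chosen glimpse location (actor)
    value      : critic's estimate for the current state
    next_value : critic's estimate for the next state
    reward     : scalar reward for the transition (hypothetical here)
    """
    # Bootstrapped one-step target; no bootstrap at the end of an episode.
    target = reward + (0.0 if done else gamma * next_value)
    td_error = target - value
    # The TD error acts as the advantage and is treated as a constant
    # when differentiating the actor loss.
    actor_loss = -log_prob * td_error
    critic_loss = td_error ** 2
    return actor_loss, critic_loss, td_error
```

In a real training loop these two losses would be combined with the classification and consistency terms and backpropagated through the network; the weighting between them is not given in this summary.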
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications
