Consistency driven Sequential Transformers Attention Model for Partially Observable Scenes

Samrudhdhi B. Rangrej; Chetan L. Srinidhi; James J. Clark

arXiv:2204.00656 · cs.CV · April 5, 2022

TL;DR

This paper introduces a sequential transformer-based attention model that predicts informative glimpses in partially observable scenes, improving classification accuracy and efficiency by enforcing consistency between partial and full-image predictions.

Contribution

The proposed model is the first to use consistency-driven training for partially observable scenes with sequential transformers, reducing pixel observations while maintaining high accuracy.

Findings

01

With only 4% of the image area observed, adding the consistency loss to the training objective yields 3% and 8% higher accuracy on ImageNet and fMoW, respectively.

02

Outperforms the previous state of the art while observing 27% and 42% fewer pixels on ImageNet and fMoW, respectively.

03

Uses a novel consistency loss to align partial and full-image class distributions.
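The percentages in the findings above are easier to read with a small worked calculation. The sketch below is illustrative only: the 224×224 image size, the 32×32 glimpse size, and the function names are assumptions made for the arithmetic and are not stated in this summary.

```python
def observed_fraction(num_glimpses, glimpse_size, image_size):
    """Fraction of a square image covered by non-overlapping square glimpses.

    All sizes are in pixels. The values used below (224 and 32) are
    assumptions for illustration, not figures taken from this summary.
    """
    if glimpse_size <= 0 or image_size <= 0 or num_glimpses < 0:
        raise ValueError("sizes must be positive and the glimpse count non-negative")
    covered = num_glimpses * glimpse_size ** 2
    total = image_size ** 2
    if covered > total:
        raise ValueError("glimpses cover more area than the image contains")
    return covered / total


def pixels_after_reduction(baseline_pixels, relative_reduction):
    """Pixels observed after a relative reduction, e.g. 0.27 for 27% fewer."""
    if not 0.0 <= relative_reduction <= 1.0:
        raise ValueError("relative_reduction must lie in [0, 1]")
    return baseline_pixels * (1.0 - relative_reduction)


if __name__ == "__main__":
    # Two hypothetical 32x32 glimpses on a 224x224 image cover about 4% of it.
    print(f"{observed_fraction(2, 32, 224):.2%}")
    # A hypothetical baseline that reads 1000 pixels, reduced by 27%.
    print(round(pixels_after_reduction(1000, 0.27)))
```

Under these assumed sizes, two glimpses correspond to roughly the 4% observation budget quoted in finding 01.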

Abstract

Most hard attention models initially observe a complete scene to locate and sense informative glimpses, and predict the class label of a scene based on those glimpses. However, in many applications (e.g., aerial imaging), observing an entire scene is not always feasible due to the limited time and resources available for acquisition. In this paper, we develop a Sequential Transformers Attention Model (STAM) that only partially observes a complete image and predicts informative glimpse locations solely based on past glimpses. We design our agent using DeiT-distilled and train it with a one-step actor-critic algorithm. Furthermore, to improve classification performance, we introduce a novel training objective, which enforces consistency between the class distribution predicted by a teacher model from a complete image and the class distribution predicted by our agent using glimpses. When the agent…
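The consistency objective described in the abstract can be sketched as a divergence between two class distributions: one from a teacher that sees the complete image, and one from the agent that sees only glimpses. The following is a minimal sketch that assumes a KL divergence from teacher to agent; the choice of divergence, the function names, and the example logits are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of classes")
    # Terms with p_i == 0 contribute nothing; q_i is clamped to avoid log(0).
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def consistency_loss(teacher_logits, agent_logits):
    """Assumed form of the loss: KL(teacher distribution || agent distribution).

    teacher_logits: class scores from a model that sees the full image.
    agent_logits:   class scores from the agent that sees only glimpses.
    """
    return kl_divergence(softmax(teacher_logits), softmax(agent_logits))


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]   # hypothetical full-image logits
    agent = [0.5, 1.5, 0.2]     # hypothetical glimpse-based logits
    print(consistency_loss(teacher, agent))    # positive: distributions differ
    print(consistency_loss(teacher, teacher))  # zero: distributions match
```

In training, a term like this would typically be added to the agent's other losses so that its glimpse-based predictions move toward the teacher's full-image predictions; how the terms are weighted is not stated in this summary.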

Code & Models

Repositories

samrudhdhirangrej/STAM-Sequential-Transformers-Attention-Model
PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications