Near-Optimal Glimpse Sequences for Improved Hard Attention Neural Network Training
William Harvey, Michael Teng, Frank Wood

TL;DR
This paper introduces a Bayesian optimal experimental design approach to generate near-optimal glimpse sequences for hard attention in neural networks, improving training efficiency and reducing variance.
Contribution
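The paper frames hard attention for image classification as a Bayesian optimal experimental design (BOED) problem and proposes a method for generating reusable near-optimal attention sequences, which are then used to partially supervise the training of hard attention networks.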
Findings
The generated near-optimal attention sequences speed up training when used as partial supervision.
The sequences can be reused across different networks for the same task.
The method reduces training variance in hard attention models.
Abstract
Hard visual attention is a promising approach to reduce the computational burden of modern computer vision methodologies. Hard attention mechanisms are typically non-differentiable. They can be trained with reinforcement learning but the high-variance training this entails hinders more widespread application. We show how hard attention for image classification can be framed as a Bayesian optimal experimental design (BOED) problem. From this perspective, the optimal locations to attend to are those which provide the greatest expected reduction in the entropy of the classification distribution. We introduce methodology from the BOED literature to approximate this optimal behaviour, and use it to generate 'near-optimal' sequences of attention locations. We then show how to use such sequences to partially supervise, and therefore speed up, the training of a hard attention mechanism.…
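The selection criterion described in the abstract (attend where the expected reduction in the entropy of the classification distribution is greatest) can be illustrated with a toy sketch. This is not the paper's method, which approximates the quantity for images using techniques from the BOED literature. The sketch assumes a small discrete problem in which the likelihood p(o | y, l) of each glimpse outcome o given class y and location l is known exactly. All names (`entropy`, `expected_information_gain`, `best_location`) and the example numbers are hypothetical.

```python
import numpy as np

def entropy(p, axis=-1):
    """Shannon entropy in nats along `axis`; 0 * log 0 is treated as 0."""
    p = np.asarray(p, dtype=float)
    logp = np.log(np.where(p > 0, p, 1.0))
    return -np.sum(p * logp, axis=axis)

def expected_information_gain(prior, likelihood):
    """Expected entropy reduction of p(y) for each candidate location.

    prior:      shape (C,),      p(y) over C classes
    likelihood: shape (L, C, O), p(o | y, l) for L locations and O outcomes
    returns:    shape (L,)
    """
    prior = np.asarray(prior, dtype=float)
    likelihood = np.asarray(likelihood, dtype=float)
    joint = likelihood * prior[None, :, None]       # p(y, o | l)
    marginal = joint.sum(axis=1)                    # p(o | l), shape (L, O)
    # Guard against dividing by zero for outcomes that cannot occur.
    safe = np.where(marginal > 0, marginal, 1.0)
    posterior = joint / safe[:, None, :]            # p(y | o, l)
    post_entropy = entropy(posterior, axis=1)       # shape (L, O)
    expected_post_entropy = np.sum(marginal * post_entropy, axis=1)
    return entropy(prior) - expected_post_entropy

def best_location(prior, likelihood):
    """Index of the location with the largest expected information gain."""
    return int(np.argmax(expected_information_gain(prior, likelihood)))

# Toy example (made-up numbers): two classes, two candidate locations.
prior = np.array([0.5, 0.5])
likelihood = np.array([
    [[0.5, 0.5], [0.5, 0.5]],   # location 0: outcome is independent of class
    [[1.0, 0.0], [0.0, 1.0]],   # location 1: outcome reveals the class
])
eig = expected_information_gain(prior, likelihood)
chosen = best_location(prior, likelihood)
```

In this toy case the uninformative location has zero expected gain and the fully informative one recovers the whole prior entropy, so it is selected. Repeating the selection after each observation, with the posterior as the new prior, would yield a greedy glimpse sequence under these assumptions.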
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Image Processing Techniques and Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
