FoveaTer: Foveated Transformer for Image Classification

Aditya Jonnalagadda; William Yang Wang; B. S. Manjunath; Miguel P. Eckstein

arXiv:2105.14173·cs.CV·October 4, 2022·1 cites

Open Access

TL;DR

FoveaTer introduces a foveated vision transformer that mimics biological eye movements and peripheral vision, improving scene classification efficiency and robustness against adversarial attacks.

Contribution

This paper presents the first foveated transformer architecture that incorporates eye movement-inspired pooling and attention mechanisms for image classification.

Findings

01 FoveaTer outperforms baseline models in scene categorization tasks.

02 The model better explains human decision-making in visual tasks.

03 FoveaTer shows increased robustness to adversarial attacks.

Abstract

Many animals and humans process the visual field with a varying spatial resolution (foveated vision) and use peripheral processing to make eye movements and point the fovea to acquire high-resolution information about objects of interest. This architecture results in computationally efficient, rapid scene exploration. Recent progress in self-attention-based Vision Transformers offers an alternative to the traditionally convolution-reliant computer vision systems. However, the Transformer models do not explicitly model the foveated properties of the visual system or the interaction between eye movements and the classification task. We propose the Foveated Transformer (FoveaTer) model, which uses pooling regions and eye movements to perform object classification tasks using a Vision Transformer architecture. Using square pooling regions or biologically-inspired radial-polar pooling regions, our…
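
The idea of square pooling regions around a fixation can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `foveated_square_pool`, the parameters `block` and `fovea_radius`, and the rule of keeping full resolution near the fixation while averaging elsewhere are all assumptions made for illustration. The sketch only shows how a fixation point can reduce the number of tokens a Transformer would have to process.

```python
from statistics import mean


def foveated_square_pool(features, fixation, block=2, fovea_radius=1):
    """Pool a 2D feature grid into tokens, coarser away from the fixation.

    Hypothetical rule (an assumption, not taken from the paper): square
    blocks whose Chebyshev distance from the fixated block, measured in
    block units, is at most `fovea_radius` keep every cell as its own
    token; every other block is averaged into a single token.

    Returns a list of (row, col, size, value) tuples.
    """
    height, width = len(features), len(features[0])
    fix_block = (fixation[0] // block, fixation[1] // block)
    tokens = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            dist = max(abs(top // block - fix_block[0]),
                       abs(left // block - fix_block[1]))
            if dist <= fovea_radius:
                # Foveal region: full resolution, one token per cell.
                tokens.extend((r, c, 1, features[r][c])
                              for r in rows for c in cols)
            else:
                # Peripheral region: one averaged token per block.
                pooled = mean(features[r][c] for r in rows for c in cols)
                tokens.append((top, left, block, pooled))
    return tokens


# Usage: an 8x8 grid (64 cells) fixated at the top-left corner.
grid = [[r * 8 + c for c in range(8)] for r in range(8)]
tokens = foveated_square_pool(grid, fixation=(0, 0), block=2, fovea_radius=0)
print(len(tokens), "tokens instead of", 8 * 8)
```

Moving the `fixation` argument changes which part of the grid is kept at full resolution, which is a crude stand-in for an eye movement; how the model actually chooses the next fixation is not covered by this sketch.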

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Image Processing Techniques and Applications

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Vision Transformer · Label Smoothing · Layer Normalization · Byte Pair Encoding · Residual Connection · Dropout