AttZoom: Attention Zoom for Better Visual Features

Daniel DeAlcala; Aythami Morales; Julian Fierrez; Ruben Tolosana

arXiv:2508.03625·cs.CV·August 6, 2025

TL;DR

Attention Zoom is a versatile spatial attention layer that enhances feature extraction in CNNs, leading to improved classification accuracy and more detailed attention patterns across various models and datasets.

Contribution

We introduce a modular, architecture-agnostic spatial attention layer called Attention Zoom that improves CNN feature extraction without significant overhead.

Findings

01

Consistent accuracy improvements on CIFAR-100 and TinyImageNet

02

Encourages fine-grained, diverse attention patterns

03

Effective across multiple CNN architectures

Abstract

We present Attention Zoom, a modular and model-agnostic spatial attention mechanism designed to improve feature extraction in convolutional neural networks (CNNs). Unlike traditional attention approaches that require architecture-specific integration, our method introduces a standalone layer that spatially emphasizes high-importance regions in the input. We evaluated Attention Zoom on multiple CNN backbones using CIFAR-100 and TinyImageNet, showing consistent improvements in Top-1 and Top-5 classification accuracy. Visual analyses using Grad-CAM and spatial warping reveal that our method encourages fine-grained and diverse attention patterns. Our results confirm the effectiveness and generality of the proposed layer for improving CNNs with minimal architectural overhead.
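
The abstract reports results as Top-1 and Top-5 classification accuracy. As a reminder of what those metrics measure, here is a minimal pure-Python sketch of top-k accuracy. It is not taken from the paper: the function name `top_k_accuracy` and the toy scores and labels are hypothetical, chosen only for illustration.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: list of per-sample lists of class scores (hypothetical toy input).
    labels: list of integer class indices, one per sample.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example with 4 samples and 3 classes (made-up numbers).
scores = [
    [0.1, 0.7, 0.2],  # true class 1 is ranked first
    [0.5, 0.3, 0.2],  # true class 1 is ranked second
    [0.2, 0.3, 0.5],  # true class 0 is ranked third
    [0.6, 0.1, 0.3],  # true class 0 is ranked first
]
labels = [1, 1, 0, 0]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
```

With k=1 only the single highest-scoring class counts as a hit; a larger k (5 in the paper's evaluation) also credits predictions where the true class is ranked just below the top.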
