TL;DR
This paper inserts Global Average Pooling (GAP) layers into CNNs to improve translation invariance, cutting trainable parameters by 98% while improving robustness to shifts and generalization.
Contribution
It introduces a simple, parameter-efficient method to improve translation invariance in CNNs, outperforming traditional data augmentation in robustness and generalization.
Findings
Reduces trainable parameters by 98% (5.2M to 82K) while maintaining competitive ImageNet Top-1 accuracy (66.4%).
Roughly doubles translational robustness: average relative loss falls from 0.09 to 0.05.
Outperforms the baseline on perceptual image quality assessment tasks.
Abstract
Convolutional Neural Networks (CNNs) are widely assumed to be translation-invariant, yet standard architectures exhibit a startling fragility: even a single-pixel shift can drastically degrade performance due to their reliance on spatially dependent fully connected layers. In this work, we resolve this vulnerability by proposing a lightweight 'Online Architecture' strategy. By strategically inserting Global Average Pooling (GAP) layers at various network depths, we effectively decouple feature recognition from spatial location. Using VGG-16 as a primary case study, we demonstrate that this architectural modification achieves a massive 98% reduction in trainable parameters (from 5.2M to just 82K) and a 90% reduction in total network size (138M to 14M). Despite this drastic pruning, our variants maintain competitive Top-1 accuracy on ImageNet (66.4%) while doubling translational…
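The decoupling of feature recognition from spatial location described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the feature-map shape, the 10-class head, the random weights, and the helper names `flatten_head` and `gap_head` are all assumptions chosen for illustration, and a circular shift of the feature map stands in for a translated input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical final feature map: C channels on an H x W grid.
# (512 x 7 x 7 matches VGG-16's last conv block for a 224 x 224 input.)
C, H, W, NUM_CLASSES = 512, 7, 7, 10
features = rng.standard_normal((C, H, W))

# Circular one-pixel shift along the width axis (illustrative stand-in
# for the feature map of a translated image).
shifted = np.roll(features, shift=1, axis=2)


def flatten_head(x, weight):
    """Spatially dependent head: each (channel, row, col) has its own weight."""
    return weight @ x.reshape(-1)


def gap_head(x, weight):
    """GAP head: average each channel over the spatial grid, then apply a linear layer."""
    return weight @ x.mean(axis=(1, 2))


# Random weights, for illustration only (no training involved).
w_flat = rng.standard_normal((NUM_CLASSES, C * H * W))
w_gap = rng.standard_normal((NUM_CLASSES, C))

# Largest change in any output logit caused by the one-pixel shift.
flat_diff = np.abs(flatten_head(features, w_flat) - flatten_head(shifted, w_flat)).max()
gap_diff = np.abs(gap_head(features, w_gap) - gap_head(shifted, w_gap)).max()

# Weight counts of the two heads (biases omitted).
flat_params = w_flat.size
gap_params = w_gap.size

print(f"flatten head: {flat_params} weights, max logit change {flat_diff:.3f}")
print(f"GAP head:     {gap_params} weights, max logit change {gap_diff:.2e}")
```

In this toy setting the GAP head's output is unchanged by the shift up to floating-point rounding, and it needs H x W = 49 times fewer weights than the flatten head. Real translations are not circular, and strided convolution and pooling in the backbone are only approximately shift-equivariant, so a trained network would show reduced sensitivity rather than exact invariance.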
