Augmenting Convolutional networks with attention-based aggregation
Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Piotr Bojanowski, Armand Joulin, Gabriel Synnaeve, Hervé Jégou

TL;DR
This paper introduces an attention-based aggregation layer to augment convolutional networks, enabling non-local reasoning and improving performance across multiple vision tasks with efficient memory use.
Contribution
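It proposes an attention-based aggregation layer, akin to a single transformer block, that replaces the final average pooling of a convolutional network. The layer is paired with a simple patch-based convolutional architecture that keeps the input patch resolution across all layers instead of following a pyramidal design.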
Findings
Achieves competitive trade-offs between accuracy and complexity.
Reduces memory consumption compared to pyramidal designs.
Effective across object classification, segmentation, and detection tasks.
Abstract
We show how to augment any convolutional network with an attention-based global map to achieve non-local reasoning. We replace the final average pooling by an attention-based aggregation layer akin to a single transformer block, that weights how the patches are involved in the classification decision. We plug this learned aggregation layer with a simplistic patch-based convolutional network parametrized by 2 parameters (width and depth). In contrast with a pyramidal design, this architecture family maintains the input patch resolution across all the layers. It yields surprisingly competitive trade-offs between accuracy and complexity, in particular in terms of memory consumption, as shown by our experiments on various computer vision tasks: object classification, image segmentation and detection.
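As an illustration of the idea of replacing average pooling with attention-based aggregation, here is a minimal pure-Python sketch. It is not the authors' implementation: the function names `attention_pool` and `average_pool`, the single-head form, and the use of identity key/value projections are all assumptions made for brevity. A single query vector (standing in for a learned class token) scores each patch, the scores are normalized with a softmax, and the output is the weighted sum of the patches.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attention_pool(patches, cls_query):
    """Aggregate N patch vectors into one vector with a single query.

    patches:   list of N patch embeddings, each a list of d floats.
    cls_query: list of d floats (hypothetical stand-in for a learned
               class token). Key and value projections are assumed to
               be the identity, which a real layer would learn.
    Returns the pooled vector and the per-patch attention weights.
    """
    d = len(cls_query)
    scores = [dot(cls_query, p) / math.sqrt(d) for p in patches]
    weights = softmax(scores)
    pooled = [sum(w * p[j] for w, p in zip(weights, patches)) for j in range(d)]
    return pooled, weights


def average_pool(patches):
    """Baseline: uniform average over patches."""
    n = len(patches)
    return [sum(p[j] for p in patches) / n for j in range(len(patches[0]))]


if __name__ == "__main__":
    patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    pooled, weights = attention_pool(patches, [5.0, 0.0])
    print("weights:", weights)
    print("pooled:", pooled)
```

With an all-zero query every patch gets the same weight, so this sketch reduces to average pooling. A non-zero query shifts weight toward the patches it aligns with, and the weights themselves can be read as a map of how much each patch contributes to the pooled representation.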
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: LayerScale · Class Attention · Average Pooling
