Container: Context Aggregation Network

Peng Gao; Jiasen Lu; Hongsheng Li; Roozbeh Mottaghi; Aniruddha Kembhavi

arXiv:2106.01401 · cs.CV · October 19, 2021 · 41 cites

TL;DR

The paper introduces Container, a unified context aggregation network that combines the strengths of CNNs and Transformers, achieving faster convergence and superior performance in vision tasks like detection and segmentation.

Contribution

It proposes a novel general-purpose block for multi-head context aggregation, unifying CNNs, Transformers, and MLP-Mixers, with an efficient variant for downstream vision tasks.
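
The idea of one block covering several architectures can be illustrated with a small sketch. It assumes a single head, no learned query, key, or value projections, and a 1D token sequence. The function names and the mixing weights `alpha` and `beta` are hypothetical names chosen for illustration, not taken from the authors' code. The output is an affinity matrix applied to the tokens, plus a residual connection. An affinity computed from the input behaves like self-attention, while a fixed banded affinity with shared weights behaves like a convolution.

```python
import math

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def dynamic_affinity(x):
    """Self-attention-style affinity: depends on the input tokens themselves."""
    scale = 1.0 / math.sqrt(len(x[0]))
    x_t = [list(col) for col in zip(*x)]
    scores = matmul(x, x_t)
    return softmax_rows([[s * scale for s in row] for row in scores])

def local_static_affinity(n, kernel):
    """Convolution-style affinity: input-independent, banded, shared weights."""
    r = len(kernel) // 2
    return [[kernel[j - i + r] if abs(j - i) <= r else 0.0 for j in range(n)]
            for i in range(n)]

def mix(a_dyn, a_static, alpha, beta):
    """Weighted sum of a dynamic and a static affinity (weights are assumptions)."""
    return [[alpha * d + beta * s for d, s in zip(dr, sr)]
            for dr, sr in zip(a_dyn, a_static)]

def aggregate(affinity, x):
    """Context aggregation with a residual: Y = A X + X (values are X itself here)."""
    mixed = matmul(affinity, x)
    return [[m + v for m, v in zip(m_row, x_row)] for m_row, x_row in zip(mixed, x)]

# Four tokens with two channels each (toy data).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
a_dyn = dynamic_affinity(x)
a_loc = local_static_affinity(len(x), [0.25, 0.5, 0.25])
y = aggregate(mix(a_dyn, a_loc, 0.5, 0.5), x)
```

Setting `alpha = 0` leaves only the convolution-like path, and setting `beta = 0` leaves only the attention-like path. Replacing `a_loc` with any fixed dense matrix would give token mixing in the spirit of an MLP-Mixer layer.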

Findings

01 Achieves state-of-the-art detection and segmentation results with improved mAP scores.

02 Faster convergence speeds compared to traditional CNNs.

03 Effective in self-supervised learning frameworks.

Abstract

Convolutional neural networks (CNNs) are ubiquitous in computer vision, with a myriad of effective and efficient variations. Recently, Transformers -- originally introduced in natural language processing -- have been increasingly adopted in computer vision. While early adopters continue to employ CNN backbones, the latest networks are end-to-end CNN-free Transformer solutions. A recent surprising finding shows that a simple MLP based solution without any traditional convolutional or Transformer components can produce effective visual representations. While CNNs, Transformers and MLP-Mixers may be considered as completely disparate architectures, we provide a unified view showing that they are in fact special cases of a more general method to aggregate spatial context in a neural network stack. We present the Container (CONText AggregatIon NEtwoRk), a general-purpose building block for…

Taxonomy

Topics: Data Stream Mining Techniques · Anomaly Detection Techniques and Applications · Context-Aware Activity Recognition Systems

Methods: Multi-Head Attention · Attention Is All You Need · Vision Transformer · Linear Layer · Feature Pyramid Network · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Attention Dropout · Feedforward Network · 1x1 Convolution