InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Wenhai Wang; Jifeng Dai; Zhe Chen; Zhenhang Huang; Zhiqi Li; Xizhou Zhu; Xiaowei Hu; Tong Lu; Lewei Lu; Hongsheng Li; Xiaogang Wang; Yu Qiao

arXiv:2211.05778·cs.CV·April 18, 2023·39 cites

TL;DR

InternImage introduces a large-scale CNN model utilizing deformable convolutions, achieving state-of-the-art results on benchmarks like COCO and ADE20K, and bridging the performance gap with vision transformers.

Contribution

The paper presents a novel large-scale CNN model, InternImage, that leverages deformable convolutions to enhance receptive fields and adaptively learn robust patterns from massive data.

Findings

1. Achieved 65.4 mAP on COCO test-dev
2. Reached 62.9 mIoU on ADE20K
3. Outperformed existing CNNs and ViTs on key benchmarks

Abstract

Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still in an early state. This work presents a new large-scale CNN-based foundation model, termed InternImage, which can obtain the gain from increasing parameters and training data like ViTs. Different from the recent CNNs that focus on large dense kernels, InternImage takes deformable convolution as the core operator, so that our model not only has the large effective receptive field required for downstream tasks such as detection and segmentation, but also has the adaptive spatial aggregation conditioned by input and task information. As a result, the proposed InternImage reduces the strict inductive bias of traditional CNNs and makes it possible to learn stronger and more robust patterns with large-scale parameters from massive…
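
To make the idea of "adaptive spatial aggregation" concrete, here is a minimal pure-Python sketch of a generic modulated deformable convolution evaluated at one output location. It is not the paper's operator or its implementation: the function names (`bilinear`, `deform_conv_at`), the single-channel 3x3 setting, and the zero-padding rule are all assumptions chosen for illustration. In a real model the offsets and modulation scalars would be predicted from the input features; here they are passed in by hand.

```python
import math

def bilinear(img, y, x):
    """Sample img (a list of rows) at fractional (y, x).
    Out-of-bounds neighbours contribute 0 (an assumed padding rule)."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Bilinear weight falls off linearly with distance to the neighbour.
                wgt = (1 - abs(y - yy)) * (1 - abs(x - xx))
                val += wgt * img[yy][xx]
    return val

def deform_conv_at(img, weights, offsets, masks, cy, cx):
    """Output of a hypothetical 3x3 deformable kernel centred at (cy, cx).
    weights: 9 floats; offsets: 9 (dy, dx) pairs; masks: 9 modulation scalars."""
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = 0.0
    for (gy, gx), w, (oy, ox), m in zip(grid, weights, offsets, masks):
        # Each sampling point is the regular grid position shifted by its offset.
        out += w * m * bilinear(img, cy + gy + oy, cx + gx + ox)
    return out

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ones = [1.0] * 9

# Zero offsets and unit masks reduce to an ordinary 3x3 convolution.
regular = deform_conv_at(img, ones, [(0.0, 0.0)] * 9, ones, 1, 1)

# Shifting every sampling point half a pixel to the right changes what is aggregated.
shifted = deform_conv_at(img, ones, [(0.0, 0.5)] * 9, ones, 1, 1)
```

With zero offsets the kernel samples the fixed 3x3 grid, so the sketch behaves like a standard convolution; non-zero offsets move the sampling points off the grid, which is how such an operator can adapt its receptive field to the input.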

Code & Models

Models

OpenGVLab/DCNv4 (Hugging Face model)

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · CCD and CMOS Imaging Sensors

Methods: Convolution · Deformable Convolution