More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using Sparsity

Shiwei Liu; Tianlong Chen; Xiaohan Chen; Xuxi Chen; Qiao Xiao; Boqian Wu; Tommi Kärkkäinen; Mykola Pechenizkiy; Decebal Mocanu; Zhangyang Wang

arXiv:2207.03620 · cs.CV · March 7, 2023 · 88 cites

TL;DR

This paper demonstrates that by applying sparsity techniques, convolutional neural networks can effectively scale to extremely large kernels beyond 51x51, achieving performance comparable to or better than state-of-the-art transformers across multiple vision tasks.

Contribution

The paper introduces a sparsity-based recipe for training extremely large kernels in CNNs, enabling kernels up to 61x61 and proposing the Sparse Large Kernel Network (SLaK) architecture.
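
To give a rough sense of why sparsity matters at these kernel sizes, the following is a minimal sketch that counts the nonzero weights of a depthwise convolution layer with square kernels, dense versus sparse. It is an illustration only, not the authors' training recipe: the function name, the channel count, and the sparsity level are all hypothetical values chosen for the example.

```python
def depthwise_kernel_weights(kernel_size: int, channels: int, sparsity: float = 0.0) -> int:
    """Count nonzero weights in a depthwise conv layer with square kernels.

    `sparsity` is the fraction of weights fixed to zero (0.0 means dense).
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    dense = kernel_size * kernel_size * channels
    return round(dense * (1.0 - sparsity))


# Hypothetical settings for illustration; not taken from the paper.
CHANNELS = 96
SPARSITY = 0.5

for k in (7, 31, 51, 61):
    dense = depthwise_kernel_weights(k, CHANNELS)
    sparse = depthwise_kernel_weights(k, CHANNELS, SPARSITY)
    print(f"{k}x{k}: dense={dense:,} nonzero weights, sparse={sparse:,}")
```

A single 51x51 kernel holds 2,601 weights per channel against 961 for a 31x31 kernel, so the per-layer weight count grows quadratically with kernel size; zeroing a fraction of those weights is one way to keep that growth in check.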

Findings

01 SLaK with sparse 51x51 kernels matches or exceeds transformer performance.
02 Scaling kernels beyond 51x51 improves accuracy across vision tasks.
03 Sparse large kernels enable CNNs to compete with or outperform modern transformers.

Abstract

Transformers have quickly shined in the computer vision world since the emergence of Vision Transformers (ViTs). The dominant role of convolutional neural networks (CNNs) seems to be challenged by increasingly effective transformer-based models. Very recently, a couple of advanced convolutional models strike back with large kernels motivated by the local-window attention mechanism, showing appealing performance and efficiency. While one of them, i.e. RepLKNet, impressively manages to scale the kernel size to 31x31 with improved performance, the performance starts to saturate as the kernel size continues growing, compared to the scaling trend of advanced ViTs such as Swin Transformer. In this paper, we explore the possibility of training extreme convolutions larger than 31x31 and test whether the performance gap can be eliminated by strategically enlarging convolutions. This study ends…

Code & Models

Repositories

vita-group/slak (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · ConvNeXt · Test · Linear Layer · Absolute Position Encodings · Dropout · Byte Pair Encoding · Adam · Label Smoothing · Position-Wise Feed-Forward Layer