Are Large Kernels Better Teachers than Transformers for ConvNets?

Tianjin Huang; Lu Yin; Zhenyu Zhang; Li Shen; Meng Fang; Mykola Pechenizkiy; Zhangyang Wang; Shiwei Liu

arXiv:2305.19412 · cs.CV · June 1, 2023 · 1 citation

TL;DR

This study shows that large-kernel ConvNets are more effective teachers than Transformers for small-kernel ConvNets in knowledge distillation, yielding a student that reaches 83.1% top-1 accuracy on ImageNet.

Contribution

The paper is the first to show that large-kernel ConvNets are better teachers than Transformers for small-kernel ConvNets in knowledge distillation, improving student accuracy and transferring beneficial characteristics such as larger receptive fields.

Findings

01. Large-kernel ConvNets outperform Transformers as teachers in knowledge distillation (KD).

02. The distilled student is reported as the best pure ConvNet under 30M parameters, with 83.1% top-1 accuracy on ImageNet.

03. Beneficial properties of the teacher, such as larger receptive fields, are transferred to the student through KD.

Abstract

This paper reveals a new appeal of the recently emerged large-kernel Convolutional Neural Networks (ConvNets): as the teacher in Knowledge Distillation (KD) for small-kernel ConvNets. While Transformers have led state-of-the-art (SOTA) performance in various fields with ever-larger models and labeled data, small-kernel ConvNets are considered more suitable for resource-limited applications due to the efficient convolution operation and compact weight sharing. KD is widely used to boost the performance of small-kernel ConvNets. However, previous research shows that it is not quite effective to distill knowledge (e.g., global information) from Transformers to small-kernel ConvNets, presumably due to their disparate architectures. We hereby carry out a first-of-its-kind study unveiling that modern large-kernel ConvNets, a compelling competitor to Vision Transformers, are remarkably more…
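To make the KD setup in the abstract concrete, the following is a minimal sketch of generic logit-level distillation (softened teacher targets combined with the hard-label loss). It is not taken from the paper or its repository: the function names `softmax` and `kd_loss`, the temperature of 4.0, and the mixing weight `alpha` of 0.5 are all illustrative assumptions, and the paper's actual recipe may differ.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Generic logit distillation loss for one example (illustrative only).

    temperature and alpha are assumed hyperparameters, not values from the paper.
    """
    # Hard-label term: cross-entropy of the student against the true class.
    ce = -math.log(softmax(student_logits)[label])

    # Soft-label term: KL(teacher || student) on temperature-softened outputs.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        t * (math.log(t) - math.log(s))
        for t, s in zip(p_teacher, p_student)
        if t > 0.0
    )

    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return (1.0 - alpha) * ce + alpha * temperature ** 2 * kl


# Example: the teacher (e.g. a large-kernel ConvNet) and the student
# (a small-kernel ConvNet) disagree on the ranking of three classes.
loss = kd_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0], label=2)
```

In this formulation the soft term vanishes when the student's logits match the teacher's, so the loss pulls the student toward the teacher's full output distribution rather than only the correct label.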

Code & Models

Repositories

vita-group/slak (PyTorch, official)

Videos

Are Large Kernels Better Teachers than Transformers for ConvNets? · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning

Methods: ConvNeXt · Knowledge Distillation · Convolution