HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions
Yongming Rao, Wenliang Zhao, Yansong Tang, Jie Zhou, Ser-Nam Lim, Jiwen Lu

TL;DR
HorNet introduces Recursive Gated Convolutions, a flexible high-order spatial interaction module that enhances vision models by combining the strengths of Transformers and CNNs, achieving superior performance across multiple vision tasks.
Contribution
The paper proposes $\textit{g}^\textit{n}$Conv, a novel recursive gated convolution operation that efficiently models high-order spatial interactions, and constructs HorNet, a new family of vision backbones built on this operation.
Findings
HorNet outperforms Swin Transformers and ConvNeXt on ImageNet, COCO, and ADE20K.
$\textit{g}^\textit{n}$Conv improves dense prediction tasks with less computation.
HorNet scales well with more data and larger models.
Abstract
Recent progress in vision Transformers exhibits great success in various tasks driven by the new spatial modeling mechanism based on dot-product self-attention. In this paper, we show that the key ingredients behind the vision Transformers, namely input-adaptive, long-range and high-order spatial interactions, can also be efficiently implemented with a convolution-based framework. We present the Recursive Gated Convolution ($\textit{g}^\textit{n}$Conv) that performs high-order spatial interactions with gated convolutions and recursive designs. The new operation is highly flexible and customizable, which is compatible with various variants of convolution and extends the two-order interactions in self-attention to arbitrary orders without introducing significant extra computation. $\textit{g}^\textit{n}$Conv can serve as a plug-and-play module to improve various vision Transformers and…
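The recursive gating idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the channel schedule (widths doubling at each step), the 7x7 kernel size, the random weights, and the function names `depthwise_conv2d`, `pointwise`, and `recursive_gated_conv` are all assumptions chosen for illustration. It only shows how repeated element-wise gating with convolved features raises the interaction order step by step.

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    """Per-channel 2D convolution with zero padding (output keeps H x W).
    x: (C, H, W); kernels: (C, k, k) with odd k."""
    _, h, w = x.shape
    k = kernels.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    for i in range(k):
        for j in range(k):
            out += kernels[:, i, j][:, None, None] * xp[:, i:i + h, j:j + w]
    return out

def pointwise(x, weight):
    """1x1 convolution: mixes channels at each spatial position.
    x: (C_in, H, W); weight: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def recursive_gated_conv(x, order, rng, kernel_size=7):
    """Illustrative recursive gated convolution with random weights.
    Assumes the channel count is divisible by 2 ** (order - 1)."""
    c = x.shape[0]
    # Assumed channel schedule: C / 2^(order-1), ..., C / 2, C.
    dims = [c // 2 ** (order - 1 - k) for k in range(order)]

    # Input projection yields the gated branch p and the features q.
    w_in = rng.standard_normal((dims[0] + sum(dims), c)) / np.sqrt(c)
    proj = pointwise(x, w_in)
    p, q = proj[:dims[0]], proj[dims[0]:]

    # A depthwise convolution gives q its spatial context.
    kernels = rng.standard_normal((sum(dims), kernel_size, kernel_size))
    q = depthwise_conv2d(q, kernels / kernel_size ** 2)
    q_parts = np.split(q, np.cumsum(dims)[:-1], axis=0)

    # Recursive gating: each step multiplies in one more convolved feature.
    p = p * q_parts[0]
    for k in range(1, order):
        w_k = rng.standard_normal((dims[k], dims[k - 1])) / np.sqrt(dims[k - 1])
        p = pointwise(p, w_k) * q_parts[k]

    # Output projection back to C channels.
    w_out = rng.standard_normal((c, c)) / np.sqrt(c)
    return pointwise(p, w_out)

x = np.random.default_rng(1).standard_normal((16, 8, 8))
y = recursive_gated_conv(x, order=3, rng=np.random.default_rng(0))
print(y.shape)
```

In this sketch every operation other than the element-wise products is linear, so with fixed weights the output is a polynomial of degree `order + 1` in the input. Halving the channel width at earlier steps is one way to keep the extra products cheap; whether it matches the paper's exact design is not confirmed by the text above.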
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: ConvNeXt · 1x1 Convolution · Gated Linear Unit · Convolution · Gated Convolution
