X-volution: On the unification of convolution and self-attention
Xuanhong Chen, Hang Wang, Bingbing Ni

TL;DR
X-volution unifies convolution and self-attention into a single, re-parameterizable module that enhances neural network performance in visual tasks by capturing both local and global features.
Contribution
The paper introduces a theoretically derived approximation scheme and a multi-branch module that combines convolution and self-attention, enabling a unified and re-parameterizable operator for neural networks.
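As a rough illustration of why a convolution branch and an attention-style branch can be folded into one operator, the toy NumPy sketch below merges a static kernel with per-position attention weights into a single dynamic kernel. It is not the authors' implementation: the 1D layout, the windowed softmax attention standing in for the paper's approximation scheme, the odd window size, and all function names are assumptions made for this example.

```python
import numpy as np

def local_windows(x, k):
    """Return an (n, k, c) array of zero-padded sliding windows over x of shape (n, c).

    Assumes k is odd so that each window is centered on its position.
    """
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([xp[i:i + k] for i in range(x.shape[0])])

def conv_branch(x, w):
    # Static kernel w of shape (k,), shared across all positions and channels.
    return np.einsum("nkc,k->nc", local_windows(x, w.shape[0]), w)

def local_attention_weights(x, k):
    # Input-dependent weights: softmax over dot products within each window.
    win = local_windows(x, k)                      # (n, k, c)
    scores = np.einsum("nc,nkc->nk", x, win)       # (n, k)
    scores = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def attention_branch(x, k):
    # Weighted sum of each window using the dynamic attention weights.
    weights = local_attention_weights(x, k)
    return np.einsum("nkc,nk->nc", local_windows(x, k), weights)

def merged_branch(x, w):
    # Both branches are linear in the window values, so their weights can be
    # added into one dynamic kernel per position and applied in a single pass.
    k = w.shape[0]
    merged = local_attention_weights(x, k) + w[None, :]
    return np.einsum("nkc,nk->nc", local_windows(x, k), merged)
```

Under these assumptions, `conv_branch(x, w) + attention_branch(x, k)` and `merged_branch(x, w)` agree up to floating-point error, which is the sense in which the two branches collapse into one.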
Findings
Improves top-1 accuracy on ImageNet classification by 1.2%.
Improves COCO detection and segmentation metrics (+1.7 box AP, +1.5 mask AP).
Demonstrates competitive performance with a unified convolution/self-attention module.
Abstract
Convolution and self-attention are two fundamental building blocks in deep neural networks: the former extracts local image features in a linear way, while the latter non-locally encodes high-order contextual relationships. Though the two are essentially complementary (first-order versus high-order), state-of-the-art architectures, i.e., CNNs or transformers, lack a principled way to apply both operations simultaneously in a single computational module, owing to their heterogeneous computing patterns and the excessive burden of global dot-products for visual tasks. In this work, we theoretically derive a global self-attention approximation scheme, which approximates self-attention via convolution operations on transformed features. Based on this approximation scheme, we establish a multi-branch elementary module composed of both convolution and self-attention operations, capable of…
Taxonomy
Topics: Computability, Logic, AI Algorithms · Advanced Image and Video Retrieval Techniques · Topological and Geometric Data Analysis
Methods: Convolution
