ParCNetV2: Oversized Kernel with Enhanced Attention
Ruihan Xu, Haokui Zhang, Wenze Hu, Shiliang Zhang, Xiaoyu Wang

TL;DR
ParCNetV2 introduces oversized kernels and bifurcate gate units to enhance attention in CNNs, achieving long-range dependency modeling and implicit positional encoding, outperforming existing CNNs and hybrid models in vision tasks.
Contribution
The paper proposes a new CNN architecture that pairs oversized convolutions, which give every layer a global receptive field, with bifurcate gate units, which add a transformer-inspired attention mechanism.
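The bifurcate gate unit is only named in this summary, so the following is a minimal sketch under an explicit assumption: a GLU-style gate in which one branch of the input goes through token mixing (for example, a large convolution) and the other through a pointwise GELU, and the two branches are multiplied elementwise. The function `bifurcate_gate` and the scalar weights `w_value` and `w_gate` are hypothetical stand-ins for learned channel projections. The paper's actual layer composition may differ.

```python
import math
from typing import Callable, List


def gelu(v: float) -> float:
    # Exact GELU computed with the error function.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def bifurcate_gate(x: List[float],
                   mix: Callable[[List[float]], List[float]],
                   w_value: float,
                   w_gate: float) -> List[float]:
    """Hypothetical GLU-style two-branch gate (an assumption, not the paper's code).

    Branch 1 scales the input and applies a token-mixing function `mix`.
    Branch 2 scales the input and applies GELU pointwise, producing
    input-dependent weights that multiply branch 1 elementwise.
    """
    value = mix([w_value * v for v in x])   # branch 1: spatial/token mixing
    gate = [gelu(w_gate * v) for v in x]    # branch 2: pointwise gate
    return [a * g for a, g in zip(value, gate)]


if __name__ == "__main__":
    identity = lambda v: v
    print(bifurcate_gate([0.0, 1.0], identity, 1.0, 1.0))  # second entry ≈ 0.8413
    # A zero gate suppresses whatever the mixing branch moved into that position.
    print(bifurcate_gate([0.0, 2.0], lambda v: v[::-1], 1.0, 1.0))  # [0.0, 0.0]
```

The multiplicative interaction is what makes such a gate loosely comparable to attention: the weight applied at each position depends on the input itself rather than being fixed after training.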
Findings
Outperforms other CNNs and hybrid models in experiments.
Models long-range dependencies effectively.
Achieves implicit positional encoding.
Abstract
Transformers have shown great potential in various computer vision tasks. By borrowing design concepts from transformers, many studies revolutionized CNNs and showed remarkable results. This paper falls in this line of studies. Specifically, we propose a new convolutional neural network, ParCNetV2, that extends position-aware circular convolution (ParCNet) with oversized convolutions and bifurcate gate units to enhance attention. The oversized convolution employs a kernel with twice the input size to model long-range dependencies through a global receptive field. Simultaneously, it achieves implicit positional encoding by removing the shift-invariant property from convolution kernels, i.e., the effective kernels at different spatial locations are different when the kernel size is twice as large as the input size. The bifurcate gate unit implements an attention mechanism similar to…
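The claim that an oversized kernel removes shift invariance can be checked on a toy 1-D case. The sketch below assumes zero padding and a kernel of 2N-1 taps for an input of N samples, an odd-length stand-in for the "twice the input size" kernel that keeps a center tap. `oversized_conv1d` and `effective_kernel` are hypothetical helpers for illustration, not code from the paper.

```python
from typing import List


def oversized_conv1d(x: List[float], k: List[float]) -> List[float]:
    """Zero-padded 1-D cross-correlation with a kernel of 2N-1 taps (hypothetical helper).

    Output i aligns the kernel's center tap with input i, so input j is read
    through tap center + (j - i). Because the kernel is longer than the input,
    every output position covers the whole input (global receptive field).
    """
    n = len(x)
    if len(k) != 2 * n - 1:
        raise ValueError("kernel must have 2N-1 taps for an input of length N")
    center = n - 1
    return [sum(k[center + j - i] * x[j] for j in range(n)) for i in range(n)]


def effective_kernel(k: List[float], n: int, i: int) -> List[float]:
    """The N kernel weights that actually touch the input at output position i."""
    center = n - 1
    return [k[center + j - i] for j in range(n)]


if __name__ == "__main__":
    k = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # 2N-1 = 7 taps for N = 4
    print(oversized_conv1d([1.0] * 4, k))                        # [22.0, 18.0, 14.0, 10.0]
    print(effective_kernel(k, 4, 0), effective_kernel(k, 4, 3))  # [4.0, 5.0, 6.0, 7.0] [1.0, 2.0, 3.0, 4.0]
    print(oversized_conv1d([0.0, 0.0, 1.0, 0.0], k))             # [6.0, 5.0, 4.0, 3.0]
```

With a constant input, each position produces a different output because it reads a different slice of the same kernel; this position dependence is what the abstract calls implicit positional encoding. A single impulse also reaches every output position, which is the global receptive field.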
Videos
ParCNetV2: Oversized Kernel with Enhanced Attention (YouTube)
Taxonomy
Topics: Advanced Neural Network Applications · Industrial Vision Systems and Defect Detection · Human Pose and Action Recognition
Methods: Convolution
