ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer
Haokui Zhang, Wenze Hu, Xiaoyu Wang

TL;DR
ParC-Net is a novel ConvNet backbone that integrates vision transformer merits through position aware circular convolution, achieving superior accuracy, fewer parameters, and faster inference on various vision tasks.
Contribution
The paper introduces ParC, a lightweight, position-sensitive circular convolution with a global receptive field, and a meta-former-like block that combines ParC with squeeze-excitation ops to give ConvNets a transformer-like attention mechanism.
Findings
ParC-Net outperforms lightweight ConvNets and vision transformers in accuracy and speed.
Achieves 78.6% top-1 accuracy on ImageNet-1k with fewer parameters.
Demonstrates superior performance on object detection and segmentation tasks.
Abstract
Recently, vision transformers have started to show impressive results that significantly outperform large convolution-based models. However, in the area of small models for mobile or resource-constrained devices, ConvNets still have their own advantages in both performance and model complexity. We propose ParC-Net, a pure ConvNet-based backbone model that further strengthens these advantages by fusing the merits of vision transformers into ConvNets. Specifically, we propose position aware circular convolution (ParC), a light-weight convolution op which boasts a global receptive field while producing location-sensitive features as in local convolutions. We combine the ParCs and squeeze-excitation ops to form a meta-former-like model block, which further has an attention mechanism like transformers. The aforementioned block can be used in a plug-and-play manner to replace relevant blocks in…
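The idea of a circular convolution that is both global and position sensitive can be illustrated with a small sketch. This is not the authors' implementation: it assumes a 1-D signal, a kernel as long as the input, and an additive position embedding, and the names `parc_1d`, `kernel`, and `pos_embed` are hypothetical.

```python
def parc_1d(x, kernel, pos_embed):
    """Illustrative circular convolution over a position-augmented 1-D signal."""
    n = len(x)
    if len(kernel) != n or len(pos_embed) != n:
        raise ValueError("kernel and pos_embed must match the input length")
    # Assumption: position information is injected by simple addition, so
    # identical values at different locations become distinguishable.
    z = [xi + pi for xi, pi in zip(x, pos_embed)]
    # Circular convolution: indices wrap around with modulo n. Because the
    # kernel spans the whole input, each output element reads every input
    # element (a global receptive field).
    return [sum(kernel[k] * z[(i + k) % n] for k in range(n)) for i in range(n)]


x = [1.0, 1.0, 1.0, 1.0]                 # constant input
kernel = [0.5, 0.25, 0.125, 0.125]       # hypothetical weights
no_pos = parc_1d(x, kernel, [0.0, 0.0, 0.0, 0.0])
with_pos = parc_1d(x, kernel, [0.0, 0.25, 0.5, 0.75])
```

Without the position embedding, a constant input yields the same value at every location, since a circular convolution on its own is shift invariant. With the embedding added, the outputs differ by location, which is the location sensitivity the abstract refers to.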
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Advanced Neural Network Applications
Methods: Attention Is All You Need · MobileViT · Linear Layer · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Attentive Walk-Aggregating Graph Neural Network · Softmax · Dense Connections · Multi-Head Attention · Residual Connection · Layer Normalization
