ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer

Haokui Zhang; Wenze Hu; Xiaoyu Wang

arXiv:2203.03952 · cs.CV · July 27, 2022 · 6 cites

TL;DR

ParC-Net is a ConvNet backbone that brings the merits of vision transformers into ConvNets through position aware circular convolution, achieving higher accuracy with fewer parameters and faster inference across several vision tasks.

Contribution

It introduces ParC, a lightweight, position-sensitive circular convolution with a global receptive field, and a meta-former-like block that combines ParC with squeeze-excitation attention, giving ConvNets transformer-like capabilities.
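A minimal NumPy sketch of the idea behind ParC, restricted to the horizontal axis, may help make "circular convolution with a global receptive field that stays position-sensitive" concrete. Assumptions: the kernel is depthwise and as wide as the feature map, indices wrap around modulo the width, and location sensitivity comes from a learned positional embedding added before the convolution. The function `parc_h` and its arguments are hypothetical, not the authors' code.

```python
import numpy as np


def parc_h(x: np.ndarray, kernel: np.ndarray, pos_embed: np.ndarray) -> np.ndarray:
    """Hypothetical sketch of a horizontal position-aware circular convolution.

    x:         feature map, shape (C, H, W)
    kernel:    one depthwise kernel per channel spanning the full width, shape (C, W)
    pos_embed: learned positional embedding (assumption), shape (C, W), shared by all rows
    Returns a feature map of shape (C, H, W).
    """
    c, _, w = x.shape
    assert kernel.shape == (c, w) and pos_embed.shape == (c, w)
    # Adding a per-column embedding breaks pure translation equivariance,
    # so identical content at different columns can produce different outputs.
    x = x + pos_embed[:, None, :]
    out = np.empty_like(x, dtype=float)
    for j in range(w):
        # Circular indexing: column j reads columns j, j+1, ..., j+W-1 modulo W,
        # so every output position sees every input column (global receptive field).
        idx = (j + np.arange(w)) % w
        # Depthwise: channel c is combined only with kernel[c]
        # (cross-correlation, as deep-learning conv layers compute it).
        out[:, :, j] = np.einsum("chk,ck->ch", x[:, :, idx], kernel)
    return out
```

With a zero embedding the operation is plain circular convolution, so circularly shifting the input shifts the output by the same amount; a nonzero embedding is what makes the result depend on absolute position. A vertical counterpart could reuse the same routine on `x.transpose(0, 2, 1)` with kernels and embeddings of length H, then transpose back.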

Findings

01

ParC-Net outperforms lightweight ConvNets and vision transformers in accuracy and speed.

02

Achieves 78.6% top-1 accuracy on ImageNet-1k with fewer parameters than competing lightweight models.

03

Demonstrates superior performance on object detection and segmentation tasks.

Abstract

Recently, vision transformers have shown impressive results, significantly outperforming large convolution-based models. However, for small models targeting mobile or resource-constrained devices, ConvNets still have advantages in both performance and model complexity. We propose ParC-Net, a pure ConvNet-based backbone that further strengthens these advantages by fusing the merits of vision transformers into ConvNets. Specifically, we propose position aware circular convolution (ParC), a lightweight convolution op that has a global receptive field while producing location-sensitive features, as local convolutions do. We combine ParCs and squeeze-excitation ops to form a meta-former-like model block that also has a transformer-like attention mechanism. This block can be used in a plug-and-play manner to replace relevant blocks in…
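The block described above can be sketched as a MetaFormer-style residual structure (a token mixer, then a channel mixer), with squeeze-excitation supplying channel-wise attention. Assumptions: SE sits inside the channel-mixing feed-forward layer, normalization is a plain per-position channel normalization, and ReLU is used throughout. These choices and all function names (`squeeze_excitation`, `ffn_with_se`, `channel_norm`, `metaformer_block`) are illustrative, not taken from the ParC-Net code.

```python
# Illustrative sketch; layout choices are assumptions, not the official ParC-Net code.
from typing import Callable

import numpy as np

Mixer = Callable[[np.ndarray], np.ndarray]


def squeeze_excitation(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Squeeze-excitation on a (C, H, W) map; w1: (C//r, C), w2: (C, C//r)."""
    s = x.mean(axis=(1, 2))              # squeeze: global average pool -> (C,)
    z = np.maximum(w1 @ s, 0.0)          # reduce + ReLU -> (C//r,)
    a = 1.0 / (1.0 + np.exp(-(w2 @ z)))  # expand + sigmoid -> per-channel weights in (0, 1)
    return x * a[:, None, None]          # channel-wise attention: rescale each channel


def ffn_with_se(x, w_in, w_out, se_w1, se_w2):
    """Channel mixer (assumed layout): 1x1 expand -> ReLU -> SE -> 1x1 project."""
    h = np.maximum(np.einsum("ec,chw->ehw", w_in, x), 0.0)  # w_in: (E, C)
    h = squeeze_excitation(h, se_w1, se_w2)
    return np.einsum("ce,ehw->chw", w_out, h)               # w_out: (C, E)


def channel_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Normalize across channels at each spatial position (no learned affine)."""
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def metaformer_block(x: np.ndarray, token_mixer: Mixer, channel_mixer: Mixer) -> np.ndarray:
    """MetaFormer-style block: residual token mixing, then residual channel mixing."""
    x = x + token_mixer(channel_norm(x))
    x = x + channel_mixer(channel_norm(x))
    return x
```

In ParC-Net the token mixer would be ParC; in this sketch any callable that preserves the (C, H, W) shape works, for example `lambda t: np.roll(t, 1, axis=2)` as a trivial stand-in. Because both sub-blocks are residual, the block reduces to the identity when both mixers return zeros, which is what makes plug-and-play replacement of existing blocks straightforward.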

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Advanced Neural Network Applications

Methods: Attention Is All You Need · MobileViT · Linear Layer · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Attentive Walk-Aggregating Graph Neural Network · Softmax · Dense Connections · Multi-Head Attention · Residual Connection · Layer Normalization