PVG: Progressive Vision Graph for Vision Recognition

Jiafu Wu; Jian Li; Jiangning Zhang; Boshen Zhang; Mingmin Chi; Yabiao Wang; Chengjie Wang

arXiv:2308.00574 · cs.CV · December 11, 2024

TL;DR

The paper introduces PVG, a graph-based vision backbone that targets irregular objects, reduces over-smoothing in deep layers, and reports higher accuracy than the earlier Vision GNN (ViG) on ImageNet-1K, along with improved object detection on COCO.

Contribution

PVG contributes three components: a progressive graph construction scheme, neighbor information aggregation with MaxE, and the GraphLU activation. Together they address key limitations of existing vision GNNs.

Findings

01 PVG-S achieves 83.0% Top-1 accuracy on ImageNet-1K.

02 PVG-B surpasses ViG-B by 0.5 percentage points in Top-1 accuracy.

03 PVG improves object detection metrics on the COCO dataset.

Abstract

Convolution-based and Transformer-based vision backbone networks process images as grid or sequence structures, respectively, which are inflexible for capturing irregular objects. Though Vision GNN (ViG) adopts graph-level features for complex images, it has some issues, such as inaccurate neighbor node selection, expensive node information aggregation, and over-smoothing in deep layers. To address these problems, we propose a Progressive Vision Graph (PVG) architecture for vision recognition tasks. Compared with previous works, PVG contains three main components: 1) Progressively Separated Graph Construction (PSGC), which introduces second-order similarity by gradually increasing the channels of the global graph branch and decreasing the channels of the local branch as the layers deepen; 2) a neighbor node information aggregation and update module using Max pooling and…
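
The channel trade-off described for PSGC can be illustrated with a small sketch. This is not the authors' implementation: the abstract only says that the global branch gains channels and the local branch loses them as depth increases. The linear schedule, the function name `channel_split_schedule`, and the start and end fractions below are all assumptions chosen for illustration.

```python
def channel_split_schedule(total_channels, num_layers,
                           start_global_frac=0.25, end_global_frac=0.75):
    """Return a list of (global_channels, local_channels), one pair per layer.

    Hypothetical linear schedule: the global-branch share of the channels
    grows from start_global_frac to end_global_frac across the layers, and
    the local branch receives whatever remains. The abstract does not state
    the actual schedule, so these fractions are placeholders.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    splits = []
    for layer in range(num_layers):
        # Position of this layer in [0, 1]; a single-layer network stays at 0.
        t = layer / (num_layers - 1) if num_layers > 1 else 0.0
        frac = start_global_frac + t * (end_global_frac - start_global_frac)
        global_ch = round(total_channels * frac)
        local_ch = total_channels - global_ch  # the two branches always sum to the total
        splits.append((global_ch, local_ch))
    return splits


# Example: 64 channels split across 5 layers.
schedule = channel_split_schedule(total_channels=64, num_layers=5)
for depth, (g, l) in enumerate(schedule):
    print(f"layer {depth}: global={g:2d} local={l:2d}")
```

With these assumed settings the global branch grows from 16 to 48 channels while the local branch shrinks from 48 to 16, so the total width per layer stays fixed at 64.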

Code & Models

Repositories

wujiafu007/pvg
PyTorch · Official

Taxonomy

Methods: Max Pooling