TL;DR
The paper introduces Progressive Vision Graph (PVG), a graph-based vision recognition architecture that captures irregular objects more flexibly than grid or sequence backbones, reduces over-smoothing in deep layers, and reports improved results on ImageNet-1K and COCO.
Contribution
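PVG introduces three components: Progressively Separated Graph Construction (PSGC), neighbor information aggregation and update with MaxE, and the GraphLU activation function. Together they address key limitations of existing vision GNNs: inaccurate neighbor selection, expensive aggregation, and over-smoothing.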
Findings
PVG-S achieves 83.0% Top-1 accuracy on ImageNet-1K.
PVG-B surpasses ViG-B by 0.5% accuracy.
PVG improves object detection metrics on the COCO dataset.
Abstract
Convolution-based and Transformer-based vision backbone networks process images into grid or sequence structures, respectively, which are inflexible for capturing irregular objects. Though Vision GNN (ViG) adopts graph-level features for complex images, it has some issues, such as inaccurate neighbor node selection, expensive node information aggregation, and over-smoothing in the deep layers. To address these problems, we propose a Progressive Vision Graph (PVG) architecture for vision recognition tasks. Compared with previous works, PVG contains three main components: 1) Progressively Separated Graph Construction (PSGC), which introduces second-order similarity by gradually increasing the channels of the global graph branch and decreasing the channels of the local branch as the layers deepen; 2) a neighbor node information aggregation and update module using Max pooling and…
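The abstract is truncated, so the paper's exact formulation is not reproduced here. As a rough illustration of two ideas it names, the NumPy sketch below builds a k-nearest-neighbor graph over node features, aggregates neighbor features with max pooling, and splits a channel budget between a global and a local branch so that the global share grows with depth. The function names, the Euclidean kNN rule, and the linear split schedule are all assumptions made for illustration; second-order similarity, MaxE, and GraphLU are not implemented.

```python
import numpy as np

def knn_graph(x, k):
    """Indices of the k nearest neighbors of each node (Euclidean, self excluded).

    x: (n, c) node features. The kNN rule is an assumption, not the paper's method.
    """
    sq = np.sum(x ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)  # squared pairwise distances
    np.fill_diagonal(dist, np.inf)                      # a node is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]              # (n, k)

def max_pool_aggregate(x, nbrs):
    """Channel-wise max over each node's neighbor features: (n, k, c) -> (n, c)."""
    return x[nbrs].max(axis=1)

def channel_split(total, layer, depth):
    """Hypothetical linear schedule: the global branch's share grows with layer index.

    Returns (global_channels, local_channels), which always sum to `total`.
    """
    frac = (layer + 1) / (depth + 1)
    g = min(max(int(round(total * frac)), 1), total - 1)
    return g, total - g

if __name__ == "__main__":
    feats = np.random.default_rng(0).normal(size=(16, 8))
    nbrs = knn_graph(feats, k=4)
    pooled = max_pool_aggregate(feats, nbrs)
    print(pooled.shape)
    print([channel_split(64, layer, 4) for layer in range(4)])
```

With this schedule the local branch shrinks by exactly the amount the global branch grows, so the total width per layer stays fixed; the paper may use a different schedule.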
Taxonomy
Methods: Max Pooling
