Can Graphs Help Vision SSMs See Better?

Dhruv Parikh; Anvitha Ramachandran; Haoyang Fan; Mustafa Munir; Rajgopal Kannan; Viktor Prasanna

arXiv:2605.11300·cs.CV·May 13, 2026

TL;DR

GraphScan introduces a graph-based dynamic scanning operator for Vision SSMs, improving local feature exchange and achieving state-of-the-art results across multiple vision tasks with modest overhead.

Contribution

The paper proposes GraphScan, a novel graph-induced dynamic scanning operator that enhances Vision SSMs by explicitly modeling local semantic interactions before global aggregation.

Findings

01 GraphScan achieves state-of-the-art performance on vision tasks.

02 GraphScan induces interpretable displacement fields.

03 GraphScan maintains linear scaling with image size.

Abstract

Vision state space models inherit the efficiency and long-range modeling ability of Mamba-style selective scans. However, their performance depends critically on the representation of two-dimensional visual features as one-dimensional token sequences. Existing scan operators range from predefined geometric traversals to dynamic coordinate-based samplers that reroute tokens through predicted offsets and interpolation. While effective, these mechanisms primarily adapt paths or sampling locations, rather than explicitly modeling which local patches should exchange information before global state-space mixing. This motivates a simple question: can graphs help vision state space models see better? We introduce GraphScan, a graph-induced dynamic scanning operator for Vision SSMs. For each token, GraphScan constructs a spatially bounded local graph, learns feature-conditioned…
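To make two ideas from the abstract concrete, the sketch below shows a predefined raster traversal that flattens a 2D token grid into a 1D sequence, and a spatially bounded local graph that links each token to its grid neighbors. This is an illustration only, not the paper's implementation: the function names `raster_scan` and `local_graph`, the fixed square window, and the `radius` parameter are all assumptions made here, and the learned, feature-conditioned part of GraphScan is not reproduced.

```python
def raster_scan(height, width):
    """Flatten an H x W token grid into a 1D order, row by row.

    This is one example of a predefined geometric traversal.
    """
    return [(r, c) for r in range(height) for c in range(width)]


def local_graph(height, width, radius=1):
    """Map each token index to the indices of its grid neighbors.

    Neighbors are all tokens within `radius` in Chebyshev distance,
    excluding the token itself. The fixed window is an assumption of
    this sketch, not a detail taken from the paper.
    """
    neighbors = {}
    for r in range(height):
        for c in range(width):
            idx = r * width + c
            nbrs = []
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < height and 0 <= cc < width
                    if (dr, dc) != (0, 0) and inside:
                        nbrs.append(rr * width + cc)
            neighbors[idx] = nbrs
    return neighbors


order = raster_scan(4, 4)            # 16 (row, col) pairs in scan order
graph = local_graph(4, 4, radius=1)  # token index -> neighbor indices
num_edges = sum(len(v) for v in graph.values())
```

With a fixed `radius`, each token has at most `(2 * radius + 1) ** 2 - 1` neighbors, so the number of directed edges in this sketch grows linearly with the number of tokens.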
