Can Graphs Help Vision SSMs See Better?
Dhruv Parikh, Anvitha Ramachandran, Haoyang Fan, Mustafa Munir, Rajgopal Kannan, Viktor Prasanna

TL;DR
GraphScan introduces a graph-based dynamic scanning operator for Vision SSMs, improving local feature exchange and achieving state-of-the-art results across multiple vision tasks with modest overhead.
Contribution
The paper proposes GraphScan, a novel graph-induced dynamic scanning operator that enhances Vision SSMs by explicitly modeling local semantic interactions before global aggregation.
Findings
GraphScan achieves state-of-the-art performance on vision tasks.
GraphScan induces interpretable displacement fields.
GraphScan maintains linear scaling with image size.
Abstract
Vision state space models inherit the efficiency and long-range modeling ability of Mamba-style selective scans. However, their performance depends critically on the representation of two-dimensional visual features as one-dimensional token sequences. Existing scan operators range from predefined geometric traversals to dynamic coordinate-based samplers that reroute tokens through predicted offsets and interpolation. While effective, these mechanisms primarily adapt paths or sampling locations, rather than explicitly modeling which local patches should exchange information before global state-space mixing. This motivates a simple question: can graphs help vision state space models see better? We introduce GraphScan, a graph-induced dynamic scanning operator for Vision SSMs. For each token, GraphScan constructs a spatially bounded local graph, learns feature-conditioned…
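To make the phrase "spatially bounded local graph" concrete, the following is a minimal sketch in plain Python. It is not GraphScan's implementation. It assumes a row-major (raster) flattening of the 2D token grid and a square neighborhood window; the function names and the `radius` parameter are hypothetical and chosen only for illustration. The learned, feature-conditioned part of the operator is not shown, since the abstract above is truncated before describing it.

```python
def raster_index(row, col, width):
    """Map a 2D grid position to its index in a row-major 1D token sequence."""
    return row * width + col


def local_neighbors(height, width, radius):
    """For each token, list the sequence indices of the tokens inside a
    (2*radius+1) x (2*radius+1) window around it, excluding the token itself.

    `radius` is a hypothetical parameter, not a value taken from the paper.
    """
    neighbors = []
    for r in range(height):
        for c in range(width):
            nbrs = []
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    if dr == 0 and dc == 0:
                        continue  # skip the center token
                    rr, cc = r + dr, c + dc
                    # Keep only neighbors that fall inside the grid.
                    if 0 <= rr < height and 0 <= cc < width:
                        nbrs.append(raster_index(rr, cc, width))
            neighbors.append(nbrs)
    return neighbors


# A 4x4 token grid with a 3x3 neighborhood (radius 1).
graph = local_neighbors(height=4, width=4, radius=1)
print(graph[0])  # corner token: [1, 4, 5]
print(graph[5])  # interior token: [0, 1, 2, 4, 6, 8, 9, 10]
```

In this sketch each token has at most (2*radius+1)^2 - 1 neighbors, so the number of edges grows linearly with the number of tokens. That property follows from the bounded window alone and says nothing about how the paper itself achieves its reported scaling.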
