FractalMamba++: Scaling Vision Mamba Across Resolutions via Hilbert Fractal Geometry
Bo Li, Haoke Xiao, Lv Tang

TL;DR
FractalMamba++ is a resolution-scalable Vision Mamba backbone that uses Hilbert fractal geometry to preserve spatial locality when image patches are serialized, improving performance across vision tasks and input resolutions.
Contribution
It proposes Hilbert-curve-based patch serialization, skip connections, and positional encoding, yielding vision models that scale across resolutions while staying faithful to 2D spatial structure.
Findings
Improves performance on ImageNet-1K classification.
Enhances detection and segmentation on COCO and ADE20K datasets.
Achieves better results with high-resolution inputs.
Abstract
Vision Mamba offers linear complexity for long visual sequences, yet its performance depends critically on how a two-dimensional patch grid is serialized into a one-dimensional state-space recurrence. Raster-style scans disrupt spatial continuity, and the mismatch between 2D locality and 1D state propagation becomes increasingly severe when the inference resolution grows beyond the training grid. This paper presents FractalMamba++, a resolution-scalable vision backbone organized around a single geometric principle: the recursive self-similar structure of the Hilbert curve determines how patches are serialized, where long-range state shortcuts are inserted, and how positional relations are encoded. First, Hilbert-curve-based Fractal Serialization preserves local 2D neighborhoods more faithfully than linear scans and provides consistent neighborhood statistics across resolutions. Second,…
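The locality argument behind Fractal Serialization can be illustrated with a minimal sketch. It assumes a square patch grid whose side is a power of two and uses the standard iterative Hilbert "distance to (x, y)" conversion. The helper names (`hilbert_order`, `raster_order`, `serialize`, `step_stats`) are hypothetical and are not taken from FractalMamba++'s code. `step_stats` reports the 2D Manhattan distance between patches that are consecutive in the 1D sequence, which is the quantity a raster scan disrupts.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[int, int]  # (x, y) position of a patch in the grid


def _rotate(s: int, x: int, y: int, rx: int, ry: int) -> Coord:
    """Rotate/flip a sub-square so the recursive sub-curve has the right orientation."""
    if ry == 0:
        if rx == 1:
            x, y = s - 1 - x, s - 1 - y
        x, y = y, x
    return x, y


def hilbert_d2xy(n: int, d: int) -> Coord:
    """Map distance d along the Hilbert curve to (x, y) on an n x n grid."""
    x = y = 0
    s, t = 1, d
    while s < n:  # build the position level by level, from finest to coarsest
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_order(n: int) -> List[Coord]:
    """Hypothetical helper: visiting order of an n x n patch grid along the Hilbert curve."""
    if n <= 0 or n & (n - 1):
        raise ValueError("grid side must be a positive power of 2 in this sketch")
    return [hilbert_d2xy(n, d) for d in range(n * n)]


def raster_order(n: int) -> List[Coord]:
    """Hypothetical helper: row-major (raster) visiting order, for comparison."""
    return [(x, y) for y in range(n) for x in range(n)]


def serialize(grid: Sequence[Sequence[object]], order: List[Coord]) -> List[object]:
    """Flatten a grid indexed as grid[y][x] into a 1D token sequence following `order`."""
    return [grid[y][x] for x, y in order]


def step_stats(order: List[Coord]) -> Tuple[float, int]:
    """Mean and max 2D Manhattan distance between sequence-consecutive patches."""
    jumps = [abs(x1 - x0) + abs(y1 - y0) for (x0, y0), (x1, y1) in zip(order, order[1:])]
    return sum(jumps) / len(jumps), max(jumps)


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        h_mean, h_max = step_stats(hilbert_order(n))
        r_mean, r_max = step_stats(raster_order(n))
        print(f"{n}x{n}: hilbert mean/max = {h_mean:.3f}/{h_max}, "
              f"raster mean/max = {r_mean:.3f}/{r_max}")
```

On an n×n grid every Hilbert step moves to a 4-connected neighbor, so the mean and maximum step stay at 1 at every resolution. The raster scan instead jumps n cells at each row end, giving a mean step of 2n/(n+1) and a maximum jump that grows with the grid. This sketch only illustrates the serialization and locality argument. It does not model the paper's state shortcuts or positional encoding, and grids whose sides are not powers of two would need a generalized Hilbert curve.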
