Separators in Enhancing Autoregressive Pretraining for Vision Mamba

Hanpeng Liu; Zidan Wang; Shuoxi Zhang; Kaiyuan Gao; Kun He

arXiv:2603.03806·cs.CV·March 5, 2026

TL;DR

This paper introduces STAR, a separator-based autoregressive pretraining method for Vision Mamba that quadruples the input sequence length. The resulting STAR-B model reaches 83.5% accuracy on ImageNet-1k.

Contribution

The paper proposes STAR, a separator technique for autoregressive pretraining of Vision Mamba: an identical separator is inserted before each image to mark where it begins, which enables pretraining on longer input sequences.

Findings

01

STAR quadruples the input sequence length used for pretraining.

02

STAR-B achieved 83.5% accuracy on ImageNet-1k.

03

The method improves long-range dependency modeling in vision models.

Abstract

The state space model Mamba has recently emerged as a promising paradigm in computer vision, attracting significant attention due to its efficient processing of long sequence tasks. Mamba's inherent causal mechanism renders it particularly suitable for autoregressive pretraining. However, current autoregressive pretraining methods are constrained to short sequence tasks, failing to fully exploit Mamba's prowess in handling extended sequences. To address this limitation, we introduce an innovative autoregressive pretraining method for Vision Mamba that substantially extends the input sequence length. We introduce new SeparaTors for AutoRegressive pretraining to demarcate and differentiate between different images, known as STAR. Specifically, we insert identical separators before each image to demarcate its inception. This strategy enables us…
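As a rough illustration of the separator idea described in the abstract, the sketch below places one shared separator token before each image's patch tokens and concatenates the results into a single long sequence. It is not the authors' implementation: the function name `build_separated_sequence`, the choice of 4 images, 196 patch tokens per image, the 8-dimensional embeddings, and the constant separator vector are all assumptions made for the example.

```python
from typing import List

Token = List[float]  # one patch embedding; plain lists keep the sketch dependency-free


def build_separated_sequence(images: List[List[Token]], separator: Token) -> List[Token]:
    """Concatenate per-image token lists, placing the same separator before each image."""
    sequence: List[Token] = []
    for tokens in images:
        sequence.append(separator)  # identical separator marks where an image begins
        sequence.extend(tokens)
    return sequence


# Hypothetical sizes (assumptions, not taken from the paper):
# 4 images, 196 patch tokens per image, 8-dimensional embeddings.
num_images, tokens_per_image, dim = 4, 196, 8
images = [
    [[float(i)] * dim for _ in range(tokens_per_image)]
    for i in range(num_images)
]
separator = [-1.0] * dim  # stand-in for a shared separator embedding

sequence = build_separated_sequence(images, separator)

# Positions of the separator in the long sequence, found by object identity.
separator_positions = [idx for idx, tok in enumerate(sequence) if tok is separator]
```

Under these assumed sizes, the combined sequence holds 4 × (196 + 1) tokens, so an autoregressive model reading it left to right meets a separator at the start of every image.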

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning