Seg-VAR: Image Segmentation with Visual Autoregressive Modeling
Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Hengshuang Zhao

TL;DR
Seg-VAR introduces an autoregressive framework for image segmentation that encodes masks as latent tokens and predicts them sequentially. It outperforms previous segmentation methods and brings autoregressive modeling to tasks that require precise spatial reasoning.
Contribution
Seg-VAR reframes segmentation as conditional autoregressive mask generation based on latent learning, combining multi-stage training with spatial-aware mask encoding.
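Assuming Seg-VAR keeps VAR's next-scale factorization (the summary does not spell this out), conditional mask generation can be written as below. Here r_1 to r_K are the seglat token maps ordered from coarsest to finest, c is the latent prior produced by the image encoder, and L is a negative log-likelihood training objective; all symbols are our own notation, not the paper's.

```latex
p(r_1, \dots, r_K \mid c) = \prod_{k=1}^{K} p\left(r_k \mid r_1, \dots, r_{k-1}, c\right),
\qquad
\mathcal{L} = -\sum_{k=1}^{K} \log p\left(r_k \mid r_1, \dots, r_{k-1}, c\right)
```

Under this reading, inference would generate the token maps scale by scale and pass the result to the decoder to recover the segmentation masks.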
Findings
Outperforms previous discriminative and generative segmentation methods
Effective in various segmentation tasks and benchmarks
Introduces a hierarchical autoregressive approach for spatial-aware vision
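The hierarchical, coarse-to-fine ordering can be illustrated with a minimal sketch of VAR-style multi-scale residual tokenization, assuming Seg-VAR tokenizes seglats in the same spirit as VAR tokenizes images. The scalar codebook, average pooling, nearest-neighbour upsampling, and the names multiscale_tokenize, downsample, upsample and CODEBOOK are hypothetical simplifications; VAR itself uses a learned multi-scale VQ-VAE, and this is not Seg-VAR's tokenizer.

```python
from typing import List, Tuple

Grid = List[List[float]]

# Hypothetical scalar codebook; a real tokenizer learns vector codebook entries.
CODEBOOK: List[float] = [-1.0, -0.5, 0.0, 0.5, 1.0]


def downsample(grid: Grid, s: int) -> Grid:
    """Average-pool a square h x h grid to s x s (h must be divisible by s)."""
    k = len(grid) // s
    return [
        [sum(grid[i * k + a][j * k + b] for a in range(k) for b in range(k)) / (k * k)
         for j in range(s)]
        for i in range(s)
    ]


def upsample(grid: Grid, h: int) -> Grid:
    """Nearest-neighbour upsample an s x s grid to h x h."""
    k = h // len(grid)
    return [[grid[i // k][j // k] for j in range(h)] for i in range(h)]


def nearest_code(x: float, codebook: List[float]) -> int:
    """Index of the codebook value closest to x (first index wins on ties)."""
    return min(range(len(codebook)), key=lambda c: abs(codebook[c] - x))


def multiscale_tokenize(
    feat: Grid, scales: List[int], codebook: List[float]
) -> Tuple[List[List[List[int]]], Grid]:
    """Quantize feat into token maps from coarse to fine.

    Each scale quantizes the residual left unexplained by all coarser scales,
    so token map k only carries detail that maps 1..k-1 could not express.
    """
    h = len(feat)
    residual = [row[:] for row in feat]
    recon = [[0.0] * h for _ in range(h)]
    token_maps = []
    for s in scales:
        coarse = downsample(residual, s)
        tokens = [[nearest_code(v, codebook) for v in row] for row in coarse]
        token_maps.append(tokens)
        # Map tokens back to values at full resolution and update the residual.
        quant = upsample([[codebook[t] for t in row] for row in tokens], h)
        for i in range(h):
            for j in range(h):
                recon[i][j] += quant[i][j]
                residual[i][j] -= quant[i][j]
    return token_maps, recon
```

For example, multiscale_tokenize(feat, [1, 2, 4], CODEBOOK) on a 4 x 4 grid returns token maps of size 1 x 1, 2 x 2 and 4 x 4. An autoregressive model would then predict these maps in that order, each conditioned on all coarser ones, which is the hierarchical structure the finding refers to.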
Abstract
While visual autoregressive modeling (VAR) strategies have shed light on image generation with autoregressive models, their potential for segmentation, a task that requires precise low-level spatial perception, remains unexplored. Inspired by the multi-scale modeling of classic Mask2Former-based models, we propose Seg-VAR, a novel framework that rethinks segmentation as a conditional autoregressive mask generation problem. This is achieved by replacing discriminative learning with a latent learning process. Specifically, our method incorporates three core components: (1) an image encoder generating latent priors from input images, (2) a spatial-aware seglat (a latent expression of segmentation masks) encoder that maps segmentation masks into discrete latent tokens using a location-sensitive color mapping to distinguish instances, and (3) a decoder reconstructing masks from…
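The location-sensitive color mapping in component (2) can be sketched as follows, assuming a simple scheme in which each instance's color is derived from its mask centroid: normalized column to red, normalized row to green, and blue fixed at 255 so every instance differs from the black background. The scheme and the names instance_centroids, location_color and encode_seglat_image are hypothetical; the abstract does not specify Seg-VAR's actual mapping. What the sketch illustrates is that colors depend on where an instance lies rather than on its arbitrary ID, so relabeling instances yields the same seglat image.

```python
from typing import Dict, List, Tuple

Color = Tuple[int, int, int]
BACKGROUND: Color = (0, 0, 0)


def instance_centroids(inst_map: List[List[int]]) -> Dict[int, Tuple[float, float]]:
    """Mean (row, col) position of every non-zero instance id (0 = background)."""
    sums: Dict[int, List[float]] = {}
    for y, row in enumerate(inst_map):
        for x, k in enumerate(row):
            if k == 0:
                continue
            acc = sums.setdefault(k, [0.0, 0.0, 0.0])
            acc[0] += y
            acc[1] += x
            acc[2] += 1
    return {k: (sy / n, sx / n) for k, (sy, sx, n) in sums.items()}


def location_color(cy: float, cx: float, h: int, w: int) -> Color:
    """Hypothetical mapping: column -> red, row -> green, blue fixed at 255."""
    ny = cy / max(h - 1, 1)
    nx = cx / max(w - 1, 1)
    return (round(255 * nx), round(255 * ny), 255)


def encode_seglat_image(inst_map: List[List[int]]) -> List[List[Color]]:
    """Render an instance-id map as an RGB image whose colors encode position."""
    h, w = len(inst_map), len(inst_map[0])
    colors = {
        k: location_color(cy, cx, h, w)
        for k, (cy, cx) in instance_centroids(inst_map).items()
    }
    # Pixels with id 0 are absent from `colors` and fall back to black.
    return [[colors.get(k, BACKGROUND) for k in row] for row in inst_map]
```

In this toy scheme two instances with identical centroids would receive the same color, so a practical mapping would need an additional term to break such ties.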
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Advanced Neural Network Applications
