SCALAR: Scale-wise Controllable Visual Autoregressive Learning

Ryan Xu, Dongyang Jin, Yancheng Bai, Rui Lan, Xu Duan, Lei Sun, and Xiangxiang Chu

arXiv:2507.19946 · cs.CV · November 18, 2025

TL;DR

SCALAR introduces a novel scale-wise conditional decoding method for visual autoregressive models, enabling fine-grained, efficient, and high-quality controllable image synthesis with multi-modal guidance.

Contribution

The paper proposes a scale-wise control mechanism and a unified model for multi-modal guidance in VAR-based image generation, improving control and fidelity.

Findings

01. Achieves superior control precision in image synthesis

02. Demonstrates improved generation quality over existing methods

03. Supports flexible multi-conditional guidance

Abstract

Controllable image synthesis, which enables fine-grained control over generated outputs, has emerged as a key focus in visual generative modeling. However, controllable generation remains challenging for Visual Autoregressive (VAR) models due to their hierarchical, next-scale prediction style. Existing VAR-based methods often suffer from inefficient control encoding and disruptive injection mechanisms that compromise both fidelity and efficiency. In this work, we present SCALAR, a controllable generation method based on VAR, incorporating a novel Scale-wise Conditional Decoding mechanism. SCALAR leverages a pretrained image encoder to extract semantic control signal encodings, which are projected into scale-specific representations and injected into the corresponding layers of the VAR backbone. This design provides persistent and structurally aligned guidance throughout the generation…
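
The injection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the control encodings are resized, projected, or merged, so the adaptive average pooling, the per-scale linear map, and the additive merge below are all assumptions, and every name (`adaptive_avg_pool`, `linear`, `inject_control`, `weights`) is hypothetical. The control features stand in for the output of a pretrained image encoder, and each scale of the VAR token pyramid receives its own projection of them.

```python
import math


def adaptive_avg_pool(feat, k):
    """Pool an H x H grid of C-dim vectors to k x k by averaging bins (assumed resize step)."""
    h = len(feat)
    c = len(feat[0][0])
    out = []
    for i in range(k):
        r0, r1 = (i * h) // k, math.ceil((i + 1) * h / k)
        row = []
        for j in range(k):
            c0, c1 = (j * h) // k, math.ceil((j + 1) * h / k)
            cells = [feat[r][q] for r in range(r0, r1) for q in range(c0, c1)]
            row.append([sum(v[d] for v in cells) / len(cells) for d in range(c)])
        out.append(row)
    return out


def linear(vec, weight):
    """Multiply a length-C vector by a C x D weight matrix."""
    return [
        sum(vec[i] * weight[i][j] for i in range(len(vec)))
        for j in range(len(weight[0]))
    ]


def inject_control(tokens_per_scale, control_feat, weights):
    """Add a scale-specific projection of the control features to each scale's tokens.

    Additive injection is an assumption made for this sketch.
    """
    guided = []
    for tokens, w in zip(tokens_per_scale, weights):
        k = len(tokens)  # side length of this scale's token grid
        pooled = adaptive_avg_pool(control_feat, k)
        guided.append([
            [
                [t + p for t, p in zip(tokens[i][j], linear(pooled[i][j], w))]
                for j in range(k)
            ]
            for i in range(k)
        ])
    return guided


# Toy setup: 3-dim control features on a 4 x 4 grid, 2-dim tokens, three scales.
C, D = 3, 2
scales = [1, 2, 4]
control_feat = [[[1.0] * C for _ in range(4)] for _ in range(4)]
weights = [[[1.0] * D for _ in range(C)] for _ in scales]  # one C x D matrix per scale
tokens_per_scale = [
    [[[0.0] * D for _ in range(k)] for _ in range(k)] for k in scales
]

guided = inject_control(tokens_per_scale, control_feat, weights)
print([len(g) for g in guided])  # grid side length per scale
```

Because every scale receives its own resized and projected copy of the same control features, the guidance stays spatially aligned with the token grid at each resolution, which is the property the abstract calls persistent and structurally aligned guidance.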

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning