DreamVAR: Taming Reinforced Visual Autoregressive Model for High-Fidelity Subject-Driven Image Generation

Xin Jiang; Jingwen Chen; Yehao Li; Yingwei Pan; Kezhou Chen; Zechao Li; Ting Yao; Tao Mei

arXiv:2601.22507 · cs.CV · February 2, 2026


Open Access

TL;DR

DreamVAR introduces a visual autoregressive (VAR) framework for subject-driven image generation. It uses multi-scale subject features and reinforcement learning to improve image quality and subject consistency relative to existing diffusion-based methods.

Contribution

The paper presents DreamVAR, a new VAR-based framework that simplifies autoregressive dependencies by pre-filling subject features and uses reinforcement learning to enhance subject fidelity.

Findings

01. Outperforms diffusion models in appearance preservation.

02. Simplifies autoregressive dependencies by pre-filling subject features.

03. Uses reinforcement learning to improve semantic alignment.

Abstract

Recent advances in subject-driven image generation using diffusion models have attracted considerable attention for their remarkable capabilities in producing high-quality images. Nevertheless, the potential of Visual Autoregressive (VAR) models, despite their unified architecture and efficient inference, remains underexplored. In this work, we present DreamVAR, a novel framework for subject-driven image synthesis built upon a VAR model that employs next-scale prediction. Technically, multi-scale features of the reference subject are first extracted by a visual tokenizer. Instead of interleaving these conditional features with target image tokens across scales, our DreamVAR pre-fills the full subject feature sequence prior to predicting target image tokens. This design simplifies autoregressive dependencies and mitigates the train-test discrepancy in multi-scale conditioning scenario…
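
The abstract contrasts two ways of placing the subject's multi-scale tokens in the sequence. One option interleaves them with the target scales. The other pre-fills the whole subject sequence before any target token.

The Python sketch below is an illustration, not DreamVAR's code. It builds both orderings and reports which subject scales each target scale can condition on. The following are all assumptions:

- the function names;
- the scale sizes (token maps of side 1, 2 and 4);
- the attention rule.

The mask follows the generic block-wise causal scheme of next-scale prediction, in which each scale attends to itself and to everything earlier. The excerpt does not say how DreamVAR masks the subject prefix.

```python
from typing import List, Tuple

# A block is (stream, scale index): "S" = reference-subject tokens, "T" = target-image tokens.
Block = Tuple[str, int]


def interleaved_layout(num_scales: int) -> List[Block]:
    """S_1, T_1, S_2, T_2, ...: subject features interleaved with target scales."""
    layout: List[Block] = []
    for k in range(num_scales):
        layout += [("S", k), ("T", k)]
    return layout


def prefilled_layout(num_scales: int) -> List[Block]:
    """S_1, ..., S_K, T_1, ..., T_K: the full subject sequence comes first."""
    return [("S", k) for k in range(num_scales)] + [("T", k) for k in range(num_scales)]


def visible_subject_scales(layout: List[Block], target_scale: int) -> List[int]:
    """Subject scales placed before target block T_k, i.e. the conditioning it can see."""
    pos = layout.index(("T", target_scale))
    return [k for stream, k in layout[:pos] if stream == "S"]


def block_causal_mask(layout: List[Block], side_lengths: List[int]) -> List[List[bool]]:
    """Token-level mask: mask[i][j] is True if token i may attend to token j.

    Assumed VAR-style rule: a token sees all tokens of its own block (one scale is
    predicted in parallel) and of every earlier block in the sequence.
    """
    # Hypothetical sizes: scale k is a square token map of side side_lengths[k].
    block_ids = [b for b, (_, k) in enumerate(layout) for _ in range(side_lengths[k] ** 2)]
    return [[bj <= bi for bj in block_ids] for bi in block_ids]


if __name__ == "__main__":
    num_scales = 3  # hypothetical: token maps of side 1, 2, 4
    for name, layout in [("interleaved", interleaved_layout(num_scales)),
                         ("pre-filled", prefilled_layout(num_scales))]:
        print(name, [visible_subject_scales(layout, k) for k in range(num_scales)])
    # interleaved [[0], [0, 1], [0, 1, 2]]
    # pre-filled [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
    mask = block_causal_mask(prefilled_layout(num_scales), [1, 2, 4])
    print(len(mask))  # 2 * (1 + 4 + 16) = 42 tokens
```

Under interleaving, the coarsest target scale can condition only on the coarsest subject scale, and the visible conditioning grows scale by scale. Under pre-filling, every target scale sees the same complete subject prefix. This is one plausible reading of how pre-filling simplifies the autoregressive dependencies; the paper's exact formulation may differ.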

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Image Enhancement Techniques