StyleVAR: Controllable Image Style Transfer via Visual Autoregressive Modeling

Liqi Jing; Dingming Zhang; Peinian Li; Lichen Zhu; Yang Xu; Hanyu Xing

arXiv:2604.21052·cs.CV·May 13, 2026

TL;DR

StyleVAR is an autoregressive framework for image style transfer that models style and content as discrete tokens in a learned latent space. It outperforms an AdaIN baseline on Style Loss, Content Loss, LPIPS, SSIM, DreamSim, and CLIP metrics.

Contribution

The paper formulates style transfer as conditional autoregressive modeling on top of the VAR framework, injects style and content through a blended cross-attention mechanism for controllable transfer, and applies reinforcement fine-tuning with GRPO to improve perceptual quality.

Findings

01 Outperforms the AdaIN baseline on Style Loss, Content Loss, LPIPS, SSIM, DreamSim, and CLIP metrics.

02 Reinforcement fine-tuning with GRPO improves perceptual alignment.

03 Transfers textures effectively while preserving semantic structure, especially for landscapes and architecture.
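
Finding 02 refers to GRPO (Group Relative Policy Optimization). Its core step, scoring each sample relative to the other samples drawn for the same input, can be sketched as below. This is a minimal illustration and not the paper's training code: the reward values are made up, the function name `group_relative_advantages` is hypothetical, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same input.

    Samples that score above the group mean get a positive advantage,
    samples below it get a negative one. eps avoids division by zero
    when every sample in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical perceptual rewards for four stylized outputs of one input pair.
rewards = [0.62, 0.71, 0.55, 0.80]
advantages = group_relative_advantages(rewards)
best = max(range(len(rewards)), key=lambda i: advantages[i])
```

Because the advantages are centered within the group, no separate value network is needed to provide a baseline; the group itself plays that role.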

Abstract

We build on the Visual Autoregressive Modeling (VAR) framework and formulate style transfer as conditional discrete sequence modeling in a learned latent space. Images are decomposed into multi-scale representations and tokenized into discrete codes by a VQ-VAE; a transformer then autoregressively models the distribution of target tokens conditioned on style and content tokens. To inject style and content information, we introduce a blended cross-attention mechanism in which the evolving target representation attends to its own history, while style and content features act as queries that decide which aspects of this history to emphasize. A scale-dependent blending coefficient controls the relative influence of style and content at each stage, encouraging the synthesized representation to align with both the content structure and the style texture without breaking the autoregressive…
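
The blended cross-attention described in the abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it omits learned projections and multi-head structure, assumes style and content provide the same number of query tokens, and uses a hypothetical linear `alpha_schedule` in place of the paper's scale-dependent coefficient, whose actual form is not given here.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(queries, history):
    """Scaled dot-product attention where the target history supplies
    both keys and values, and the queries select what to emphasize."""
    d = history.shape[-1]
    weights = softmax(queries @ history.T / np.sqrt(d))  # (m, n)
    return weights @ history                             # (m, d)

def blended_cross_attention(history, style, content, alpha):
    """Mix the style-queried and content-queried readouts of the history.

    alpha in [0, 1] is the weight on the style branch (assumed convention).
    """
    return alpha * attend(style, history) + (1.0 - alpha) * attend(content, history)

def alpha_schedule(scale_idx, num_scales):
    """Hypothetical schedule: content dominates coarse scales,
    style dominates fine scales."""
    return scale_idx / max(num_scales - 1, 1)

rng = np.random.default_rng(0)
d, m, n = 8, 4, 6                      # feature dim, query tokens, history tokens
history = rng.normal(size=(n, d))      # target tokens generated so far
style = rng.normal(size=(m, d))        # style features used as queries
content = rng.normal(size=(m, d))      # content features used as queries

alpha = alpha_schedule(scale_idx=2, num_scales=5)
out = blended_cross_attention(history, style, content, alpha)
```

Since the keys and values come only from the target's own history, conditioning on style and content does not expose future target tokens, which is consistent with the abstract's point about preserving the autoregressive structure.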

Code & Models

Model (Hugging Face): Senfier-LiqiJing/StyleVAR
