Semantic Context Matters: Improving Conditioning for Autoregressive Models
Dongyang Jin, Ryan Xu, Jianhao Zeng, Rui Lan, Yancheng Bai, Lei Sun, Xiangxiang Chu

TL;DR
SCAR enhances autoregressive image models by incorporating semantic context through a compact prefix and alignment guidance, significantly improving instruction adherence and visual quality in image editing tasks.
Contribution
Introduces SCAR, a semantic-context-driven conditioning method for autoregressive image editing models that addresses the weak semantics and high computational cost of earlier conditioning approaches.
Findings
Outperforms prior AR-based methods in visual fidelity and semantic alignment
Achieves superior results on instruction editing and controllable generation benchmarks
Maintains flexibility across different autoregressive paradigms
Abstract
Recently, autoregressive (AR) models have shown strong potential in image generation, offering better scalability and easier integration with unified multi-modal systems compared to diffusion-based methods. However, extending AR models to general image editing remains challenging due to weak and inefficient conditioning, often leading to poor instruction adherence and visual artifacts. To address this, we propose SCAR, a Semantic-Context-driven method for Autoregressive models. SCAR introduces two key components: Compressed Semantic Prefilling, which encodes high-level semantics into a compact and efficient prefix, and Semantic Alignment Guidance, which aligns the last visual hidden states with target semantics during autoregressive decoding to enhance instruction fidelity. Unlike decoding-stage injection methods, SCAR builds upon the flexibility and generality of vector-quantized-based…
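The abstract names two ideas: compressing high-level semantics into a short prefix, and aligning visual hidden states with target semantics. The sketch below illustrates both in plain Python under loud assumptions: average pooling stands in for the compressor and a mean cosine distance stands in for the alignment objective. Neither choice is confirmed by the source, and the names `compress_prefix` and `alignment_loss` are hypothetical, not SCAR's API.

```python
import math
from typing import List

Vector = List[float]


def compress_prefix(features: List[Vector], prefix_len: int) -> List[Vector]:
    """Average-pool a feature sequence into `prefix_len` vectors.

    Assumption: pooling is only a stand-in for whatever compressor
    the paper actually uses.
    """
    n = len(features)
    if not 0 < prefix_len <= n:
        raise ValueError("prefix_len must be between 1 and len(features)")
    prefix = []
    for i in range(prefix_len):
        # Split the sequence into prefix_len contiguous, non-empty chunks.
        chunk = features[i * n // prefix_len:(i + 1) * n // prefix_len]
        dim = len(chunk[0])
        prefix.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
    return prefix


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def alignment_loss(hidden: List[Vector], target: List[Vector]) -> float:
    """Mean (1 - cosine similarity) between paired hidden and target vectors.

    Assumption: a cosine objective is one common way to align features;
    the source does not state the loss SCAR uses.
    """
    if len(hidden) != len(target) or not hidden:
        raise ValueError("hidden and target must be non-empty and equal length")
    return sum(1.0 - cosine(h, t) for h, t in zip(hidden, target)) / len(hidden)
```

In this toy setup, a four-vector feature sequence pooled with `prefix_len=2` yields a two-vector prefix, and `alignment_loss` falls toward zero as the hidden states point in the same direction as the target semantics.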
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
