Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment
Huayu Chen, Hang Su, Peize Sun, Jun Zhu

TL;DR
This paper introduces Condition Contrastive Alignment (CCA), a fine-tuning method for autoregressive (AR) visual generation. Rather than altering the sampling process as guidance techniques do, CCA fine-tunes a pretrained model to fit the same target distribution, so high-quality samples can be drawn without guidance and at lower sampling cost.
Contribution
CCA enables guidance-free AR visual generation by drawing on language model alignment methods, which brings visual generation in line with how language models are aligned. It significantly reduces sampling cost and needs only minimal fine-tuning.
Findings
CCA enhances guidance-free performance with just one epoch of fine-tuning.
CCA achieves results comparable to guided sampling methods.
Adjustable training parameters allow trade-offs between diversity and fidelity.
Abstract
Classifier-Free Guidance (CFG) is a critical technique for enhancing the sample quality of visual generative models. However, in autoregressive (AR) multi-modal generation, CFG introduces design inconsistencies between language and visual content, contradicting the design philosophy of unifying different modalities for visual AR. Motivated by language model alignment methods, we propose Condition Contrastive Alignment (CCA) to facilitate guidance-free AR visual generation with high performance and analyze its theoretical connection with guided sampling methods. Unlike guidance methods that alter the sampling process to achieve the ideal sampling distribution, CCA directly fine-tunes pretrained models to fit the same distribution target. Experimental results show that CCA can significantly enhance the guidance-free performance of all tested models with just one epoch of…
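To make concrete what "altering the sampling process" means, the following is a minimal sketch of how CFG is commonly applied to next-token logits in AR sampling. It is not taken from the paper: the function names, the toy logits, and the particular convention (a scale of 0 recovers plain conditional sampling) are assumptions chosen for illustration. The point it shows is that guided sampling needs both a conditional and an unconditional forward pass per token, which is the extra cost a guidance-free model avoids.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cfg_logits(cond_logits, uncond_logits, scale):
    """Combine conditional and unconditional logits (hypothetical helper).

    Assumed convention: guided = cond + scale * (cond - uncond),
    so scale = 0 gives ordinary conditional sampling.
    """
    return [c + scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]

# Toy next-token logits over a 3-token vocabulary (made-up values).
# In practice each list would come from a separate forward pass of the model.
cond = [2.0, 1.0, 0.0]     # pass with the condition (e.g. a class label)
uncond = [1.0, 1.0, 1.0]   # pass with the condition dropped

plain = softmax(cfg_logits(cond, uncond, 0.0))   # no guidance
guided = softmax(cfg_logits(cond, uncond, 2.0))  # guidance scale 2

print("plain :", [round(p, 3) for p in plain])
print("guided:", [round(p, 3) for p in guided])
```

With guidance, probability mass shifts toward tokens that the condition makes more likely. CCA, as described in the abstract, aims to reach the same target distribution by fine-tuning, so that a single conditional pass per token suffices.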
Taxonomy
Topics: 3D Surveying and Cultural Heritage · 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization
