Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment

Huayu Chen; Hang Su; Peize Sun; Jun Zhu

arXiv:2410.09347 · cs.CV · October 15, 2024

TL;DR

This paper introduces Condition Contrastive Alignment (CCA), a guidance-free method for autoregressive visual generation that fine-tunes pretrained models to match target distributions, reducing reliance on guidance techniques and improving efficiency.

Contribution

CCA provides a guidance-free approach for AR visual generation, unifying language and visual alignment methods, and significantly reduces sampling costs with minimal fine-tuning.

Findings

01. CCA enhances guidance-free performance with just one epoch of fine-tuning.

02. CCA achieves results comparable to guided sampling methods.

03. Adjustable training parameters allow trade-offs between diversity and fidelity.

Abstract

Classifier-Free Guidance (CFG) is a critical technique for enhancing the sample quality of visual generative models. However, in autoregressive (AR) multi-modal generation, CFG introduces design inconsistencies between language and visual content, contradicting the design philosophy of unifying different modalities for visual AR. Motivated by language model alignment methods, we propose Condition Contrastive Alignment (CCA) to facilitate guidance-free AR visual generation with high performance and analyze its theoretical connection with guided sampling methods. Unlike guidance methods that alter the sampling process to achieve the ideal sampling distribution, CCA directly fine-tunes pretrained models to fit the same distribution target. Experimental results show that CCA can significantly enhance the guidance-free performance of all tested models with just one epoch of…
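To make concrete what it means for guidance to "alter the sampling process", the following is a minimal sketch of how CFG is commonly applied to next-token logits in an AR model. It is an illustration only, not code from the paper: the function names `softmax` and `cfg_logits`, the toy logit values, and the scale convention (`scale = 1.0` recovers plain conditional sampling) are all assumptions, and other write-ups parameterize the scale differently. The sketch shows that guided sampling combines a conditional and an unconditional prediction at every step, which requires two forward passes per token; this is the extra cost a guidance-free model avoids.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cfg_logits(cond_logits, uncond_logits, scale):
    """Combine conditional and unconditional logits (hypothetical helper).

    Assumed convention: scale = 1.0 returns the conditional logits unchanged,
    scale > 1.0 pushes the distribution further toward the condition.
    """
    return [u + scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]

# Toy next-token logits over a 3-token vocabulary (made-up values).
cond = [2.0, 1.0, 0.0]    # model output given the class or text condition
uncond = [1.0, 1.0, 1.0]  # model output with the condition dropped

plain = softmax(cfg_logits(cond, uncond, 1.0))   # ordinary conditional sampling
guided = softmax(cfg_logits(cond, uncond, 3.0))  # guided sampling

print("conditional:", [round(p, 3) for p in plain])
print("guided:     ", [round(p, 3) for p in guided])
```

In this toy case the guided distribution puts more mass on the token the condition already favors, trading diversity for fidelity. CCA, as the abstract describes it, aims to reach the same target distribution by fine-tuning, so that a single conditional forward pass per token suffices at sampling time.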

Taxonomy

Topics: 3D Surveying and Cultural Heritage · 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization