Generalizing Alignment Paradigm of Text-to-Image Generation with Preferences through $f$-divergence Minimization
Haoyuan Sun, Bo Xia, Yongzhe Chang, Xueqian Wang

TL;DR
This paper extends the alignment paradigm of text-to-image models from the reverse Kullback-Leibler divergence to the broader class of $f$-divergences, and analyzes how the choice of divergence constraint affects alignment quality and generation diversity.
Contribution
It generalizes the alignment framework to include $f$-divergences, providing theoretical analysis and empirical evaluation of their effects on model alignment and diversity.
Findings
Jensen-Shannon divergence offers the best balance between alignment and diversity.
Different divergence choices significantly affect the trade-off between alignment quality and generation diversity.
The generalized formula enables more flexible and effective alignment strategies.
Abstract
Direct Preference Optimization (DPO) has recently expanded its successful application from aligning large language models (LLMs) to aligning text-to-image models with human preferences, which has generated considerable interest within the community. However, we have observed that these approaches rely solely on minimizing the reverse Kullback-Leibler divergence between the fine-tuned model and the reference model during the alignment process, neglecting the incorporation of other divergence constraints. In this study, we focus on extending reverse Kullback-Leibler divergence in the alignment paradigm of text-to-image models to $f$-divergence, which aims to garner better alignment performance as well as good generation diversity. We provide the generalized formula of the alignment paradigm under the $f$-divergence condition and thoroughly analyze the impact of different divergence…
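To make the notion of an $f$-divergence concrete, here is a minimal sketch that evaluates a few members of the family on small discrete distributions. It is not the paper's training objective. It uses only the textbook definition $D_f(P \,\|\, Q) = \sum_x q(x)\, f\big(p(x)/q(x)\big)$ with $f$ convex and $f(1) = 0$. The function names, the toy distributions, and the reading of $P$ as the fine-tuned model and $Q$ as the reference model are assumptions made for illustration.

```python
import math

def f_divergence(p, q, f):
    """D_f(P || Q) = sum_x q(x) * f(p(x) / q(x)); assumes q(x) > 0 everywhere."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def f_reverse_kl(u):
    # Generator of KL(P || Q). With P as the fine-tuned model (an assumption
    # here), this is what the alignment literature calls reverse KL.
    return u * math.log(u) if u > 0 else 0.0

def f_forward_kl(u):
    # Generator of KL(Q || P); requires u > 0, i.e. p(x) > 0.
    return -math.log(u)

def f_jensen_shannon(u):
    # Generator of the standard Jensen-Shannon divergence (with the 1/2 factor).
    # Some papers omit the 1/2, which only rescales the divergence.
    u_log_u = u * math.log(u) if u > 0 else 0.0
    return 0.5 * (u_log_u - (u + 1.0) * math.log((u + 1.0) / 2.0))

# Hypothetical toy distributions over two outcomes.
p_finetuned = [0.5, 0.5]
q_reference = [0.9, 0.1]

for name, f in [("reverse KL", f_reverse_kl),
                ("forward KL", f_forward_kl),
                ("Jensen-Shannon", f_jensen_shannon)]:
    print(name, f_divergence(p_finetuned, q_reference, f))
```

Swapping the generator `f` is the only change needed to move between divergences. The Jensen-Shannon generator yields a symmetric divergence bounded by $\log 2$, whereas the two KL variants are asymmetric and unbounded.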
Taxonomy
Topics: Image Retrieval and Classification Techniques · Digital Media and Visual Art
