ImageReFL: Balancing Quality and Diversity in Human-Aligned Diffusion Models
Dmitrii Sorokin, Maksim Nakhodnov, Andrey Kuznetsov, Aibek Alanov

TL;DR
This paper introduces combined generation, a sampling strategy, and ImageReFL, a fine-tuning method, to address the trade-off between human-preference alignment and output diversity in diffusion models.
Contribution
It presents two techniques for diffusion models aligned with human preferences: combined generation, which helps retain global structure and diversity, and ImageReFL, which improves diversity with minimal loss in quality.
Findings
Outperforms conventional reward tuning on quality and diversity metrics
User study confirms better balance of human preference and visual diversity
Mitigates early-stage overfitting to preserve global structure
Abstract
Recent advances in diffusion models have led to impressive image generation capabilities, but aligning these models with human preferences remains challenging. Reward-based fine-tuning using models trained on human feedback improves alignment but often harms diversity, producing less varied outputs. In this work, we address this trade-off with two contributions. First, we introduce combined generation, a novel sampling strategy that applies a reward-tuned diffusion model only in the later stages of the generation process, while preserving the base model for earlier steps. This approach mitigates early-stage overfitting and helps retain global structure and diversity. Second, we propose ImageReFL, a fine-tuning method that improves image diversity with minimal loss in quality by training on real images and incorporating multiple regularizers, including diffusion and…
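The combined generation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `combined_generation`, the `switch_fraction` parameter, and the toy step functions are all hypothetical, and the abstract does not say where the switch between models happens. The sketch only shows the control flow, in which the base model handles the early (high-noise) steps and the reward-tuned model handles the later ones.

```python
from typing import Callable, List, Sequence, Tuple

# A "step" maps (current sample, timestep) to the next, less noisy sample.
Step = Callable[[List[float], int], List[float]]


def combined_generation(
    x_T: Sequence[float],
    timesteps: Sequence[int],
    base_step: Step,
    tuned_step: Step,
    switch_fraction: float = 0.5,  # hypothetical knob; the source gives no value
) -> Tuple[List[float], List[str]]:
    """Denoise with base_step for the early steps, then with tuned_step."""
    if not 0.0 <= switch_fraction <= 1.0:
        raise ValueError("switch_fraction must lie in [0, 1]")
    # Number of leading (high-noise) steps handled by the base model.
    n_base = int(len(timesteps) * switch_fraction)
    x = list(x_T)
    schedule: List[str] = []
    for i, t in enumerate(timesteps):
        use_base = i < n_base
        step = base_step if use_base else tuned_step
        schedule.append("base" if use_base else "tuned")
        x = step(x, t)
    return x, schedule


# Toy stand-ins for real denoisers (assumptions, for illustration only).
def toy_base_step(x: List[float], t: int) -> List[float]:
    return [0.5 * v for v in x]


def toy_tuned_step(x: List[float], t: int) -> List[float]:
    return [0.5 * v + 1.0 for v in x]


# Timesteps run from most noisy (3) to least noisy (0).
sample, schedule = combined_generation(
    [8.0], [3, 2, 1, 0], toy_base_step, toy_tuned_step, switch_fraction=0.5
)
print(schedule, sample)
```

Setting `switch_fraction` to 0.0 in this sketch uses the tuned model for every step, and 1.0 uses only the base model, which makes the two extremes of the trade-off easy to compare.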
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Visual Attention and Saliency Detection
