DyMO: Training-Free Diffusion Model Alignment with Dynamic Multi-Objective Scheduling
Xin Xie, Dong Gong

TL;DR
DyMO is a training-free, plug-and-play method that improves text-to-image diffusion model alignment with human preferences by dynamically scheduling multiple objectives during inference.
Contribution
It introduces a novel dynamic scheduling approach for multiple objectives and a semantic alignment objective, enhancing inference-time alignment without additional training.
Findings
Effective across diverse diffusion models
Robust in various evaluation metrics
Improves semantic and human preference alignment
Abstract
Text-to-image diffusion model alignment is critical for improving the alignment between the generated images and human preferences. While training-based methods are constrained by high computational costs and dataset requirements, training-free alignment methods remain underexplored and are often limited by inaccurate guidance. We propose DyMO, a plug-and-play, training-free method for aligning generated images with human preferences during inference. In addition to text-aware human preference scores, we introduce a semantic alignment objective that enhances semantic alignment in the early stages of diffusion, relying on the fact that attention maps effectively reflect the semantics of noisy images. We also propose dynamic scheduling of multiple objectives and of intermediate recurrent steps to reflect the requirements of different denoising steps. Experiments with diverse…
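The abstract combines two inference-time ideas: a semantic objective derived from attention maps that matters most early in denoising, and a schedule that shifts weight between objectives and varies the number of recurrent steps across the trajectory. The Python sketch below illustrates that combination under stated assumptions. It is not DyMO's implementation. The names `Schedule`, `objective_weights`, `recurrent_iterations`, `semantic_loss`, and `guidance_objective` are hypothetical. The cosine hand-off and the linear decay of recurrent iterations are assumed schedules. The attention loss is an Attend-and-Excite-style stand-in (1 minus the weakest prompt token's peak attention), swapped in for the paper's own semantic objective.

```python
import math
from dataclasses import dataclass


@dataclass
class Schedule:
    # Hypothetical hyperparameters for illustration only; not values from the paper.
    num_steps: int = 50      # total denoising steps
    max_recurrent: int = 3   # assumed cap on recurrent passes per step


def progress(step: int, sched: Schedule) -> float:
    """Map a step index (0 = noisiest) to t in [0, 1]."""
    return step / max(sched.num_steps - 1, 1)


def objective_weights(step: int, sched: Schedule) -> tuple[float, float]:
    """Assumed cosine hand-off: semantic weight goes 1 -> 0, preference weight 0 -> 1."""
    t = progress(step, sched)
    semantic = 0.5 * (1.0 + math.cos(math.pi * t))
    return semantic, 1.0 - semantic


def recurrent_iterations(step: int, sched: Schedule) -> int:
    """Assumed linear decay: more recurrent passes early, none at the final step."""
    t = progress(step, sched)
    return round(sched.max_recurrent * (1.0 - t))


def semantic_loss(attn_maps: list[list[float]]) -> float:
    """Attend-and-Excite-style stand-in: 1 minus the weakest token's peak attention.

    attn_maps[k] holds one prompt token's cross-attention values over spatial positions.
    """
    return 1.0 - min(max(m) for m in attn_maps)


def guidance_objective(step: int, sched: Schedule,
                       attn_maps: list[list[float]],
                       preference_score: float) -> float:
    """Scalar to minimise at this step: weighted semantic loss minus weighted preference."""
    w_sem, w_pref = objective_weights(step, sched)
    return w_sem * semantic_loss(attn_maps) - w_pref * preference_score


if __name__ == "__main__":
    sched = Schedule()
    maps = [[0.1, 0.6], [0.3, 0.2]]  # toy attention maps for two prompt tokens
    for step in (0, 24, 49):
        print(step, objective_weights(step, sched),
              recurrent_iterations(step, sched),
              guidance_objective(step, sched, maps, preference_score=0.8))
```

The two weights always sum to 1, so the semantic term dominates the noisy early steps and the preference score dominates the late ones. In an actual sampler, one would presumably take the gradient of such an objective with respect to the latent (for example with an autograd library) and apply it before each denoising update, repeating the update `recurrent_iterations` times. That wiring is left out here because it depends on the specific diffusion pipeline.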
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Machine Learning and Data Classification · Scheduling and Timetabling Solutions
Methods: Softmax · Attention Is All You Need · Diffusion
