ParetoSlider: Diffusion Models Post-Training for Continuous Reward Control
Shelly Golan, Michael Finkelson, Ariel Bereslavsky, Yotam Nitzan, Or Patashnik

TL;DR
ParetoSlider is a novel framework that trains a single diffusion model to approximate the entire Pareto front, enabling inference-time control over multiple conflicting generative goals without retraining.
Contribution
The paper introduces a multi-objective RL approach that conditions a diffusion model on preference weights, so the trade-off between objectives can be adjusted at inference time.
Findings
A single model matches or exceeds baselines trained with a fixed trade-off.
Provides fine-grained control over conflicting goals.
Works across multiple state-of-the-art backbones.
Abstract
Reinforcement Learning (RL) post-training has become the standard for aligning generative models with human preferences, yet most methods rely on a single scalar reward. When multiple criteria matter, the prevailing practice of "early scalarization" collapses rewards into a fixed weighted sum. This commits the model to a single trade-off point at training time, providing no inference-time control over inherently conflicting goals, such as prompt adherence versus source fidelity in image editing. We introduce ParetoSlider, a multi-objective RL (MORL) framework that trains a single diffusion model to approximate the entire Pareto front. By training the model with continuously varying preference weights as a conditioning signal, we enable users to navigate optimal trade-offs at inference time without retraining or maintaining multiple checkpoints. We evaluate ParetoSlider across three…
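The contrast the abstract draws between a fixed weighted sum and continuously varying preference weights can be illustrated with a small sketch. This is not the paper's implementation: the function names, the reward values, the flat Dirichlet sampling of weights, and the use of a weighted sum under the sampled weights are all assumptions made for illustration, since this excerpt does not describe the actual training objective.

```python
import random


def scalarize(rewards, weights):
    """Weighted sum of per-objective rewards (the "early scalarization" step)."""
    if len(rewards) != len(weights):
        raise ValueError("rewards and weights must have the same length")
    return sum(w * r for w, r in zip(weights, rewards))


def sample_preference(num_objectives, rng):
    """Draw a weight vector uniformly from the probability simplex.

    Normalizing i.i.d. Exponential(1) draws yields a flat Dirichlet sample.
    The choice of distribution is an assumption, not taken from the paper.
    """
    draws = [rng.expovariate(1.0) for _ in range(num_objectives)]
    total = sum(draws)
    return [d / total for d in draws]


def pareto_front(points):
    """Return the points not dominated by any other point (higher is better)."""
    def dominates(a, b):
        no_worse = all(x >= y for x, y in zip(a, b))
        strictly_better = any(x > y for x, y in zip(a, b))
        return no_worse and strictly_better

    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical rewards for one edited image: (prompt adherence, source fidelity).
rewards = [0.9, 0.2]

# Early scalarization: a single weight vector fixed before training.
fixed_reward = scalarize(rewards, [0.5, 0.5])

# Preference conditioning (sketch): a fresh weight vector per training sample,
# which would also be fed to the model as a conditioning input.
rng = random.Random(0)
weights = sample_preference(2, rng)
conditioned_reward = scalarize(rewards, weights)

# Hypothetical per-model scores; the last one is dominated by (0.5, 0.5).
candidates = [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9), (0.4, 0.4)]
front = pareto_front(candidates)
```

With a fixed weight vector the model only ever sees one trade-off; sampling the weights exposes it to the whole range, which is what would let a user slide between points on the front at inference time.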
