ParetoSlider: Diffusion Models Post-Training for Continuous Reward Control

Shelly Golan; Michael Finkelson; Ariel Bereslavsky; Yotam Nitzan; Or Patashnik

arXiv:2604.20816 · cs.LG · April 23, 2026

TL;DR

ParetoSlider is a novel framework that trains a single diffusion model to approximate the entire Pareto front, enabling inference-time control over multiple conflicting generative goals without retraining.
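For readers unfamiliar with the term, the Pareto front is the set of outcomes that no other outcome beats on every objective at once. The sketch below illustrates only that textbook definition; it is not code from the paper. The `dominates` and `pareto_front` helpers and the score pairs are hypothetical, and higher is assumed to be better on each objective.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the points that no other point dominates (higher is better)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (prompt adherence, source fidelity) scores for five outputs.
scores = [(0.9, 0.2), (0.7, 0.6), (0.4, 0.8), (0.5, 0.5), (0.3, 0.3)]
front = pareto_front(scores)
print(front)
```

The last two score pairs are dropped because (0.7, 0.6) is better than both on each objective; the remaining three are distinct trade-offs, none strictly preferable to another.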

Contribution

The paper introduces a multi-objective RL approach that conditions a diffusion model on preference weights, so the trade-off between objectives can be adjusted at inference time.
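One way to picture preference-weight conditioning is to contrast a single fixed weighting with weights redrawn for every training sample. The sketch below is a rough illustration under stated assumptions, not the paper's training procedure: the listing does not say how weights are sampled or how rewards are combined, so the uniform simplex sampling, the linear weighted sum, the helper names `sample_preference` and `scalarize`, and the reward values are all hypothetical.

```python
import random
from typing import List


def sample_preference(num_objectives: int, rng: random.Random) -> List[float]:
    """Draw non-negative weights that sum to 1 (uniform on the simplex)."""
    # Normalized exponential draws are Dirichlet(1, ..., 1) distributed.
    draws = [rng.expovariate(1.0) for _ in range(num_objectives)]
    total = sum(draws)
    return [d / total for d in draws]


def scalarize(rewards: List[float], weights: List[float]) -> float:
    """Weighted sum of per-objective rewards (assumed form, for illustration)."""
    return sum(w * r for w, r in zip(weights, rewards))


rng = random.Random(0)
rewards = [0.8, 0.3]  # hypothetical (prompt adherence, source fidelity)

# Fixed weighting: one trade-off chosen once, before training.
fixed = scalarize(rewards, [0.5, 0.5])

# Varying weighting: a fresh trade-off per training sample. In a
# preference-conditioned setup the model would also receive w as an input.
w = sample_preference(2, rng)
conditioned = scalarize(rewards, w)
```

At inference time, a model trained this way would be given a user-chosen weight vector in place of a sampled one, which is what makes a slider-style control possible.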

Findings

01 Single model matches or exceeds fixed-trade-off baselines.

02 Provides fine-grained control over conflicting goals.

03 Works across multiple state-of-the-art backbones.

Abstract

Reinforcement Learning (RL) post-training has become the standard for aligning generative models with human preferences, yet most methods rely on a single scalar reward. When multiple criteria matter, the prevailing practice of "early scalarization" collapses rewards into a fixed weighted sum. This commits the model to a single trade-off point at training time, providing no inference-time control over inherently conflicting goals, such as prompt adherence versus source fidelity in image editing. We introduce ParetoSlider, a multi-objective RL (MORL) framework that trains a single diffusion model to approximate the entire Pareto front. By training the model with continuously varying preference weights as a conditioning signal, we enable users to navigate optimal trade-offs at inference time without retraining or maintaining multiple checkpoints. We evaluate ParetoSlider across three…
