MARBLE: Multi-Aspect Reward Balance for Diffusion RL

Canyu Zhao; Hao Chen; Yunze Tong; Yu Qiao; Jiacheng Li; Chunhua Shen

arXiv:2605.06507 · cs.CV · May 8, 2026

TL;DR

MARBLE introduces a gradient-space optimization framework for multi-reward diffusion model fine-tuning, effectively balancing multiple evaluation criteria without manual reward weighting.

Contribution

MARBLE maintains an independent advantage estimator for each reward and solves a quadratic programming problem to optimize all rewards in a unified way.
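This listing does not state the objective or constraints of MARBLE's quadratic program. As a rough illustration of what a gradient-space QP for balancing rewards can look like, the sketch below assumes a min-norm formulation in the style of multiple-gradient descent: find simplex weights `w` that minimize the norm of the combined per-reward gradient. The function names, the Frank-Wolfe solver, and the toy gradients are all hypothetical and are not taken from the paper or its repository.

```python
def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def min_norm_weights(grads, iters=200):
    """Assumed QP: minimize ||sum_k w_k g_k||^2 over the probability simplex.

    Solved with Frank-Wolfe and an exact line search. This is a stand-in
    formulation, not necessarily the QP that MARBLE solves.
    """
    k = len(grads)
    gram = [[dot(gi, gj) for gj in grads] for gi in grads]
    w = [1.0 / k] * k
    for _ in range(iters):
        # The objective's gradient in w is proportional to gram @ w.
        grad_w = [dot(row, w) for row in gram]
        j = min(range(k), key=lambda i: grad_w[i])
        # Move toward the simplex vertex e_j.
        d = [(1.0 if i == j else 0.0) - w[i] for i in range(k)]
        gram_d = [dot(row, d) for row in gram]
        curvature = dot(d, gram_d)
        if curvature <= 1e-12:
            break
        # Exact minimizer of the quadratic along d, clipped to [0, 1].
        step = max(0.0, min(1.0, -dot(w, gram_d) / curvature))
        if step == 0.0:
            break
        w = [wi + step * di for wi, di in zip(w, d)]
    return w


def combine(grads, w):
    """Weighted sum of the per-reward gradients."""
    return [sum(wk * g[i] for wk, g in zip(w, grads)) for i in range(len(grads[0]))]


# Two hypothetical per-reward gradients with very different scales.
grads = [[2.0, 0.0], [0.0, 1.0]]
weights = min_norm_weights(grads)
direction = combine(grads, weights)
```

Under this assumed formulation, the larger gradient receives the smaller weight, and the combined direction has a positive inner product with every per-reward gradient, so no reward is traded away to first order.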

Findings

01 MARBLE improves all five reward dimensions simultaneously.

02 It stabilizes gradients for poorly aligned rewards.

03 Training speed is comparable to baseline methods.

Abstract

Reinforcement learning fine-tuning has become the dominant approach for aligning diffusion models with human preferences. However, assessing images is intrinsically a multi-dimensional task, and multiple evaluation criteria need to be optimized simultaneously. Existing practice deals with multiple rewards by training one specialist model per reward, optimizing a weighted-sum reward $R(x) = \sum_{k} w_{k} R_{k}(x)$, or sequentially fine-tuning with a hand-crafted stage schedule. These approaches either fail to produce a unified model that can be jointly trained on all rewards or necessitate heavy, manually tuned sequential training. We find that the failure stems from using a naive weighted-sum reward aggregation. This approach suffers from a sample-level mismatch because most rollouts are specialist samples, highly informative for certain reward dimensions but irrelevant for others;…
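To make the weighted-sum aggregation $R(x) = \sum_{k} w_{k} R_{k}(x)$ concrete, the toy sketch below contrasts it with keeping one advantage per reward. It assumes group-normalized advantages (reward minus group mean, divided by group standard deviation), which is a common choice in RL fine-tuning but is not confirmed by this listing as the paper's estimator. The rollout scores, weights, and function names are made up for illustration.

```python
from statistics import mean, pstdev


def normalize(values, eps=1e-8):
    """Group-normalize: subtract the group mean, divide by the group std."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def weighted_sum_advantages(rewards, weights):
    """rewards[i][k] is reward k of rollout i; returns one advantage per rollout."""
    scalar = [sum(w * r for w, r in zip(weights, row)) for row in rewards]
    return normalize(scalar)


def per_reward_advantages(rewards):
    """Returns adv[k][i]: each reward dimension is normalized on its own."""
    num_rewards = len(rewards[0])
    return [normalize([row[k] for row in rewards]) for k in range(num_rewards)]


# Four hypothetical rollouts scored on two rewards. The first two are
# "specialists": each is strong on one reward and weak on the other.
rewards = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
    [0.5, 0.5],
]
weights = [0.5, 0.5]

summed = weighted_sum_advantages(rewards, weights)
separate = per_reward_advantages(rewards)
```

In this constructed case every rollout has the same weighted-sum reward, so `summed` is all zeros and the specialists contribute no learning signal, while `separate` still marks the first rollout as clearly above average on the first reward and below average on the second. It is only one simple way scalar aggregation can hide per-reward information, not a reproduction of the paper's analysis.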

Code & Models

Repositories

aim-uofa/MARBLE (GitHub)
