HERO: Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning
Ayano Hiranaka, Shang-Fu Chen, Chieh-Hsin Lai, Dongjun Kim, Naoki, Murata, Takashi Shibuya, Wei-Hsiang Liao, Shao-Hua Sun, Yuki Mitsufuji

TL;DR
HERO introduces an online human-feedback framework for fine-tuning diffusion models, significantly improving efficiency and effectiveness in tasks like anomaly correction, reasoning, and content moderation.
Contribution
HERO presents a novel online feedback-based approach for diffusion model fine-tuning, reducing reliance on large datasets and heuristic rewards.
Findings
4x more efficient in online feedback for anomaly correction
Effective with only 0.5K feedback for various tasks
Improves safety, alignment, and personalization
Abstract
Controllable generation through Stable Diffusion (SD) fine-tuning aims to improve fidelity, safety, and alignment with human guidance. Existing reinforcement learning from human feedback methods usually rely on predefined heuristic reward functions or pretrained reward models built on large-scale datasets, limiting their applicability to scenarios where collecting such data is costly or difficult. To effectively and efficiently utilize human feedback, we develop a framework, HERO, which leverages online human feedback collected on the fly during model learning. Specifically, HERO features two key mechanisms: (1) Feedback-Aligned Representation Learning, an online training method that captures human feedback and provides informative learning signals for fine-tuning, and (2) Feedback-Guided Image Generation, which involves generating images from SD's refined initialization samples,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTraffic control and management · Elevator Systems and Control · Iterative Learning Control Systems
MethodsDiffusion
