SOAR: Self-Correction for Optimal Alignment and Refinement in Diffusion Models

You Qin; Linqing Wang; Hao Fei; Roger Zimmermann; Liefeng Bo; Qinglin Lu; Chunyu Wang

arXiv:2604.12617·cs.LG·April 20, 2026

TL;DR

SOAR is a post-training method that improves diffusion models by self-correcting denoising trajectories, leading to better alignment and refinement without requiring reward signals.

Contribution

SOAR introduces a bias-correction post-training approach for diffusion models that bridges the gap between supervised fine-tuning and reinforcement learning.

Findings

01. SOAR improves GenEval scores from 0.70 to 0.78.

02. SOAR increases OCR accuracy from 0.64 to 0.67.

03. It surpasses Flow-GRPO in aesthetic and text-image alignment tasks.

Abstract

The post-training pipeline for diffusion models currently has two stages: supervised fine-tuning (SFT) on curated data and reinforcement learning (RL) with reward models. A fundamental gap separates them. SFT optimizes the denoiser only on ground-truth states sampled from the forward noising process; once inference deviates from these ideal states, subsequent denoising relies on out-of-distribution generalization rather than learned correction, exhibiting the same exposure bias that afflicts autoregressive models, but accumulated along the denoising trajectory instead of the token sequence. RL can in principle address this mismatch, yet its terminal reward signal is sparse, suffers from credit-assignment difficulty, and risks reward hacking. We propose SOAR (Self-Correction for Optimal Alignment and Refinement), a bias-correction post-training method that fills this gap. Starting from a…
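The exposure-bias argument in the abstract can be made concrete with a toy sketch. The code below is not SOAR's method (the abstract is cut off before describing it); it only contrasts the states an SFT loss is computed on (forward-noised states) with the states an imperfect denoiser actually visits at inference. The linear interpolation schedule, the single-data-point velocity field, the constant `bias` error, and every function name are assumptions made for illustration.

```python
import numpy as np


def forward_state(x0, noise, t):
    """State on the forward noising path at time t in [0, 1].

    Assumes a linear interpolation schedule: t=0 is data, t=1 is pure noise.
    """
    return (1.0 - t) * x0 + t * noise


def rollout_gaps(x0, noise, bias, num_steps=10):
    """Euler-integrate from noise (t=1) toward data (t=0) with a hypothetical
    denoiser whose velocity prediction is off by a constant `bias`.

    Returns, for each step, the largest absolute difference between the state
    actually visited and the forward-process state at the same time, i.e. how
    far inference has drifted from the states seen during SFT.
    """
    x = noise.copy()
    dt = 1.0 / num_steps
    gaps = []
    for k in range(num_steps):
        t = (num_steps - k) / num_steps        # current time, never zero
        v_exact = (x - x0) / t                 # exact velocity for one data point
        x = x - dt * (v_exact + bias)          # step with the imperfect prediction
        t_next = (num_steps - k - 1) / num_steps
        ideal = forward_state(x0, noise, t_next)
        gaps.append(float(np.max(np.abs(x - ideal))))
    return gaps


rng = np.random.default_rng(0)
x0 = np.array([1.0, -2.0])                     # toy "image"
noise = rng.standard_normal(2)

gaps_exact = rollout_gaps(x0, noise, bias=0.0)   # perfect denoiser
gaps_biased = rollout_gaps(x0, noise, bias=0.5)  # imperfect denoiser
```

With a perfect velocity prediction the rollout stays on the forward path, so the gaps are zero up to floating-point error. With any nonzero error the visited states leave that path, and the denoiser is then queried at states it was never trained on, which is the mismatch the abstract attributes to SFT.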

Code & Models

Repository (GitHub): tencent-hunyuan/HY-SOAR
