Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods

Oussama Zekri; Nicolas Boullé

arXiv:2502.01384 · stat.ML · December 19, 2025

TL;DR

This paper introduces Score Entropy Policy Optimization (SEPO), a policy gradient algorithm for fine-tuning discrete diffusion models with non-differentiable rewards, and demonstrates its scalability and efficiency on several discrete generative tasks.

Contribution

We propose a theoretically justified, efficient policy gradient method, SEPO, specifically tailored for fine-tuning discrete diffusion models with non-differentiable rewards.

Findings

01. SEPO effectively fine-tunes discrete diffusion models.

02. The method scales well across multiple tasks.

03. Experimental results show improved performance and efficiency.

Abstract

Discrete diffusion models have recently gained significant attention due to their ability to process complex discrete structures for language modeling. However, fine-tuning these models with policy gradient methods, as is commonly done in Reinforcement Learning from Human Feedback (RLHF), remains a challenging task. We propose an efficient, broadly applicable, and theoretically justified policy gradient algorithm, called Score Entropy Policy Optimization (SEPO), for fine-tuning discrete diffusion models over non-differentiable rewards. Our numerical experiments across several discrete generative tasks demonstrate the scalability and efficiency of our method. Our code is available at https://github.com/ozekri/SEPO.
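
To make the phrase "policy gradient over non-differentiable rewards" concrete, here is a minimal sketch of the generic score-function (REINFORCE) estimator on a toy problem. It is not SEPO and does not use a diffusion model: the policy is assumed to be an independent categorical distribution per sequence position, and the names `reward`, `reinforce_step`, and `train`, along with the reward itself (counting occurrences of token 0), are hypothetical choices made only for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward(seq):
    # Hypothetical black-box reward: number of positions holding token 0.
    # It is only ever evaluated on samples, never differentiated.
    return float(sum(1 for tok in seq if tok == 0))


def reinforce_step(logits, rng, batch_size=64, lr=0.5):
    """One score-function update on per-position logits (toy policy, assumed)."""
    probs = [softmax(row) for row in logits]
    vocab = list(range(len(logits[0])))
    samples = [
        [rng.choices(vocab, weights=p)[0] for p in probs]
        for _ in range(batch_size)
    ]
    rewards = [reward(s) for s in samples]
    baseline = sum(rewards) / batch_size  # batch mean as a variance-reducing baseline
    for seq, r in zip(samples, rewards):
        advantage = r - baseline
        for pos, tok in enumerate(seq):
            for k in vocab:
                # d/d logit_k of log softmax(logits)[tok] = 1[k == tok] - p_k
                grad_logp = (1.0 if k == tok else 0.0) - probs[pos][k]
                logits[pos][k] += lr * advantage * grad_logp / batch_size
    return baseline


def train(seq_len=4, vocab_size=3, steps=200, seed=0):
    """Run several updates; returns final logits and the batch-mean reward per step."""
    rng = random.Random(seed)
    logits = [[0.0] * vocab_size for _ in range(seq_len)]
    history = [reinforce_step(logits, rng) for _ in range(steps)]
    return logits, history
```

Calling `logits, history = train()` and inspecting `history` should show the batch-mean reward rising from roughly `seq_len / vocab_size` toward `seq_len`. The gradient only needs reward values on sampled sequences, which is why this family of estimators applies when the reward cannot be differentiated.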

Code & Models

Repositories

ozekri/SEPO
PyTorch · Official

Models

🤗 Xssama/SEPO_DNA
model · ♡ 2

Videos

Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods · SlidesLive

Taxonomy

Topics: Climate Change Policy and Economics

Methods: Softmax · Attention Is All You Need · Diffusion