$f$-Trajectory Balance: A Loss Family for Tuning GFlowNets, Generative Models, and LLMs with Off- and On-Policy Data

Jake Fawkes; Jason Hartford

arXiv:2605.15417·cs.LG·May 18, 2026

TL;DR

This paper introduces a family of $f$-divergence-based loss functions for training generative models, enabling effective off-policy and on-policy tuning with desirable properties like mode coverage.

Contribution

The paper extends the mean squared error loss between target and model log probabilities to the entire family of $f$-divergences, yielding surrogate losses that remain valid off-policy and whose on-policy gradients match those of the corresponding divergence.

Findings

01. Loss functions retain divergence properties off-policy

02. Applicable to diverse settings, including LLM tuning and molecule discovery

03. Demonstrated effectiveness on synthetic, molecular, and language tasks

Abstract

In GFlowNets and variational inference, it has been shown that the mean squared error between target and model log probabilities is an effective, low-variance surrogate loss for training generative models. When evaluated on-policy, the gradients of this loss correspond to those of the KL divergence, while off-policy it remains a valid loss with the same global minimizer. In this work, we demonstrate that this construction can be extended to the whole family of $f$-divergences, leading to a family of losses whose on-policy gradients are those of the corresponding $f$-divergence, but which retain the same global minimizer off-policy. Specifically, we show that the on-policy gradients lead to a one-to-one correspondence between translation-invariant loss functions on the target and model log probabilities, and $f$-divergences. This equivalence allows us to design…
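The mean squared error property described in the abstract can be checked numerically on a toy problem. The sketch below is illustrative only and is not taken from the paper: it assumes a four-outcome categorical model parameterized by softmax logits, takes the KL divergence to be the reverse KL, $\mathrm{KL}(q_\theta \| p)$, and scales the squared error by $1/2$ so that the gradients match exactly rather than up to a constant. All names (`half_mse`, `reverse_kl`, `num_grad`) and all numeric values are hypothetical.

```python
import math

def log_softmax(theta):
    """Log probabilities of a categorical model with logits theta."""
    m = max(theta)
    lse = m + math.log(sum(math.exp(t - m) for t in theta))
    return [t - lse for t in theta]

def reverse_kl(theta, log_p):
    """KL(q_theta || p), computed exactly by summing over all outcomes."""
    log_q = log_softmax(theta)
    return sum(math.exp(lq) * (lq - lp) for lq, lp in zip(log_q, log_p))

def half_mse(theta, log_p, weights):
    """Half the mean squared error between model and target log probabilities.

    `weights` is the sampling distribution and is treated as a constant,
    so no gradient flows through it.
    """
    log_q = log_softmax(theta)
    return 0.5 * sum(w * (lq - lp) ** 2
                     for w, lq, lp in zip(weights, log_q, log_p))

def num_grad(fn, theta, eps=1e-6):
    """Central finite-difference gradient of fn at theta."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        dn = list(theta)
        up[i] += eps
        dn[i] -= eps
        grad.append((fn(up) - fn(dn)) / (2 * eps))
    return grad

# Assumed toy values: model logits and a normalized target distribution.
theta = [0.3, -1.2, 0.8, 0.1]
log_p = log_softmax([1.0, 0.0, -0.5, 2.0])

# On-policy: the sampling weights equal the current model, frozen as constants.
q_frozen = [math.exp(v) for v in log_softmax(theta)]
grad_kl = num_grad(lambda t: reverse_kl(t, log_p), theta)
grad_mse = num_grad(lambda t: half_mse(t, log_p, q_frozen), theta)
max_gap = max(abs(a - b) for a, b in zip(grad_kl, grad_mse))

# Off-policy: with a different full-support sampling distribution (uniform
# here), the loss is still zero when the model equals the target.
uniform = [0.25] * 4
off_policy_loss_at_target = half_mse(log_p, log_p, uniform)

print("max gradient gap (on-policy):", max_gap)
print("off-policy loss at target:", off_policy_loss_at_target)
```

In an automatic differentiation framework, freezing the sampling distribution would typically be done with a stop-gradient or detach operation on the sampled trajectories rather than by passing explicit weights, as is done here for clarity.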
