$f$-Trajectory Balance: A Loss Family for Tuning GFlowNets, Generative Models, and LLMs with Off- and On-Policy Data
Jake Fawkes, Jason Hartford

TL;DR
This paper introduces a family of $f$-divergence-based loss functions for training generative models, enabling effective off-policy and on-policy tuning with desirable properties like mode coverage.
Contribution
It extends the mean squared error loss to the entire $f$-divergence family, yielding new surrogate losses that remain effective off-policy and inherit the properties of their corresponding divergence.
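The abstract is cut off before the loss construction, so the sketch below does not reproduce the paper's losses. It only checks the standard score-function identity that makes a gradient correspondence with $f$-divergences possible: writing $D_f(p \,\|\, q_\theta) = \mathbb{E}_{q_\theta}[f(r)]$ with $r = p/q_\theta$, differentiating gives $\nabla_\theta D_f = \mathbb{E}_{q_\theta}[(f(r) - r f'(r))\,\nabla_\theta \log q_\theta]$. An on-policy surrogate whose per-sample weight on $\nabla_\theta \log q_\theta$ equals $f(r) - r f'(r)$, up to an additive constant, therefore has the same expected gradient as the divergence. The softmax toy model, the three generators, and all function names are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy setup (not from the paper): q_theta is a softmax over a few
# outcomes, so every expectation can be summed exactly instead of sampled.


def softmax(theta):
    """Toy categorical model q_theta = softmax(theta)."""
    z = np.exp(theta - theta.max())
    return z / z.sum()


def f_divergence(theta, p, f):
    """D_f(p || q_theta) = sum_x q(x) f(p(x) / q(x))."""
    q = softmax(theta)
    return float(np.sum(q * f(p / q)))


def f_div_score_grad(theta, p, f, f_prime):
    """Gradient via E_q[(f(r) - r f'(r)) * grad log q], with r = p / q."""
    q = softmax(theta)
    r = p / q
    weight = f(r) - r * f_prime(r)   # per-outcome weight on the score
    score = np.eye(len(q)) - q       # row x is grad_theta log q(x) = e_x - q
    return (q * weight) @ score      # sum_x q(x) * weight(x) * score(x)


def finite_diff_grad(fun, theta, eps=1e-6):
    """Central finite differences, used only as an independent check."""
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        step = np.zeros_like(theta)
        step[i] = eps
        g[i] = (fun(theta + step) - fun(theta - step)) / (2 * eps)
    return g


# Convex generators with f(1) = 0, paired with their derivatives f'.
GENERATORS = {
    "reverse_kl": (lambda t: -np.log(t), lambda t: -1.0 / t),            # KL(q || p)
    "forward_kl": (lambda t: t * np.log(t), lambda t: np.log(t) + 1.0),  # KL(p || q)
    "hellinger": (lambda t: (np.sqrt(t) - 1.0) ** 2,                     # squared Hellinger,
                  lambda t: 1.0 - 1.0 / np.sqrt(t)),                     # without the 1/2 factor
}

rng = np.random.default_rng(0)
theta = rng.normal(size=6)
p = softmax(rng.normal(size=6))  # normalized target distribution

for name, (f, f_prime) in GENERATORS.items():
    analytic = f_div_score_grad(theta, p, f, f_prime)
    numeric = finite_diff_grad(lambda th: f_divergence(th, p, f), theta)
    print(name, np.allclose(analytic, numeric, atol=1e-6))
```

For the reverse-KL generator the weight works out to $\log q_\theta - \log p + 1$, which is the squared-error residual up to a constant; this is how the mean squared error and KL case sits inside the wider family.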
Findings
Loss functions retain divergence properties off-policy
Applicable to diverse settings, including LLM tuning and molecule discovery
Demonstrated effectiveness on synthetic, molecular, and language tasks
Abstract
In GFlowNets and variational inference, the mean squared error between target and model log probabilities has been shown to be an effective, low-variance surrogate loss for training generative models. When evaluated on-policy, its gradients correspond to those of the KL divergence, while off-policy it remains a valid loss with the same global minimizer. In this work, we demonstrate that this construction extends to the whole family of $f$-divergences, leading to a family of losses whose on-policy gradients are those of the corresponding $f$-divergence but which retain the same global minimizer off-policy. Specifically, we show that, through their on-policy gradients, there is a one-to-one correspondence between translation-invariant loss functions on the target and model log probabilities and $f$-divergences. This equivalence allows us to design…
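A small numerical check of the on-policy claim, written as a sketch rather than the paper's code. The model is a softmax over five outcomes (an assumption chosen so every expectation can be summed exactly), the target is an arbitrary unnormalized log density $\log \tilde p$, and the names `mse_onpolicy_grad`, `reverse_kl` and `finite_diff_grad` are hypothetical. Holding the sampler fixed, the expected gradient of $\tfrac12(\log q_\theta - \log \tilde p - c)^2$ is $\mathbb{E}_{q_\theta}[(\log q_\theta - \log \tilde p - c)\,\nabla_\theta \log q_\theta]$; because $\mathbb{E}_{q_\theta}[\nabla_\theta \log q_\theta] = 0$, the constant $c$ and the unknown normalizer of $\tilde p$ drop out, leaving the gradient of $\mathrm{KL}(q_\theta \,\|\, p)$.

```python
import numpy as np

# Hypothetical toy setup (not from the paper): a softmax model over a handful of
# outcomes, so the on-policy expectation is an exact sum rather than a sample mean.


def softmax(theta):
    """Categorical model q_theta = softmax(theta)."""
    z = np.exp(theta - theta.max())
    return z / z.sum()


def mse_onpolicy_grad(theta, log_p_tilde, c=0.0):
    """Exact expected gradient of 0.5 * E_{x~q}[(log q(x) - log p_tilde(x) - c)^2].

    The sampling distribution q is held fixed (stop-gradient), as in on-policy
    training; only the log q(x) inside the square is differentiated.
    """
    q = softmax(theta)
    residual = np.log(q) - log_p_tilde - c  # per-outcome log-ratio error
    score = np.eye(len(q)) - q              # row x is grad_theta log q(x) = e_x - q
    return (q * residual) @ score           # sum_x q(x) * residual(x) * score(x)


def reverse_kl(theta, log_p):
    """KL(q_theta || p) for a normalized target log p."""
    q = softmax(theta)
    return float(np.sum(q * (np.log(q) - log_p)))


def finite_diff_grad(fun, theta, eps=1e-6):
    """Central finite differences, used only as an independent check."""
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        step = np.zeros_like(theta)
        step[i] = eps
        g[i] = (fun(theta + step) - fun(theta - step)) / (2 * eps)
    return g


rng = np.random.default_rng(0)
theta = rng.normal(size=5)
log_p_tilde = rng.normal(size=5)                         # unnormalized target
log_p = log_p_tilde - np.log(np.exp(log_p_tilde).sum())  # normalized target

g_mse = mse_onpolicy_grad(theta, log_p_tilde, c=3.0)
g_kl = finite_diff_grad(lambda th: reverse_kl(th, log_p), theta)
print(np.allclose(g_mse, g_kl, atol=1e-6))
```

The same residual form is what keeps the loss valid off-policy: if $c$ is learned alongside $\theta$ (playing the role of a log-partition estimate), the nonnegative loss is zero exactly when $\log q_\theta - \log \tilde p$ is constant, i.e. $q_\theta = p$, whichever distribution supplies the samples, provided it covers every outcome.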
