GFlowNet Training by Policy Gradients

Puhua Niu; Shili Wu; Mingzhou Fan; Xiaoning Qian

arXiv:2408.05885 · cs.LG · June 4, 2025

TL;DR

This paper introduces a new policy-gradient training framework for Generative Flow Networks (GFlowNets), enhancing their ability to generate combinatorial objects by integrating reinforcement learning principles.

Contribution

It develops a novel policy-based training method for GFlowNets with theoretical guarantees, bridging flow balance with reward optimization, and jointly training forward and backward policies.

Findings

1. Policy-based GFlowNet training improves performance.
2. Joint training of forward and backward policies enhances efficiency.
3. Experimental results show robustness and better gradient estimation.

Abstract

Generative Flow Networks (GFlowNets) have been shown effective to generate combinatorial objects with desired properties. We here propose a new GFlowNet training framework, with policy-dependent rewards, that bridges keeping flow balance of GFlowNets to optimizing the expected accumulated reward in traditional Reinforcement-Learning (RL). This enables the derivation of new policy-based GFlowNet training methods, in contrast to existing ones resembling value-based RL. It is known that the design of backward policies in GFlowNet training affects efficiency. We further develop a coupled training strategy that jointly solves GFlowNet forward policy training and backward policy design. Performance analysis is provided with a theoretical guarantee of our policy-based GFlowNet training. Experiments on both simulated and real-world datasets verify that our policy-based strategies provide…
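As a rough illustration of the "flow balance" condition the abstract refers to, the sketch below checks the standard trajectory-balance identity, Z · P_F(τ) = R(x) · P_B(τ | x), on a tiny hand-built DAG. Everything in it is an assumption made for illustration: the graph, the rewards, the two policies, and all function names are hypothetical. It is not the paper's policy-gradient method and is not taken from the gfn-pg repository.

```python
import math

# Hypothetical toy DAG (not from the paper):
#   s0 -> {A, B};  A -> {X, Y};  B -> {Y, Z};  X, Y, Z are terminal objects.
REWARD = {"X": 1.0, "Y": 2.0, "Z": 1.0}

# Forward policy P_F(child | state), chosen by hand for this example.
P_F = {
    "s0": {"A": 0.5, "B": 0.5},
    "A": {"X": 0.5, "Y": 0.5},
    "B": {"Y": 0.5, "Z": 0.5},
}

# Backward policy P_B(parent | state); Y has two parents, chosen uniformly.
P_B = {
    "A": {"s0": 1.0},
    "B": {"s0": 1.0},
    "X": {"A": 1.0},
    "Y": {"A": 0.5, "B": 0.5},
    "Z": {"B": 1.0},
}

# Total flow (partition function): the sum of all terminal rewards.
Z = sum(REWARD.values())


def trajectories(state="s0", prefix=("s0",)):
    """Enumerate every complete trajectory from s0 to a terminal object."""
    if state in REWARD:
        yield prefix
        return
    for child in P_F[state]:
        yield from trajectories(child, prefix + (child,))


def log_pf(traj):
    """Log-probability of the trajectory under the forward policy."""
    return sum(math.log(P_F[s][t]) for s, t in zip(traj, traj[1:]))


def log_pb(traj):
    """Log-probability of the trajectory under the backward policy."""
    return sum(math.log(P_B[t][s]) for s, t in zip(traj, traj[1:]))


def tb_residual(traj):
    """log(Z * P_F(traj)) - log(R(x) * P_B(traj | x)); zero when balanced."""
    return math.log(Z) + log_pf(traj) - math.log(REWARD[traj[-1]]) - log_pb(traj)


def terminal_marginal():
    """Probability of ending at each terminal object under P_F."""
    marginal = {x: 0.0 for x in REWARD}
    for traj in trajectories():
        marginal[traj[-1]] += math.exp(log_pf(traj))
    return marginal


if __name__ == "__main__":
    for traj in trajectories():
        print(" -> ".join(traj), "residual:", round(tb_residual(traj), 12))
    print("terminal marginal:", terminal_marginal())
```

In this hand-picked example the residual vanishes on every trajectory, so the forward policy samples each terminal object with probability R(x)/Z. A training method, whether value-based or policy-based, would start from policies where the residual is nonzero and adjust them toward this balanced state.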

Peer Reviews

Decision · ICML 2024 Poster

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

1. This work discusses an interesting direction of training GFlowNets and using learning from RL policy-based methods to introduce a policy-based GFlowNet.
2. The discussion and perspective around backward trajectories is interesting and a useful way to improve GFlowNet training.
3. The gradient equivalence discussion and analysis is useful to understand the theoretical claims and some of the motivation behind this work.

Weaknesses

1. The related work section could be made more exhaustive by adding the other GFlowNet losses and their references.
2. It will be useful to expand the number of environment configurations. For hypergrid, N=2 and N=3 are the only options used and using a higher value will help.
3. Previous work has analyzed number of states visited for hypergrid domain, which has not been included here.
4. Adding other GFlowNet based baselines, such as Detailed Balance, would be useful as it is an important and

Reviewer 02 · Rating 5 (marginally below the acceptance threshold) · Confidence 2

Strengths

**originality**
- The main novelty of the paper is on formulating the GFlowNet problems as RL problems and allow the use of RL in these problems. The problem formulation and the training strategies proposed can be considered novel results.
- The theoretical results are interesting and can be considered novel

**quality**
- The paper is overall well-written, with only minor issues

**clarity**
- The paper is quite clear

**significance**
- The new problem formulation and training strategies disc

Weaknesses

Related work:
- It seems to me the related work can be improved, for example, what is the most relevant standard RL algorithm? And how is your proposed policy gradient method different in design? Additionally, GFlowNet and RL are discussed together in some other papers in the literature, such as the "GFlowNet Foundations" by Bengio et al. How is the analysis in your work related to these previous works?

Additional technical details:
- Would be nice to have more technical details, for example,

Reviewer 03 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

This paper tries to connect the field of RL and GFlowNets, which forms an interesting problem. The experimental part is well-explained.

Weaknesses

The main concern is that the experimental evaluation is too toy: it focuses only on synthetic problems (hypergrid and bit-sequence generation), the evaluation metric does not follow previous papers, and some of the claims are not well supported.

Code & Models

Repositories

niupuhua1234/gfn-pg
PyTorch · Official

Taxonomy

Topics: Big Data Technologies and Applications · Cloud Computing and Resource Management