Pessimistic Backward Policy for GFlowNets
Hyosoon Jang, Yunhui Jang, Minsu Kim, Jinkyoo Park, and Sungsoo Ahn

TL;DR
This paper introduces PBP-GFN, a pessimistic backward policy for GFlowNets that improves the sampling of high-reward objects by addressing their under-exploitation, and demonstrates superior performance across eight benchmarks.
Contribution
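The paper proposes a pessimistic backward policy for GFlowNets that maximizes the observed flow of each object so that it aligns with the object's true reward. This enhances the discovery of high-reward objects while maintaining diversity, and outperforms existing methods.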
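To show where a backward policy P_B enters GFlowNet training, here is a minimal sketch of the standard trajectory balance (TB) loss for one complete trajectory. It is an assumption that PBP-GFN is paired with TB; the sketch does not implement the pessimistic training of P_B itself, and the function name `trajectory_balance_loss` is hypothetical.

```python
import math


def trajectory_balance_loss(log_z, log_pf_steps, log_pb_steps, log_reward):
    """Squared TB residual for one complete trajectory s_0 -> ... -> s_n = x.

    log_z: learned log partition function, log Z
    log_pf_steps: log P_F(s_{t+1} | s_t) for each transition
    log_pb_steps: log P_B(s_t | s_{t+1}) for each transition
    log_reward: log R(x) of the terminal object x
    """
    # TB asks Z * prod P_F == R(x) * prod P_B; the loss is the squared log-ratio.
    residual = log_z + sum(log_pf_steps) - log_reward - sum(log_pb_steps)
    return residual ** 2


# Toy two-step trajectory with R(x) = 1 and Z = 2 (hypothetical numbers).
loss = trajectory_balance_loss(
    log_z=math.log(2.0),
    log_pf_steps=[math.log(0.5), math.log(0.5)],
    log_pb_steps=[math.log(1.0), math.log(0.5)],
    log_reward=math.log(1.0),
)
print(loss)
```

In this toy the forward and backward sides balance, so the loss is (numerically) zero. Changing any backward probability would make the residual nonzero, which is why the choice of P_B shapes which forward policy satisfies the objective.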
Findings
PBP-GFN outperforms existing methods on multiple benchmarks.
It improves the discovery rate of high-reward objects.
It maintains diversity in generated objects.
Abstract
This paper studies Generative Flow Networks (GFlowNets), which learn to sample objects proportionally to a given reward function through trajectories of state transitions. In this work, we observe that GFlowNets tend to under-exploit the high-reward objects due to training on an insufficient number of trajectories, which may lead to a large gap between the estimated flow and the (known) reward value. In response to this challenge, we propose a pessimistic backward policy for GFlowNets (PBP-GFN), which maximizes the observed flow to align closely with the true reward for the object. We extensively evaluate PBP-GFN across eight benchmarks, including the hyper-grid environment, bag generation, structured set generation, molecular generation, and four RNA sequence generation tasks. In particular, PBP-GFN enhances the discovery of high-reward objects, maintains the diversity of the objects, and…
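The gap between observed flow and reward can be made concrete with a toy calculation. The sketch below assumes (our reading, not the paper's code) that a trajectory τ ending in object x carries flow R(x) P_B(τ|x), so the flow carried by the observed trajectories is R(x) times the sum of P_B(τ|x) over those trajectories. For simplicity, P_B is a single softmax over whole trajectories rather than a per-step policy, and all names (`softmax`, `observed_flow`, `pessimistic_update`) are hypothetical.

```python
import math


def softmax(logits):
    # Numerically stable softmax: shift by the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def observed_flow(reward, pb, observed):
    # Flow into x carried by the observed trajectories:
    # R(x) * sum over observed tau of P_B(tau | x).
    return reward * sum(pb[i] for i in observed)


def pessimistic_update(logits, observed, lr=1.0, steps=200):
    # Gradient ascent on log sum_{tau in observed} P_B(tau | x), where
    # P_B(. | x) = softmax(logits) over all trajectories that end in x.
    logits = list(logits)
    for _ in range(steps):
        p = softmax(logits)
        s = sum(p[i] for i in observed)
        # d/d logit_j of log s = 1[j observed] * p_j / s - p_j
        grad = [(p[j] if j in observed else 0.0) / s - p[j] for j in range(len(p))]
        logits = [v + lr * g for v, g in zip(logits, grad)]
    return logits


# Toy object x with R(x) = 4, reachable by three trajectories; only #0 was observed.
reward = 4.0
observed = {0}
init_logits = [0.0, 0.0, 0.0]  # uniform backward policy

before = observed_flow(reward, softmax(init_logits), observed)
after = observed_flow(reward, softmax(pessimistic_update(init_logits, observed)), observed)
print(f"observed flow before: {before:.3f}, after: {after:.3f}, reward: {reward}")
```

With a uniform P_B, the single observed trajectory carries only about 1.333 of the reward 4.0; after the update, the observed flow is close to 4.0. The update concentrates P_B on the observed trajectory and lowers it for unobserved ones, which is one way to read "pessimistic" here.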
