Random Policy Evaluation Uncovers Policies of Generative Flow Networks
Haoran He, Emmanuel Bengio, Qingpeng Cai, Ling Pan

TL;DR
This paper reveals a fundamental connection between Generative Flow Networks and policy evaluation in reinforcement learning, introducing a simple random policy evaluation method that achieves reward matching and competitive results.
Contribution
The paper uncovers a link between GFlowNets and RL policy evaluation, and proposes a rectified random policy evaluation algorithm that is simpler to implement and clarifies how the two frameworks relate.
Findings
Rectified random policy evaluation (RPE) achieves the same reward-matching effect as GFlowNets.
Empirical results show that RPE performs competitively.
A connection between GFlowNets and RL policy evaluation is established.
Abstract
The Generative Flow Network (GFlowNet) is a probabilistic framework in which an agent learns a stochastic policy and flow functions to sample objects proportionally to an unnormalized reward function. A number of recent works explored connections between GFlowNets and maximum entropy (MaxEnt) RL, which modifies the standard objective of RL agents by learning an entropy-regularized objective. However, the relationship between GFlowNets and standard RL remains largely unexplored, despite the inherent similarities in their sequential decision-making nature. While GFlowNets can discover diverse solutions through specialized flow-matching objectives, connecting them can simplify their implementation through established RL principles and improve RL's diverse solution discovery capabilities. In this paper, we bridge this gap by revealing a fundamental connection between GFlowNets and one RL's…
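A toy illustration of the reward-matching claim may help. It is a minimal sketch under simplifying assumptions, not the paper's algorithm: the environment is a hypothetical complete binary tree of depth `DEPTH`, reward is given only at the leaves, and there is no discounting. The names `reward`, `value`, `flow`, and `leaf_probability` are all introduced here for illustration. In this setting, the value of the uniform random policy differs from the GFlowNet state flow only by a depth-dependent constant, so choosing children in proportion to their values samples leaves in proportion to their rewards.

```python
from itertools import product

DEPTH = 3  # hypothetical toy setting: complete binary tree, states are bit tuples


def reward(leaf):
    # Hypothetical unnormalized reward, strictly positive for every leaf.
    return 1.0 + 2.0 * sum(leaf)


def children(state):
    return [state + (0,), state + (1,)]


def value(state):
    # Value under the uniform random policy (leaf-only reward, no discount).
    if len(state) == DEPTH:
        return reward(state)
    kids = children(state)
    return sum(value(c) for c in kids) / len(kids)


def flow(state):
    # GFlowNet-style state flow on a tree: the sum of the children's flows.
    if len(state) == DEPTH:
        return reward(state)
    return sum(flow(c) for c in children(state))


def leaf_probability(leaf):
    # Probability of reaching `leaf` when each step picks a child
    # with probability proportional to that child's random-policy value.
    prob = 1.0
    for t in range(DEPTH):
        kids = children(leaf[:t])
        prob *= value(leaf[: t + 1]) / sum(value(c) for c in kids)
    return prob


leaves = list(product((0, 1), repeat=DEPTH))
Z = sum(reward(x) for x in leaves)
sampling = {x: leaf_probability(x) for x in leaves}
target = {x: reward(x) / Z for x in leaves}

for x in leaves:
    print(x, round(sampling[x], 4), round(target[x], 4))
```

The two printed columns agree for every leaf. The constant-factor relationship used here relies on every state at a given depth having the same number of children; general DAGs, or trees with uneven branching, need a correction that this sketch does not attempt.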
Taxonomy
Topics: Fuzzy Logic and Control Systems
