Discrete Probabilistic Inference as Control in Multi-path Environments
Tristan Deleu, Padideh Nouri, Nikolay Malkin, Doina Precup, Yoshua Bengio

TL;DR
This paper studies how probabilistic inference methods such as GFlowNets and MaxEnt RL can be used, and improved, to sample from complex discrete distributions so that objects are drawn without bias and in proportion to their reward.
Contribution
It introduces reward correction techniques that align MaxEnt RL with GFlowNet objectives, guaranteeing unbiased sampling regardless of MDP structure.
Findings
Reward correction ensures unbiased, reward-proportional sampling.
Flow-matching objectives are equivalent to certain MaxEnt RL algorithms.
Empirical results demonstrate improved sampling performance.
Abstract
We consider the problem of sampling from a discrete and structured distribution as a sequential decision problem, where the objective is to find a stochastic policy such that objects are sampled at the end of this sequential process proportionally to some predefined reward. While we could use maximum entropy Reinforcement Learning (MaxEnt RL) to solve this problem for some distributions, it has been shown that in general, the distribution over states induced by the optimal policy may be biased in cases where there are multiple ways to generate the same object. To address this issue, Generative Flow Networks (GFlowNets) learn a stochastic policy that samples objects proportionally to their reward by approximately enforcing a conservation of flows across the whole Markov Decision Process (MDP). In this paper, we extend recent methods correcting the reward in order to guarantee that the…
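The bias described in the abstract can be reproduced on a very small example. The sketch below is illustrative only and is not the authors' code: the five-state DAG, the rewards, and all function names are hypothetical. It also assumes that the reward correction takes the form of adding the log-probability of a uniform backward policy to each transition reward, which is one form such a correction can take. With an entropy temperature of 1, soft value iteration gives the MaxEnt RL optimal policy exactly, so the terminal distribution can be computed in closed form rather than learned.

```python
import math
from collections import defaultdict

# Hypothetical toy DAG: object "x" is reachable through "a" or "b",
# object "y" only through "b". Both objects have the same reward.
CHILDREN = {"s0": ["a", "b"], "a": ["x"], "b": ["x", "y"], "x": [], "y": []}
REWARD = {"x": 1.0, "y": 1.0}
ORDER = ["s0", "a", "b", "x", "y"]  # a topological order of the DAG


def parents(state):
    """Return all states with an edge into `state`."""
    return [p for p, kids in CHILDREN.items() if state in kids]


def edge_reward(s_next, corrected):
    """Log-reward on entering a terminal object, zero otherwise.

    If `corrected`, add log P_B(s | s_next) for a uniform backward policy
    (an assumed form of the reward correction).
    """
    r = math.log(REWARD[s_next]) if s_next in REWARD else 0.0
    if corrected:
        r += math.log(1.0 / len(parents(s_next)))
    return r


def terminal_distribution(corrected):
    """Terminal distribution of the MaxEnt RL optimal policy (temperature 1)."""
    # Soft value iteration, from the leaves back to the initial state.
    V = {}
    for s in reversed(ORDER):
        if not CHILDREN[s]:
            V[s] = 0.0
        else:
            V[s] = math.log(sum(
                math.exp(edge_reward(c, corrected) + V[c]) for c in CHILDREN[s]
            ))
    # Push probability mass forward under pi(c | s) = exp(r + V[c] - V[s]).
    prob = defaultdict(float)
    prob["s0"] = 1.0
    for s in ORDER:
        for c in CHILDREN[s]:
            pi = math.exp(edge_reward(c, corrected) + V[c] - V[s])
            prob[c] += prob[s] * pi
    return {obj: prob[obj] for obj in REWARD}


if __name__ == "__main__":
    print("uncorrected:", terminal_distribution(corrected=False))
    print("corrected:  ", terminal_distribution(corrected=True))
```

Without the correction, "x" receives twice the probability of "y" because two trajectories lead to it, even though the rewards are equal. With the assumed correction, the backward probabilities along the paths into each object sum to one, so the two objects are sampled with equal probability, matching their rewards.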
Taxonomy
Topics: Advanced Database Systems and Queries
