Discrete Probabilistic Inference as Control in Multi-path Environments

Tristan Deleu; Padideh Nouri; Nikolay Malkin; Doina Precup; Yoshua Bengio

arXiv:2402.10309 · cs.LG · May 29, 2024 · 1 citation

TL;DR

This paper treats sampling from a discrete, structured distribution as a sequential decision problem and shows how to correct MaxEnt RL so that, like GFlowNets, it samples objects in proportion to their reward even when several trajectories lead to the same object.

Contribution

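It extends recent reward-correction methods so that the optimal MaxEnt RL policy samples objects in proportion to the original reward, regardless of the structure of the underlying MDP, and relates GFlowNet flow-matching objectives to MaxEnt RL algorithms.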

Findings

01

Reward correction ensures unbiased, reward-proportional sampling.

02

Flow-matching objectives are equivalent to certain MaxEnt RL algorithms.

03

Empirical results demonstrate improved sampling performance.

Abstract

We consider the problem of sampling from a discrete and structured distribution as a sequential decision problem, where the objective is to find a stochastic policy such that objects are sampled at the end of this sequential process proportionally to some predefined reward. While we could use maximum entropy Reinforcement Learning (MaxEnt RL) to solve this problem for some distributions, it has been shown that in general, the distribution over states induced by the optimal policy may be biased in cases where there are multiple ways to generate the same object. To address this issue, Generative Flow Networks (GFlowNets) learn a stochastic policy that samples objects proportionally to their reward by approximately enforcing a conservation of flows across the whole Markov Decision Process (MDP). In this paper, we extend recent methods correcting the reward in order to guarantee that the…
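The bias described in the abstract can be reproduced on a small example. The sketch below is an illustration under stated assumptions, not the authors' implementation: the five-state DAG, the reward values, and the helper names are all hypothetical. It assumes deterministic transitions, an entropy temperature of 1, a terminal reward of log R(x), and one possible form of reward correction, namely adding log P_B(s | s') to each transition with a uniform backward policy P_B. Object `x` is reachable by two trajectories and `y` by one, so uncorrected MaxEnt RL over-samples `x`.

```python
import math
from collections import defaultdict

# Hypothetical toy DAG: x is reachable via s0->a->x and s0->b->x, y only via s0->b->y.
CHILDREN = {
    "s0": ["a", "b"],
    "a": ["x"],
    "b": ["x", "y"],
    "x": [],
    "y": [],
}
REWARD = {"x": 1.0, "y": 3.0}          # target: P(x) = 0.25, P(y) = 0.75
TOPO_ORDER = ["s0", "a", "b", "x", "y"]  # parents always precede children


def count_parents():
    """Number of parents of each state, used for the uniform backward policy."""
    counts = defaultdict(int)
    for kids in CHILDREN.values():
        for child in kids:
            counts[child] += 1
    return counts


def terminal_marginal(corrected):
    """Terminal distribution of the optimal MaxEnt RL policy (temperature 1)."""
    parents = count_parents()

    def step_reward(child):
        # Assumed correction: log P_B(s | s') with P_B uniform over the parents of s'.
        return -math.log(parents[child]) if corrected else 0.0

    # Backward pass: soft values V(s) = log sum_{s'} exp(r(s, s') + V(s')).
    value = {}
    for s in reversed(TOPO_ORDER):
        if not CHILDREN[s]:
            value[s] = math.log(REWARD[s])
        else:
            value[s] = math.log(
                sum(math.exp(step_reward(c) + value[c]) for c in CHILDREN[s])
            )

    # Forward pass: push probability mass through pi(s' | s) = exp(r + V(s') - V(s)).
    prob = defaultdict(float)
    prob["s0"] = 1.0
    for s in TOPO_ORDER:
        for c in CHILDREN[s]:
            pi = math.exp(step_reward(c) + value[c] - value[s])
            prob[c] += prob[s] * pi
    return {x: prob[x] for x in REWARD}


if __name__ == "__main__":
    print("uncorrected:", terminal_marginal(corrected=False))
    print("corrected:  ", terminal_marginal(corrected=True))
```

Without the correction, the optimal policy weights each trajectory by the reward of its endpoint, so the marginal over objects is proportional to the number of trajectories times R(x). With the assumed correction, the backward probabilities of all trajectories into an object sum to one, and the marginal is proportional to R(x) alone.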

Code & Models

Repositories

tristandeleu/gfn-maxent-rl (JAX, official)

Taxonomy

Topics: Advanced Database Systems and Queries