Order-Preserving GFlowNets
Yihang Chen, Lukas Mauch

TL;DR
Order-Preserving GFlowNets (OP-GFNs) enable sampling based on a learned reward consistent with candidate orderings, improving efficiency and flexibility in multi-objective and single-objective optimization tasks.
Contribution
This paper introduces OP-GFNs, a novel method that learns reward functions aligned with candidate orderings, removing the need for explicit reward formulations and enhancing optimization performance.
Findings
OP-GFNs achieve state-of-the-art results in single-objective maximization.
OP-GFNs effectively approximate Pareto fronts in multi-objective tasks.
The training process sparsifies the reward landscape, focusing on higher-ranked candidates.
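As a rough illustration of the kind of ordering a multi-objective task provides, the sketch below assumes the partial order is Pareto dominance under maximization. That is a common choice, but it is an assumption here rather than something this summary states. The names `dominates` and `pareto_front` and the toy objective vectors are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` Pareto-dominates `b` under maximization:
    `a` is at least as good in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Indices of the points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Toy candidates with D = 2 objectives each (made-up values).
objectives = [(1.0, 5.0), (2.0, 4.0), (1.5, 3.0), (3.0, 1.0), (0.5, 0.5)]
front = pareto_front(objectives)
print(front)  # [0, 1, 3]
```

Candidates 0 and 1 are incomparable (neither dominates the other), which is why the order is only partial: no single scalar reward is given in advance, only pairwise comparisons where they exist.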
Abstract
Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates with probabilities proportional to a given reward. However, GFlowNets can only be used with a predefined scalar reward, which can be either computationally expensive or not directly accessible, in the case of multi-objective optimization (MOO) tasks for example. Moreover, to prioritize identifying high-reward candidates, the conventional practice is to raise the reward to a higher exponent, the optimal choice of which may vary across different environments. To address these issues, we propose Order-Preserving GFlowNets (OP-GFNs), which sample with probabilities in proportion to a learned reward function that is consistent with a provided (partial) order on the candidates, thus eliminating the need for an explicit formulation of the reward function. We theoretically prove that the…
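Two ideas in the abstract can be made concrete with a small sketch. The first function shows how raising a fixed reward to an exponent concentrates sampling probability on the best candidate. The second shows one plausible way to score whether a learned reward agrees with a given pairwise ordering, using a logistic (Bradley-Terry style) loss on log-rewards. The loss form, the function names, and the numbers are assumptions for illustration; the truncated abstract does not specify the training objective.

```python
import math
from typing import List


def sampling_probs(rewards: List[float], beta: float = 1.0) -> List[float]:
    """Probabilities proportional to reward**beta (the target a GFlowNet aims for)."""
    powered = [r ** beta for r in rewards]
    total = sum(powered)
    return [p / total for p in powered]


def pairwise_order_loss(log_r_better: float, log_r_worse: float) -> float:
    """Assumed loss: -log( R(better) / (R(better) + R(worse)) ).

    Written as a numerically stable softplus of the log-reward gap.
    It is small when the learned reward ranks the pair correctly."""
    diff = log_r_worse - log_r_better
    return max(diff, 0.0) + math.log1p(math.exp(-abs(diff)))


rewards = [1.0, 2.0, 4.0]
p1 = sampling_probs(rewards, beta=1.0)  # proportional to 1 : 2 : 4
p4 = sampling_probs(rewards, beta=4.0)  # proportional to 1 : 16 : 256

agree = pairwise_order_loss(log_r_better=2.0, log_r_worse=0.0)
disagree = pairwise_order_loss(log_r_better=0.0, log_r_worse=2.0)
```

With `beta=4.0` nearly all of the probability mass moves to the top candidate, which is the exponent-tuning practice the abstract describes. The pairwise loss only needs to know which of two candidates is preferred, so it can be evaluated without any explicit scalar reward.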
Peer Reviews
Decision: ICLR 2024 poster
1. The paper introduces an important extension of GFlowNets to multi-objective optimization, where D > 1 objectives need to be optimized.
2. The work also discusses how the GFlowNet policy can be used efficiently in difficult-to-explore settings.
3. The theoretical results and analysis help in understanding the proposed method and its advantages.
4. The work also provides a good overview of the literature for the reader's benefit.
1. The experiments section could be expanded to include more difficult environments. For example, for hypergrid, higher values of H and N could be tested, since larger grids would help analyze the exploration problems better.
2. The detailed balance objective can perform reasonably well in many settings. It would be beneficial to include it in all methods and reported numbers.
3. It would be useful to add standard deviations and error bars across experiments. It will also be useful to better understand the var
The paper conducts extensive experiments under different objective settings and domains. In particular, it runs experiments on a NAS benchmark, a first attempt to apply GFlowNets to NAS. This is a natural fit, since a neural architecture can be built by adding operations sequentially. It also achieves superior results compared to the other baselines on the NAS benchmark. The paper also tackles multi-objective problems with a non-convex Pareto front, which is hard to solve with prior mult
1. Non-stationarity issues may arise when the GFlowNet and the reward function are trained jointly. It might be worth exploring alternative training strategies to mitigate this potential challenge.
2. The experimental results are not that persuasive, showing little improvement over the baselines. For example, in the molecular tasks this work compares only with a simple GFN baseline; more competitive baselines (e.g., subTB, FL-GFN, RL methods) are needed.
--- **Discussion needed rega
The idea proposed in the paper is solid and the execution is well done; the method is tested on a whole variety of tasks and relevant setups. The idea certainly relates to other rank-based methods, such as those in RL & search, but stands on its own in the GFlowNet framework.
My biggest criticism of the paper is its presentation. It is not clear what the algorithm actually is: readers have to go all the way into the appendix to find it, and even there some questions remain. Are $\mathcal{T}$ and $\mathcal{D}$ distinct? What is $\hat R$ trained on? Does it have distinct parameters? Shared ones? From scrolling through the appendix, it appears that there are, understandably, a number of tricks that can be used. Most of them have some form of ablation in
Taxonomy
Topics: Advanced Multi-Objective Optimization Algorithms · Machine Learning and Data Classification · Machine Learning in Materials Science
