Adaptive teachers for amortized samplers
Minsu Kim, Sanghyeok Choi, Taeyoung Yun, Emmanuel Bengio, Leo Feng, Jarrid Rector-Brooks, Sungsoo Ahn, Jinkyoo Park, Nikolay Malkin, Yoshua Bengio

TL;DR
This paper introduces an adaptive teacher-student framework for amortized samplers that improves exploration, mode coverage, and sample efficiency in complex sampling tasks using reinforcement learning guided by an auxiliary teacher model.
Contribution
It proposes an adaptive training distribution via a teacher model to enhance exploration and mode coverage in amortized sampling, addressing challenges in efficient exploration.
Findings
Improved mode coverage in synthetic and biochemical tasks.
Enhanced sample efficiency in diffusion-based sampling.
Effective exploration in challenging environments.
Abstract
Amortized inference is the task of training a parametric model, such as a neural network, to approximate a distribution with a given unnormalized density where exact sampling is intractable. When sampling is implemented as a sequential decision-making process, reinforcement learning (RL) methods, such as generative flow networks, can be used to train the sampling policy. Off-policy RL training facilitates the discovery of diverse, high-reward candidates, but existing methods still face challenges in efficient exploration. We propose to use an adaptive training distribution (the Teacher) to guide the training of the primary amortized sampler (the Student). The Teacher, an auxiliary behavior model, is trained to sample high-loss regions of the Student and can generalize across unexplored modes, thereby enhancing mode coverage by providing an efficient training curriculum. We validate…
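The Teacher/Student loop described in the abstract can be illustrated with a small tabular toy. This is a minimal sketch under stated assumptions, not the paper's algorithm: the eight-state target, the one-step squared log-ratio loss, the Teacher log-reward `log(eps + delta**2)`, the 50/50 behavior mixture, and every function name and hyperparameter below are hypothetical choices made only for illustration.

```python
import numpy as np

def log_softmax(logits):
    z = logits - logits.max()
    return z - np.log(np.exp(z).sum())

def log_ratio_step(logits, log_z, log_reward, xs, lr):
    """One SGD step on mean(delta**2), where
    delta(x) = log_z + log p(x) - log_reward(x) for the sampled states xs."""
    log_p = log_softmax(logits)
    delta = log_z + log_p[xs] - log_reward[xs]
    p = np.exp(log_p)
    # d delta(x) / d logits = onehot(x) - p,  d delta(x) / d log_z = 1
    grad_logits = -2.0 * delta.mean() * p
    np.add.at(grad_logits, xs, 2.0 * delta / len(xs))
    grad_log_z = 2.0 * delta.mean()
    return logits - lr * grad_logits, log_z - lr * grad_log_z

def train_student(n_steps=3000, batch=64, lr=0.05, eps=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical unnormalized target: six low-reward states, two modes.
    log_r = np.log(np.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 20.0, 20.0]))
    n = len(log_r)
    s_logits, s_log_z = np.zeros(n), 0.0  # Student parameters
    t_logits, t_log_z = np.zeros(n), 0.0  # Teacher parameters
    for _ in range(n_steps):
        p_s = np.exp(log_softmax(s_logits))
        p_t = np.exp(log_softmax(t_logits))
        # Off-policy behavior distribution: half Student, half Teacher (assumed).
        behavior = 0.5 * p_s + 0.5 * p_t
        xs = rng.choice(n, size=batch, p=behavior)
        # Teacher log-reward is large where the Student's residual is large.
        delta_s = s_log_z + log_softmax(s_logits) - log_r
        t_log_r = np.log(eps + delta_s ** 2)
        s_logits, s_log_z = log_ratio_step(s_logits, s_log_z, log_r, xs, lr)
        t_logits, t_log_z = log_ratio_step(t_logits, t_log_z, t_log_r, xs, lr)
    p_s = np.exp(log_softmax(s_logits))
    target = np.exp(log_r) / np.exp(log_r).sum()
    tv = 0.5 * np.abs(p_s - target).sum()  # total variation distance
    return tv, p_s

if __name__ == "__main__":
    tv, p_s = train_student()
    print("total variation to target:", tv)
    print("student probabilities:", np.round(p_s, 3))
```

Because the squared log-ratio loss is minimized only when the residual is zero at every state, the Student can be trained on samples from any full-support behavior distribution without importance weights. That property is what lets a Teacher of this kind choose where to sample.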
Peer Reviews
Decision·ICLR 2025 Poster
1. The idea of exploring high-error regions and increasing the sampling probability of data in these regions when training the student model intuitively makes sense to me; it essentially resembles hard-negative mining in the classic machine learning literature. 2. The experiments were well executed and support the main claim of the paper. 3. The math on GFlowNets and their connection to amortized inference is helpful and especially helps contextualize the significance of the contributions.
1. The idea is not new; it closely resembles hard negative mining (i.e., sampling negative examples where the model shows high error), which limits the novelty of the proposed approach. 2. While the idea of sampling more in high-error regions seems intuitively reasonable, its effectiveness may depend on whether the student model has sufficient capacity to fit the distribution. Also, I would like to see more comparisons and discussion with the active learning literature, such as uncertainty samp
**Strengths**: - Addresses an important problem: Mode coverage/exploration is an important problem in the training of GFlowNets. The paper proposes a novel and interesting solution to the problem. - Very well written: I really enjoyed reading the paper. The paper did an excellent job of introducing and walking through the relevant literature and the methods and putting itself in context. Although I was not myself very familiar with the specific line of work around GFlowNets, I was easi
**Weaknesses/Questions** - The introduction of an adaptive Teacher adds additional complexity to the training process, requiring the joint optimization of both Teacher and Student networks. At least in the RL literature, these types of exploration methods were tried and given up on as they required extensive tuning and didn't scale well enough. I'm curious how the authors think that compares with the use cases here and if the authors genuinely believe the results shown in the paper will hold th
I can easily follow this work, and this work tries to amortize prediction by simply conditionally inputting some variables. Overall, (1) I find this work easy to follow with clear motivations. Decision-making for amortized inference, particularly the development of GFlowNets, is impactful, and this work focuses on an important issue, namely efficient exploration under RL frameworks in the field. (2) The developed strategy is novel and practical in implementation. (3) The experiments are inspiri
While this work has many merits, I find that some parts need to be modified or revised. --- (1) The paper does not seem to establish the necessity of amortized inference. Lines 28-30 state the mechanism of amortized inference and the related bottleneck. It is necessary to include the role of amortized inference compared with traditional methods such as MCMC, e.g., citing [1] and adding something like "The amortized inference adopts a shared inference module for all data points instead of performing inference
Taxonomy
Topics: Statistics Education and Methodologies
