TL;DR
This paper introduces a method to decompose and interpret epistemic uncertainty in sequential generative models using polynomial chaos surrogates, enabling insights into decision drivers and robust predictions.
Contribution
It presents a novel approach combining polynomial chaos expansions with generative flow networks to interpret uncertainty sources, with theoretical guarantees and real-world applications.
Findings
Reveals actionable structure in scientific discovery tasks.
Achieves high calibration coverage at the 95% level.
Surrogate evaluates 10,000 policy samples in milliseconds, vastly faster than retraining.
Abstract
Sequential generative models conditioned on uncertain rewards are central to AI-driven scientific discovery, yet the epistemic uncertainty they inherit from imperfect reward estimates remains unquantified. We propagate this uncertainty through generative flow networks (GFlowNets) by fitting polynomial chaos expansions (PCEs) to small ensembles of trained models. The PCE coefficients yield analytical Sobol sensitivity indices, providing the first interpretable decomposition of which reward components drive which generative decisions, a capability unavailable from deep ensembles, Bayesian neural networks, or Monte Carlo dropout. Convergence guarantees are established theoretically and four of five are formally verified in the Lean 4 proof assistant. Across three real-world tasks the framework reveals actionable structure invisible to ensembles alone. On the Doyle-Dreher Buchwald-Hartwig…
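The step from PCE coefficients to Sobol indices described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the reward uncertainty is modelled as two independent standard normal variables, the basis is orthonormal probabilists' Hermite polynomials of total degree at most 2, the "policy statistic" is a toy closed-form function standing in for the output of a trained GFlowNet, and the helper names (`multi_indices`, `design_matrix`, `fit_pce`, `sobol_indices`) are hypothetical.

```python
import itertools
import math

import numpy as np
from numpy.polynomial.hermite_e import hermeval


def multi_indices(dim, degree):
    """All multi-indices alpha with total degree |alpha| <= degree."""
    return [a for a in itertools.product(range(degree + 1), repeat=dim)
            if sum(a) <= degree]


def design_matrix(xi, indices):
    """Orthonormal Hermite basis evaluated at samples xi of shape (n, dim)."""
    n, dim = xi.shape
    max_deg = max(max(a) for a in indices)
    # uni[d, k] holds He_k(xi[:, d]) / sqrt(k!), orthonormal under N(0, 1).
    uni = np.empty((dim, max_deg + 1, n))
    for k in range(max_deg + 1):
        unit = np.zeros(k + 1)
        unit[k] = 1.0
        uni[:, k, :] = hermeval(xi.T, unit) / math.sqrt(math.factorial(k))
    cols = [np.prod([uni[d, a[d]] for d in range(dim)], axis=0)
            for a in indices]
    return np.stack(cols, axis=1)


def fit_pce(xi, y, degree):
    """Least-squares fit of PCE coefficients to ensemble outputs y."""
    indices = multi_indices(xi.shape[1], degree)
    coef, *_ = np.linalg.lstsq(design_matrix(xi, indices), y, rcond=None)
    return indices, coef


def sobol_indices(indices, coef, dim):
    """First-order and total Sobol indices read off the PCE coefficients."""
    idx = np.array(indices)
    variance = np.sum(coef[idx.sum(axis=1) > 0] ** 2)
    first, total = np.zeros(dim), np.zeros(dim)
    for i in range(dim):
        others = np.delete(idx, i, axis=1).sum(axis=1)
        only_i = (idx[:, i] > 0) & (others == 0)  # terms in input i alone
        any_i = idx[:, i] > 0                     # every term involving input i
        first[i] = np.sum(coef[only_i] ** 2) / variance
        total[i] = np.sum(coef[any_i] ** 2) / variance
    return first, total


# Toy stand-in for a small ensemble: 20 draws of two reward perturbations and
# a made-up policy statistic (NOT a real GFlowNet output).
rng = np.random.default_rng(0)
xi = rng.standard_normal((20, 2))
y = 1.0 + 2.0 * xi[:, 0] + xi[:, 1] + 0.5 * xi[:, 0] * xi[:, 1]

indices, coef = fit_pce(xi, y, degree=2)
first, total = sobol_indices(indices, coef, dim=2)

# Once fitted, the surrogate is a single matrix product for new samples.
xi_new = rng.standard_normal((10_000, 2))
y_hat = design_matrix(xi_new, indices) @ coef
```

Because the basis is orthonormal, the output variance is the sum of squared non-constant coefficients, so each Sobol index is a ratio of coefficient sums and needs no additional sampling. In this toy case the first input accounts for 4/5.25 of the variance on its own, and the interaction term is what separates the total indices from the first-order ones.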
Peer Reviews
Decision·Submitted to ICLR 2026
The paper addresses a clear point: Bayesian or Monte Carlo ensembles for UQ in GFNs are computationally infeasible. The motivation for a surrogate-based approach is hence clear. The separation between epistemic uncertainty in rewards and randomness in training (SGD, initialization) is stated clearly. Experiments cover both discrete and continuous environments.
1. Line 125 is misleading: the distribution is built for policies, integrating out reward functions. 2. Though the idea of approximating the mapping with polynomials is appealing, this mapping can be highly non-smooth, discontinuous, and multimodal. No argument or empirical check supports that a low-order polynomial expansion provides a valid approximation. 3. The analysis assumes $Y=f(X)$ (the policy statistics) has finite second moments, but no proposition proves that for general GFlowNet traini
To my knowledge, the paper presents the first attempt at uncertainty quantification in GFlowNets, which is an interesting and important research direction. The presented methodology is highly novel in the context of GFlowNets.
I am struggling to understand the method presented in Section 3.2, which is, in my opinion, the central part of the paper. Do I understand correctly that a separate PCE must be learnt for each (state, action) pair in the environment? Why is there a sum over actions in the loss in Equation 6? Are the coefficients $c_j$ separate across different states and actions, or must they be the same? Why doesn't the predictive model formally depend on the state? If a separate PCE must indeed be learnt for e
The paper is well written and clearly addresses the problem of uncertainty quantification for GFlowNet-induced policies. The sources of the problem (e.g., the reward model trained from the data) are clear, as is the motivation for the study.
1. While the paper is rather well written, I feel that its real applicability is rather limited, despite the claims of broader applicability (such as LLMs, mentioned in the conclusion). It is a classical statistical approach to approximate the unknown mapping from rewards to policies with a polynomial basis, yet this method has its own drawbacks, as do all methods in parametric statistics (the curse of dimensionality). This mapping can be non-smooth and complicated, especially in the claimed real-world
