Interpretable epistemic uncertainty decomposition in sequential generative models via polynomial chaos surrogates

Ramón Nartallo-Kaluarachchi; Shashanka Ubaru; Małgorzata J Zimoń; Dongsung Huh; Robert Manson-Sawko; Lior Horesh; Yoshua Bengio

arXiv:2510.21523·cs.LG·May 19, 2026

3 Reviews

TL;DR

This paper introduces a method to decompose and interpret epistemic uncertainty in sequential generative models using polynomial chaos surrogates, enabling insights into decision drivers and robust predictions.

Contribution

It presents a novel approach combining polynomial chaos expansions with generative flow networks to interpret uncertainty sources, with theoretical guarantees and real-world applications.

Findings

01

Reveals actionable structure, invisible to ensembles alone, across three real-world scientific discovery tasks.

02

Achieves high calibration coverage at the 95% level.

03

The surrogate evaluates 10,000 policy samples in milliseconds, vastly faster than retraining.

Abstract

Sequential generative models conditioned on uncertain rewards are central to AI-driven scientific discovery, yet the epistemic uncertainty they inherit from imperfect reward estimates remains unquantified. We propagate this uncertainty through generative flow networks (GFlowNets) by fitting polynomial chaos expansions (PCEs) to small ensembles of trained models. The PCE coefficients yield analytical Sobol sensitivity indices, providing the first interpretable decomposition of which reward components drive which generative decisions, a capability unavailable from deep ensembles, Bayesian neural networks, or Monte Carlo dropout. Convergence guarantees are established theoretically and four of five are formally verified in the Lean 4 proof assistant. Across three real-world tasks the framework reveals actionable structure invisible to ensembles alone. On the Doyle-Dreher Buchwald-Hartwig…
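As a rough illustration of how PCE coefficients yield Sobol indices analytically, the sketch below fits a total-degree-2 expansion in orthonormal Legendre polynomials by least squares and reads the first-order indices off the squared coefficients. It assumes independent inputs uniform on [-1, 1] and a toy scalar response standing in for a policy statistic. The function names (`multi_indices`, `design_matrix`, `fit_pce`, `sobol_first_order`), the ensemble size of 20, and the toy response are illustrative assumptions, not taken from the paper.

```python
import itertools

import numpy as np
from numpy.polynomial import legendre


def multi_indices(dim, degree):
    """All multi-indices of length `dim` with total degree <= `degree`."""
    return [a for a in itertools.product(range(degree + 1), repeat=dim)
            if sum(a) <= degree]


def design_matrix(x, indices):
    """Evaluate the orthonormal Legendre product basis at points x of shape (n, dim)."""
    n, dim = x.shape
    max_deg = max(max(a) for a in indices)
    basis_1d = np.empty((dim, max_deg + 1, n))
    for d in range(dim):
        for k in range(max_deg + 1):
            coef = np.zeros(k + 1)
            coef[k] = 1.0  # select the degree-k Legendre polynomial
            # sqrt(2k + 1) makes P_k orthonormal under the uniform density on [-1, 1]
            basis_1d[d, k] = np.sqrt(2 * k + 1) * legendre.legval(x[:, d], coef)
    cols = [np.prod([basis_1d[d, a[d]] for d in range(dim)], axis=0)
            for a in indices]
    return np.stack(cols, axis=1)


def fit_pce(x, y, degree):
    """Least-squares PCE fit; returns the multi-indices and their coefficients."""
    indices = multi_indices(x.shape[1], degree)
    coeffs, *_ = np.linalg.lstsq(design_matrix(x, indices), y, rcond=None)
    return indices, coeffs


def sobol_first_order(indices, coeffs):
    """First-order Sobol indices from the coefficients of an orthonormal PCE."""
    dim = len(indices[0])
    # Total variance: sum of squared coefficients over all non-constant terms.
    variance = sum(c ** 2 for a, c in zip(indices, coeffs) if any(a))
    first = np.zeros(dim)
    for a, c in zip(indices, coeffs):
        active = [d for d in range(dim) if a[d] > 0]
        if len(active) == 1:  # term depends on exactly one input
            first[active[0]] += c ** 2
    return first / variance


# Toy stand-in for a small ensemble: 20 sampled inputs, one scalar response each.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(20, 2))
y = 1.0 + 2.0 * x[:, 0] + 0.5 * x[:, 1]  # hypothetical response, not from the paper

indices, coeffs = fit_pce(x, y, degree=2)
sobol = sobol_first_order(indices, coeffs)

# Once fitted, the surrogate is a single matrix-vector product per batch.
x_new = rng.uniform(-1.0, 1.0, size=(10_000, 2))
y_new = design_matrix(x_new, indices) @ coeffs
```

For this toy response the variance contributions are 4/3 and 1/12, so the first-order indices come out at 16/17 and 1/17. The last two lines show why evaluating many samples is cheap once the coefficients are known: no model is retrained, only a basis matrix is built and multiplied.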

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01·Rating 2·Confidence 3

Strengths

The paper makes the clear point that Bayesian or Monte Carlo ensembles for UQ in GFNs are computationally infeasible, so the motivation for a surrogate-based approach is clear. The separation between epistemic uncertainty in rewards and randomness in training (SGD, initialization) is stated clearly. Experiments cover both discrete and continuous environments.

Weaknesses

1. Line 125 is misleading: the distribution is built for policies, integrating out reward functions. 2. Though the idea of approximating the mapping with polynomials is appealing, this mapping can be highly non-smooth, discontinuous, and multimodal. No argument or empirical check supports that a low-order polynomial expansion provides a valid approximation. 3. The analysis assumes that $Y=f(X)$ (the policy statistics) has finite second moments, but no proposition proves that for general GFlowNet traini

Reviewer 02·Rating 2·Confidence 3

Strengths

To my knowledge, the paper presents the first attempt at uncertainty quantification in GFlowNets, which is an interesting and important research direction. The presented methodology is highly novel in the context of GFlowNets.

Weaknesses

I am struggling to understand the method presented in Section 3.2, which is, in my opinion, the central part of the paper. Do I understand correctly that a separate PCE must be learnt for each (state, action) pair in the environment? Why is there a sum over actions in the loss in Equation 6? Are the coefficients $c_j$ separate across different states and actions, or must they be the same? Why doesn't the predictive model formally depend on the state? If a separate PCE must indeed be learnt for e

Reviewer 03·Rating 4·Confidence 4

Strengths

The paper is well written and clearly addresses the problem of uncertainty quantification for GFlowNet-induced policies. The sources of the problem (e.g., the reward model trained from the data) are clear, and the motivation for the study is clear.

Weaknesses

1. While the paper is rather well written, I feel that its real applicability is rather limited, despite the claims of broader applicability (such as the LLMs mentioned in the conclusion). It is a classical statistical approach to approximate the unknown mapping from rewards to policies with a polynomial basis, yet this method has its own drawbacks, as do all methods in parametric statistics (the curse of dimensionality). This mapping can be non-smooth and complicated, especially in the claimed real-world
