Robust Amortized Bayesian Inference with Self-Consistency Losses on Unlabeled Data
Aayush Mishra, Daniel Habermann, Marvin Schmitt, Stefan T. Radev, Paul-Christian Bürkner

TL;DR
This paper introduces a semi-supervised Bayesian inference method that uses unlabeled data and self-consistency losses to improve robustness and accuracy, especially on out-of-distribution observations.
Contribution
It proposes a novel semi-supervised approach leveraging self-consistency losses for robust amortized Bayesian inference on unlabeled and real-world data.
Findings
Significantly improves robustness of Bayesian inference on out-of-distribution data
Maintains high accuracy on real-world high-dimensional time-series and image data
Outperforms traditional methods in safety-critical applications
Abstract
Amortized Bayesian inference (ABI) with neural networks can solve probabilistic inverse problems orders of magnitude faster than classical methods. However, ABI is not yet sufficiently robust for widespread and safe application. When performing inference on observations outside the scope of the simulated training data, posterior approximations are likely to become highly biased, which cannot be corrected by additional simulations due to the bad pre-asymptotic behavior of current neural posterior estimators. In this paper, we propose a semi-supervised approach that enables training not only on labeled simulated data generated from the model, but also on unlabeled data originating from any source, including real data. To achieve this, we leverage Bayesian self-consistency properties that can be transformed into strictly proper losses that do not require knowledge of ground-truth…
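The self-consistency idea mentioned in the abstract can be illustrated with a small sketch. Bayes' rule implies that log p(θ) + log p(x | θ) − log p(θ | x) equals log p(x) for every θ, so if an approximate posterior q(θ | x) replaces p(θ | x), the variance of that expression across different θ measures how inconsistent q is, without needing the true parameter behind x. The example below is only an illustration under stated assumptions, not the authors' implementation: the conjugate Gaussian model (θ ~ N(0, 1), x | θ ~ N(θ, 1)), the helper names `normal_logpdf` and `self_consistency_loss`, and the choice to draw θ from q itself are all hypothetical choices made for this sketch.

```python
import math
import random
import statistics


def normal_logpdf(value, mean, var):
    """Log density of a univariate normal with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (value - mean) ** 2 / var)


def self_consistency_loss(x, q_mean, q_var, num_samples=64, seed=0):
    """Variance of log-marginal estimates for one unlabeled observation x.

    Assumed toy model (not from the paper): theta ~ N(0, 1), x | theta ~ N(theta, 1).
    The approximate posterior is q(theta | x) = N(q_mean, q_var).
    Drawing theta from q is an assumption of this sketch.
    """
    rng = random.Random(seed)
    log_marginals = []
    for _ in range(num_samples):
        theta = rng.gauss(q_mean, math.sqrt(q_var))
        log_prior = normal_logpdf(theta, 0.0, 1.0)
        log_lik = normal_logpdf(x, theta, 1.0)
        log_q = normal_logpdf(theta, q_mean, q_var)
        # Each term is an estimate of log p(x); they agree only if q is exact.
        log_marginals.append(log_prior + log_lik - log_q)
    return statistics.pvariance(log_marginals)


x_obs = 3.0  # an "unlabeled" observation: no ground-truth theta is used anywhere

# The exact posterior of the toy model is N(x / 2, 1 / 2), so the loss vanishes.
loss_exact = self_consistency_loss(x_obs, q_mean=x_obs / 2.0, q_var=0.5)

# A poor approximation (here simply the prior) gives a clearly positive loss.
loss_biased = self_consistency_loss(x_obs, q_mean=0.0, q_var=1.0)

print(f"exact posterior:  {loss_exact:.3e}")
print(f"biased posterior: {loss_biased:.3e}")
```

In this toy setting the loss is zero (up to floating-point error) only for the exact posterior, which is why a quantity of this kind can serve as a training signal on data for which no parameters are known.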
Peer Reviews
Decision: ICLR 2026 Poster
The method's independence from true parameter values is its most striking feature. The method is theoretically justified. The numerical evidence, from both simulations and real-data applications, is strong and convincing.
The conditions of Propositions 2 and 3 are not clear to me (see question below). Figure 4(b) is slightly misleading. Overall, these are minor defects.
- The writing and proposed methodology are clear and easy to understand.
- The proposed regularizer is intuitive: since the true posterior satisfies the self-consistency property, it is a natural extension to require the variational posterior to satisfy this condition (approximately).
- The problem setting is a significant one, as distribution shift in simulation-based inference (SBI) problems continues to be an active area of research.
- The authors take care to formalize their result more carefully with resp
- The method is combinatorial: the NPE (score-based) objective for SBI is well-established, and Schmitt et al. (2024) introduced the self-consistency loss.
- I don’t find the degree of novelty to be adequate to differentiate the work from Schmitt et al. In the case where the simulation model is correct, the proposed method is exactly identical to Schmitt et al. Thus, the authors’ main contribution, to me, seems to be applying the method to the setting of a misspecified simulator. While th
Improving the robustness of SBI to data distribution shifts is an important problem.
- The abstract uses "bad pre-asymptotic behavior" without definition, making the core problem inaccessible.
- When would practitioners have unlabeled real data but not be able to generate more simulations?
- The paper claims "high-dimensional" capability but experiments max out at 100 parameters with significant performance degradation (Figure 2a shows MMD increasing substantially). The MNIST example (784D) is modest by modern standards. Please scale back claims.
- Head-to-head comparisons on
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Methods and Inference · Bayesian Modeling and Causal Inference
