Functional Distribution Networks (FDN)
Omer Haq

TL;DR
Functional Distribution Networks (FDN) improve uncertainty estimation in probabilistic regressors by modeling input-conditioned weight distributions, maintaining accuracy and calibration under distribution shifts.
Contribution
FDN introduces input-conditioned weight distributions trained with a Monte Carlo beta-ELBO, enhancing shift-aware uncertainty estimation in regression models.
Findings
FDN achieves competitive accuracy with Bayesian and ensemble methods.
FDN provides input-dependent, shift-aware uncertainty estimates.
FDN maintains calibration under distribution shifts.
Abstract
Modern probabilistic regressors often remain overconfident under distribution shift. Functional Distribution Networks (FDN) place input-conditioned distributions over network weights, producing predictive mixtures whose dispersion adapts to the input; we train them with a Monte Carlo beta-ELBO objective. We pair FDN with an evaluation protocol that separates interpolation from extrapolation and emphasizes simple OOD sanity checks. On controlled 1D tasks and small/medium UCI-style regression benchmarks, FDN remains competitive in accuracy with strong Bayesian, ensemble, dropout, and hypernetwork baselines, while providing strongly input-dependent, shift-aware uncertainty and competitive calibration under matched parameter and update budgets.
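The core mechanism described in the abstract (a weight distribution that depends on the input, trained with a Monte Carlo beta-ELBO) can be illustrated with a small sketch. This is not the author's implementation. It assumes a 1D linear predictor with two weights, a diagonal Gaussian weight distribution whose mean and log standard deviation are linear in the input, a standard normal prior, and a fixed Gaussian observation noise. The names `conditioner`, `phi`, `noise_std`, and `num_samples` are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def conditioner(x, phi):
    """Hypothetical conditioner: maps input x to the mean and std of a
    diagonal Gaussian over the two weights (slope, bias) of a 1D predictor."""
    mu = [phi["a_mu"][i] * x + phi["c_mu"][i] for i in range(2)]
    sigma = [math.exp(phi["a_ls"][i] * x + phi["c_ls"][i]) for i in range(2)]
    return mu, sigma


def kl_to_standard_normal(mu, sigma):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over weights.
    The standard normal prior is an assumption of this sketch."""
    return sum(0.5 * (s * s + m * m - 1.0) - math.log(s)
               for m, s in zip(mu, sigma))


def gaussian_log_lik(y, pred, noise_std):
    """Log density of y under N(pred, noise_std^2)."""
    return (-0.5 * math.log(2.0 * math.pi * noise_std ** 2)
            - 0.5 * ((y - pred) / noise_std) ** 2)


def mc_beta_elbo_loss(x, y, phi, beta=1.0, num_samples=8,
                      noise_std=0.1, rng=random):
    """Negative beta-ELBO for one (x, y) pair: a Monte Carlo estimate of the
    expected negative log-likelihood plus beta times the KL term."""
    mu, sigma = conditioner(x, phi)
    log_lik = 0.0
    for _ in range(num_samples):
        # Reparameterized draw of the weights from q(theta | x).
        w, b = (m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma))
        log_lik += gaussian_log_lik(y, w * x + b, noise_std)
    return -log_lik / num_samples + beta * kl_to_standard_normal(mu, sigma)


def predictive_samples(x, phi, num_samples=1000, rng=random):
    """Draws of the predicted mean w * x + b under q(theta | x); together
    they form the mixture whose spread depends on the input.
    Observation noise is not added here."""
    mu, sigma = conditioner(x, phi)
    draws = []
    for _ in range(num_samples):
        w, b = (m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma))
        draws.append(w * x + b)
    return draws


if __name__ == "__main__":
    # Illustrative parameters: the weight std grows with x.
    phi = {"a_mu": [0.0, 0.0], "c_mu": [1.0, 0.0],
           "a_ls": [0.5, 0.5], "c_ls": [-2.0, -2.0]}
    rng = random.Random(0)
    print("loss:", mc_beta_elbo_loss(1.0, 1.0, phi, beta=0.5, rng=rng))
    for x in (0.0, 3.0):
        spread = statistics.pstdev(predictive_samples(x, phi, rng=rng))
        print("x =", x, "predictive std:", spread)
```

In this sketch the predictive spread changes with the input only because the conditioner makes the weight variance a function of `x`. Setting `beta` below 1 down-weights the KL term relative to the data fit.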
Peer Reviews
Decision: Submitted to ICLR 2026
- The idea of conditioning parameter distributions on inputs is interesting and well-motivated for handling OOD uncertainty.
- The paper provides a clear conceptual positioning relative to related work.
- The evaluation considers both ID/OOD performance and parameter efficiency.

- The experimental results do not convincingly show clear gains over strong baselines.
- No real-world experiments are presented, limiting practical validation.
- Figures and presentation could be clearer.

- Quantifying regression predictive uncertainty in a calibrated manner is an unsolved research problem. The paper aims to tackle a highly relevant issue, of great interest to the field.
- The split of the input space into ID and OOD is useful for assessing OOD detection capabilities.
- The proposed method is mostly well explained.
- The positioning of this work relative to related literature is made clear and relevant baselines are included.

- Figures 1 and 2 are quite hard to understand. This is due to font sizes, legend positioning, overlapping lines and (lack of) scaling of axes. Sometimes there are multiple lines of the same color (e.g. 1e) and it is not clear what they represent. It should be stated clearly in the caption that Figure 1 shows results on both ID and OOD input locations.
- The quantities used to compare methods (CRPS, AURS, ...) should be mathematically defined, including how they are computed on finite data, at

- The paper is relatively well written.
- The idea is novel, even though I am very sceptical that the approach can be scaled to anything more complex than the presented 1D tasks.

- The experiments are way too simple, only presenting tiny networks for 1-D regression. The 1-D regression tasks are not even randomized but presented for specific functions, so it is hard to say whether the results are just artifacts of the 3 functions that have been presented. The authors should look into the neural process literature to see which families of functions are used there. If we had the average performance over many instances of the same family, then it would at least be statistica
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Generative Adversarial Networks and Image Synthesis · Bayesian Modeling and Causal Inference
