Stabilizing the Kumaraswamy Distribution
Max Wasserman, Gonzalo Mateos

TL;DR
This paper addresses numerical stability issues in the Kumaraswamy distribution, enhancing its usability in scalable variational models for bounded latent variables across various applications.
Contribution
We identify and fix numerical instabilities in the Kumaraswamy distribution's inverse CDF and log-pdf, enabling more reliable use in large-scale latent variable models.
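For reference, the two quantities named above can be written out using the standard Kumaraswamy definitions. The symbols a, b (shape parameters), x (the variate), and u (a uniform draw) are notation assumed here and do not appear in this summary. Both the inverse CDF and the log-pdf contain a term of the form "one minus a power", which is the kind of expression where floating-point cancellation or rounding can occur.

```latex
\begin{align}
f(x; a, b) &= a\,b\,x^{a-1}\,(1 - x^{a})^{b-1}, \qquad x \in (0,1),\ a, b > 0, \\
F(x; a, b) &= 1 - (1 - x^{a})^{b}, \\
F^{-1}(u; a, b) &= \bigl(1 - (1 - u)^{1/b}\bigr)^{1/a}, \qquad u \in (0,1), \\
\log f(x; a, b) &= \log a + \log b + (a-1)\log x + (b-1)\log\bigl(1 - x^{a}\bigr).
\end{align}
```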
Findings
Improved exploration-exploitation in multi-armed bandits
Enhanced uncertainty quantification in graph neural networks
Stabilized KS distribution as a core component in variational models
Abstract
Large-scale latent variable models require expressive continuous distributions that support efficient sampling and low-variance differentiation, achievable through the reparameterization trick. The Kumaraswamy (KS) distribution is both expressive and supports the reparameterization trick with a simple closed-form inverse CDF. Yet, its adoption remains limited. We identify and resolve numerical instabilities in the inverse CDF and log-pdf, exposing issues in libraries like PyTorch and TensorFlow. We then introduce simple and scalable latent variable models based on the KS, improving exploration-exploitation trade-offs in contextual multi-armed bandits and enhancing uncertainty quantification for link prediction with graph neural networks. Our results support the stabilized KS distribution as a core component in scalable variational models for bounded latent variables.
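The kind of stabilization described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the standard Kumaraswamy inverse CDF, (1 - (1 - u)^(1/b))^(1/a), and evaluates it in log space with `log1p` and `expm1`, using the two-branch `log1mexp` rule from Mächler's prior work that the reviews mention. The function names and the parameter names `a`, `b`, `u`, `x` are hypothetical, and this is not necessarily the authors' exact implementation.

```python
import numpy as np


def log1mexp(t):
    """Compute log(1 - exp(-t)) for t > 0 with a two-branch rule (after Mächler)."""
    t = np.asarray(t, dtype=np.float64)
    small = t <= np.log(2.0)
    # np.where evaluates both branches, so silence warnings from the unused one.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(small, np.log(-np.expm1(-t)), np.log1p(-np.exp(-t)))


def ks_icdf_naive(u, a, b):
    """Direct transcription of the inverse CDF; loses precision for tiny u."""
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)


def ks_icdf_stable(u, a, b):
    """Inverse CDF evaluated in log space (illustrative sketch)."""
    # log((1 - u)^(1/b)) = log1p(-u) / b, which is <= 0.
    log_pow = np.log1p(-u) / b
    # log(1 - (1 - u)^(1/b)) = log1mexp(-log_pow); then apply the 1/a power.
    return np.exp(log1mexp(-log_pow) / a)


def ks_log_pdf_stable(x, a, b):
    """Log-density with log(1 - x^a) computed as log1mexp(-a * log(x))."""
    log_x = np.log(x)
    return (np.log(a) + np.log(b)
            + (a - 1.0) * log_x
            + (b - 1.0) * log1mexp(-a * log_x))
```

With `a = b = 1` and `u = 1e-20`, the naive form returns exactly 0 because `1.0 - 1e-20` rounds to `1.0` in double precision, while the log-space form returns a value close to `1e-20`. For moderate inputs the two agree.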
Peer Reviews
Decision · ICLR 2025 Conference Withdrawn Submission
The paper's strengths lie in its originality in addressing numerical instabilities in the KS distribution. While the idea is simple, resolving these issues unlocks the KS distribution's potential in applications like contextual multi-armed bandits and graph neural networks, enhancing exploration and uncertainty quantification. The work is of high quality, with validation and clear exposition, making it an impactful contribution to bounded latent variable modeling. Although the solution is quite
The key weakness of the paper is the lack of direct evidence quantifying the impact of numerical instability on performance outcomes in the experiments. While the authors resolve known instabilities in the KS distribution, it is unclear how these instabilities previously affected results or how the stabilization improves them. I see the empirical results, but I do not understand how.
By stabilizing its numerical computation, the Kumaraswamy distribution can be better utilized in various tasks across the deep learning community.
While the proposed method broadens the usability of the Kumaraswamy distribution, the impact and novelty of the work are limited to a single distribution. The modified Kumaraswamy should be properly compared with the current Kumaraswamy in the experiments.
* The stabilisation trick is very simple and the primitives (`expm1()` and `log1p()`) are already present in various frameworks. This has the potential of rapid adoption in practical settings.
* Apart from some specific parts on the VBE section, the paper is well presented and written. The authors clearly demonstrate the numerical instabilities, how they manifest and how the proposed modifications can avoid them.
* The tasks that the authors use to evaluate the behaviour of the KS distribution
* The novelty of this work is a bit limited. As far as the stabilisation is concerned, while the idea of numerically stable computation is important, the numerical stability is ensured by techniques proposed in the prior work of Mächler. Therefore, in my opinion, the main new insight and novelty of this work is in the benchmarking and the various tasks that the authors considered to test the viability of the KS distribution against other baselines.
* Given the aforementioned, I will thus put mo
The analysis identifying the instability is interesting. The modification of the distribution also makes a lot of sense.
- However, given that large-scale problems depend more on network capacity and data than on the distribution modeling itself, I doubt whether this modification would benefit large-scale training. For example, consider Section 4.1 with the image VAE. Could the authors provide an additional argument for why we need this algorithm, given that diffusion models are all based on the normal distribution? Put differently, why do we need this algorithm given that there is already
Taxonomy
Topics: Statistical Distribution Estimation and Applications · Fractional Differential Equations Solutions · Fuzzy Systems and Optimization
