The continuous categorical: a novel simplex-valued exponential family
Elliott Gordon-Rodriguez, Gabriel Loaiza-Ganem, John P. Cunningham

TL;DR
This paper introduces the continuous categorical, a new exponential-family distribution for simplex-valued data. It targets the bias and numerical problems of Dirichlet-based models, yields unbiased estimators and better numerical stability, and reports empirical improvements over those models.
Contribution
The paper proposes the continuous categorical distribution, a novel exponential family for modeling simplex-valued data that avoids the bias and numerical issues of standard choices such as the Dirichlet.
Findings
Outperforms the Dirichlet in simulations and real-world tasks
Provides unbiased estimators and reparameterization-friendly sampling methods
Demonstrates improved performance in neural network compression and election data analysis
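To make "reparameterization-friendly sampling" concrete, here is a minimal sketch for the two-category special case only, where the continuous categorical reduces to the continuous Bernoulli. It is not the paper's general sampler. The function names and the natural-parameter argument eta are illustrative assumptions. The density on [0, 1] is taken to be proportional to exp(eta * x), and samples are obtained by inverting the CDF, so each sample is a smooth function of eta and of a uniform draw u.

```python
import numpy as np


def sample_continuous_bernoulli(eta, u):
    """Inverse-CDF map from uniform draws u in [0, 1) to samples on [0, 1].

    Assumes a density proportional to exp(eta * x) on [0, 1] (hypothetical
    helper for the K = 2 case; names are not taken from the paper).
    """
    u = np.asarray(u, dtype=float)
    if abs(eta) < 1e-8:
        # eta = 0 gives the uniform distribution, so the map is the identity.
        return u.copy()
    # CDF: F(x) = (exp(eta * x) - 1) / (exp(eta) - 1); solve F(x) = u for x.
    return np.log1p(u * np.expm1(eta)) / eta


def continuous_bernoulli_mean(eta):
    """Closed-form mean of the density proportional to exp(eta * x) on [0, 1]."""
    if abs(eta) < 1e-8:
        return 0.5
    return 1.0 / (1.0 - np.exp(-eta)) - 1.0 / eta


# Usage: draw samples for eta = log(4), i.e. lambda = 0.8 in the usual
# continuous Bernoulli parameterization, and compare with the analytic mean.
rng = np.random.default_rng(0)
samples = sample_continuous_bernoulli(np.log(4.0), rng.uniform(size=200_000))
print(samples.mean(), continuous_bernoulli_mean(np.log(4.0)))
```

Because the randomness enters only through u, gradients with respect to eta can pass through the sample, which is the property the finding refers to.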
Abstract
Simplex-valued data appear throughout statistics and machine learning, for example in the context of transfer learning and compression of deep networks. Existing models for this class of data rely on the Dirichlet distribution or other related loss functions; here we show these standard choices suffer systematically from a number of limitations, including bias and numerical issues that frustrate the use of flexible network models upstream of these distributions. We resolve these limitations by introducing a novel exponential family of distributions for modeling simplex-valued data - the continuous categorical, which arises as a nontrivial multivariate generalization of the recently discovered continuous Bernoulli. Unlike the Dirichlet and other typical choices, the continuous categorical results in a well-behaved probabilistic loss function that produces unbiased estimators, while…
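For orientation, the densities being compared can be sketched as follows. This assumes the standard forms of the Dirichlet and of the continuous categorical in natural-parameter notation; the symbols \eta, \alpha and \Delta^{K-1} are introduced here for illustration, and normalizing constants are omitted.

```latex
% Continuous categorical on the simplex \Delta^{K-1} (unnormalized):
p(x; \eta) \propto \exp\!\left( \eta^\top x \right), \qquad x \in \Delta^{K-1}

% Dirichlet (unnormalized), whose sufficient statistics are \log x_i:
p(x; \alpha) \propto \prod_{i=1}^{K} x_i^{\alpha_i - 1}
  = \exp\!\left( \sum_{i=1}^{K} (\alpha_i - 1) \log x_i \right)

% K = 2 with x_2 = 1 - x_1 recovers the continuous Bernoulli:
p(x_1) \propto \exp\!\left( (\eta_1 - \eta_2)\, x_1 \right)
  \propto \lambda^{x_1} (1 - \lambda)^{1 - x_1},
  \qquad \lambda = \frac{1}{1 + e^{-(\eta_1 - \eta_2)}}
```

The sufficient statistic of the continuous categorical is x itself, which stays bounded on the simplex, whereas the Dirichlet's \log x_i diverges as any coordinate approaches zero. This is one way to read the abstract's claim about numerical issues.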
Taxonomy
Topics: Bayesian Methods and Mixture Models · Gaussian Processes and Bayesian Inference · Markov Chains and Monte Carlo Methods
