Subtractive Mixture Models via Squaring: Representation and Learning
Lorenzo Loconte, Aleksanteri M. Sladek, Stefan Mengel, Martin Trapp, Arno Solin, Nicolas Gillis, Antonio Vergari

TL;DR
This paper introduces subtractive mixture models that are learned by squaring them within the framework of probabilistic circuits. Allowing components to subtract probability mass can make the model more expressive and reduce the number of components needed to model complex distributions.
Contribution
It proposes a new framework for deep subtractive mixtures using squared probabilistic circuits, with theoretical and empirical evidence of increased expressiveness.
Findings
Squared circuits that allow subtractions can be exponentially more expressive than traditional additive mixtures.
Subtractive mixtures can need drastically fewer components to model complex distributions.
Empirical results demonstrate improved distribution estimation on real-world tasks.
Abstract
Mixture models are traditionally represented and learned by adding several distributions as components. Allowing mixtures to subtract probability mass or density can drastically reduce the number of components needed to model complex distributions. However, learning such subtractive mixtures while ensuring they still encode a non-negative function is challenging. We investigate how to learn and perform inference on deep subtractive mixtures by squaring them. We do this in the framework of probabilistic circuits, which enable us to represent tensorized mixtures and generalize several other subtractive models. We theoretically prove that the class of squared circuits allowing subtractions can be exponentially more expressive than traditional additive mixtures; and, we empirically show this increased expressiveness on a series of real-world distribution estimation tasks.
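The squaring idea in the abstract can be illustrated with a minimal sketch. It assumes a shallow, one-dimensional mixture of Gaussians rather than the deep probabilistic circuits studied in the paper, and every name and parameter value below is hypothetical. With real-valued (possibly negative) weights $w_i$, the unnormalized model is $c(x)^2 = (\sum_i w_i \mathcal{N}(x; \mu_i, \sigma_i^2))^2$, which is non-negative by construction. Its normalizer is available in closed form because the integral of a product of two Gaussian densities is $\mathcal{N}(\mu_i; \mu_j, \sigma_i^2 + \sigma_j^2)$.

```python
import numpy as np


def gauss_pdf(x, mu, var):
    """Density of a univariate Gaussian with mean mu and variance var."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def squared_mixture_normalizer(w, mu, var):
    """Closed-form integral of (sum_i w_i N(x; mu_i, var_i))^2 over x.

    Expanding the square gives a double sum over component pairs, and the
    integral of N(x; mu_i, var_i) * N(x; mu_j, var_j) equals
    N(mu_i; mu_j, var_i + var_j).
    """
    overlap = gauss_pdf(mu[:, None], mu[None, :], var[:, None] + var[None, :])
    return float(w @ overlap @ w)


def squared_mixture_pdf(x, w, mu, var):
    """Normalized density of the squared mixture; weights may be negative."""
    x = np.asarray(x, dtype=float)
    comps = gauss_pdf(x[..., None], mu, var)  # shape (..., K)
    c = comps @ w                             # signed mixture, may be < 0
    return c ** 2 / squared_mixture_normalizer(w, mu, var)


# Hypothetical example: a wide positive component minus a narrow one.
# The weights are chosen so the two components cancel at x = 0, which
# carves a hole in the middle of the density using only two components.
w = np.array([1.0, -0.25])
mu = np.array([0.0, 0.0])
var = np.array([4.0, 0.25])

if __name__ == "__main__":
    for x in (0.0, 1.0, 2.0):
        print(x, squared_mixture_pdf(x, w, mu, var))
```

In this sketch the density vanishes at the origin and has a mode on each side of it. The squared sum expands into K² pairwise products, which is one source of the additional computational cost that the reviews below mention.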
Peer Reviews
Decision: ICLR 2024 spotlight
The paper is reasonably clear and proposes a simple yet interesting idea, which appears to work well on selected synthetic and small-scale experiments. The authors provide the code for the experiments (which I did not review). The figure on page 1 nicely summarises the benefit of relaxing the requirement of positive components. Overall, the figures in the paper help to understand the introduced concepts. I think the paper is an interesting read.
The clarity of the paper on pages 4, 5, and 6 could be improved; the presentation is very dense and discusses multiple threads. The paper would benefit from focusing on the core ideas and describing them in more detail, while the less important parts could be moved to the appendix. I have concerns that a few points in the paper oversell the method (e.g., the improvement in Fig. 5 on test data appears very small, if statistically significant at all, while using ^2 introduces additional computational cost). I w
- [S1]: Originality -- as far as I can judge, the approach is novel and very interesting
- [S2]: Significance -- it often seems to work better than mixture models and other alternatives such as flows
- [S3]: Clarity -- while the part on tensor computations is a bit dense and could benefit from a higher-level treatment, the paper is clearly written
- [W1]: Missing discussion / limitations: Maybe I overlooked this, but I could not find an actual discussion about the restrictions of the approach, e.g.,
  + what are the limitations of the approach?
  + how restrictive is the functional form induced by using squared functions?
  + how expressive is the approach in the shallow or small-K setting? (Fig. 5, e.g., indicates that $\pm$ is worse for small $K$)
  + is it possible to extend the approach to a conditional setup?
- [W2]: Experiment
- Simple and effective idea
- Empirical results show better performance than the baseline on some tasks
- The paper's motivation could be stronger, e.g., by adding a real-world motivating example. It would also be interesting to see how the better density estimation can be used to improve a downstream task.
- The NPCs use fewer parameters but in a more complex way. What is the impact of this on training cost? This question is not explored empirically.
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Generative Adversarial Networks and Image Synthesis · Bayesian Methods and Mixture Models
