Scalable Learning of Multivariate Distributions via Coresets
Zeyu Ding, Katja Ickstadt, Nadja Klein, Alexander Munteanu, Simon Omlor

TL;DR
This paper introduces a novel coreset construction method for multivariate conditional transformation models, significantly improving scalability and efficiency in large-scale density estimation and regression tasks while maintaining statistical accuracy.
Contribution
It presents the first coresets for semi-parametric distributional models, enhancing scalability and robustness in complex, large datasets with non-linear relationships.
Findings
Substantial data reduction via importance sampling.
Maintains log-likelihood within $(1\pm\varepsilon)$ error bounds.
Improved computational efficiency demonstrated in experiments.
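To make the findings above concrete, here is a minimal Python sketch of importance sampling with reweighting. It is not the paper's MCTM construction: the toy Gaussian loss, the `scores` heuristic, and the function names `importance_sample_coreset` and `nll_terms` are assumptions chosen only to show how a small weighted sample can approximate a full-data negative log-likelihood.

```python
import numpy as np


def importance_sample_coreset(scores, k, rng):
    """Draw k indices with probability proportional to `scores`.

    Returns the sampled indices and weights 1 / (k * p_i), which make the
    weighted sum over the sample an unbiased estimate of the full sum.
    """
    scores = np.asarray(scores, dtype=float)
    p = scores / scores.sum()
    idx = rng.choice(len(p), size=k, replace=True, p=p)
    weights = 1.0 / (k * p[idx])
    return idx, weights


def nll_terms(y, mu, sigma):
    """Per-point negative log-likelihood of a Gaussian (toy stand-in model)."""
    return 0.5 * ((y - mu) / sigma) ** 2 + np.log(sigma) + 0.5 * np.log(2 * np.pi)


rng = np.random.default_rng(0)
n, k = 100_000, 2_000
y = rng.normal(loc=1.0, scale=2.0, size=n)

# Hypothetical importance scores: a uniform part plus each point's share of
# the squared deviation. This is an illustrative heuristic, not the
# sensitivity bound derived in the paper.
dev = (y - y.mean()) ** 2
scores = 1.0 / n + dev / dev.sum()

idx, weights = importance_sample_coreset(scores, k, rng)

# Compare the full-data loss with the weighted loss on the sample.
full = nll_terms(y, 1.0, 2.0).sum()
approx = (weights * nll_terms(y[idx], 1.0, 2.0)).sum()
rel_err = abs(approx - full) / full
print(f"sample size {k} of {n}, relative error {rel_err:.4f}")
```

The weights are what keep the estimate unbiased: points that are sampled more often because of a high score are counted proportionally less. A $(1\pm\varepsilon)$ guarantee corresponds to the relative error staying below $\varepsilon$ for all candidate parameters at once, which this single-parameter check does not establish.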
Abstract
Efficient and scalable non-parametric or semi-parametric regression analysis and density estimation are of crucial importance to the fields of statistics and machine learning. However, available methods are limited in their ability to handle large-scale data. We address this issue by developing a novel coreset construction for multivariate conditional transformation models (MCTMs) to enhance their scalability and training efficiency. To the best of our knowledge, these are the first coresets for semi-parametric distributional models. Our approach yields substantial data reduction via importance sampling. It ensures with high probability that the log-likelihood remains within multiplicative error bounds of $(1\pm\varepsilon)$ and thereby maintains statistical model accuracy. Compared to conventional full-parametric models, where coresets have been incorporated before, our semi-parametric…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference · Generative Adversarial Networks and Image Synthesis
