A Likelihood Based Approach to Distribution Regression Using Conditional Deep Generative Models
Shivam Kumar, Yun Yang, and Lizhen Lin

TL;DR
This paper provides a theoretical analysis of likelihood-based conditional deep generative models within distribution regression, showing they can effectively learn complex distributions in high-dimensional spaces by leveraging intrinsic data structure.
Contribution
It establishes convergence rates for a sieve MLE in distribution regression, explaining how deep generative models overcome the curse of dimensionality from a statistical perspective.
Findings
Convergence rates depend on intrinsic dimension and smoothness.
Models can learn nearly singular conditional distributions.
Adding small noise aids in learning near-manifold data.
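One way to see why small noise matters for a likelihood-based method: a distribution supported on a lower-dimensional set has no density with respect to Lebesgue measure in the ambient space, whereas its Gaussian-perturbed version does. The sketch below is an illustration only, not the paper's procedure. It assumes a toy 1-dimensional manifold (a line segment) in a 3-dimensional ambient space and a hypothetical noise level `sigma`, and uses the smallest eigenvalue of the sample covariance as a rough proxy for degeneracy.

```python
import numpy as np

def min_eigenvalue_of_cov(samples):
    """Smallest eigenvalue of the sample covariance (rows are observations)."""
    cov = np.cov(samples, rowvar=False)
    return float(np.linalg.eigvalsh(cov)[0])  # eigvalsh returns ascending order

rng = np.random.default_rng(0)
n, ambient_dim, sigma = 5000, 3, 0.05  # sigma is an assumed, illustrative noise level

# Toy data: points on a line segment (intrinsic dimension 1) embedded in R^3.
t = rng.uniform(-1.0, 1.0, size=n)
direction = np.array([1.0, 2.0, -1.0]) / np.sqrt(6.0)
on_manifold = t[:, None] * direction[None, :]

# Perturb with small isotropic Gaussian noise in the ambient space.
noisy = on_manifold + sigma * rng.standard_normal((n, ambient_dim))

lam_clean = min_eigenvalue_of_cov(on_manifold)  # numerically zero: singular
lam_noisy = min_eigenvalue_of_cov(noisy)        # roughly sigma**2: full rank

print(f"clean: {lam_clean:.2e}, noisy: {lam_noisy:.2e}, sigma^2: {sigma**2:.2e}")
```

The clean data has a degenerate covariance, while the perturbed data is full-dimensional but still concentrated near the manifold, which is the "nearly singular" regime the findings refer to.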
Abstract
In this work, we explore the theoretical properties of conditional deep generative models under the statistical framework of distribution regression where the response variable lies in a high-dimensional ambient space but concentrates around a potentially lower-dimensional manifold. More specifically, we study the large-sample properties of a likelihood-based approach for estimating these models. Our results lead to the convergence rate of a sieve maximum likelihood estimator (MLE) for estimating the conditional distribution (and its devolved counterpart) of the response given predictors in the Hellinger (Wasserstein) metric. Our rates depend solely on the intrinsic dimension and smoothness of the true conditional distribution. These findings provide an explanation of why conditional deep generative models can circumvent the curse of dimensionality from the perspective of statistical…
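For readers unfamiliar with the terms in the abstract, the following standard definitions may help. The notation is an assumption made for illustration (in particular, $\mathcal{F}_n$ denotes a hypothetical sieve, i.e. a model class allowed to grow with the sample size $n$) and may differ from the paper's own. Some authors also include a factor of $1/2$ in the squared Hellinger distance.

```latex
% Sieve MLE over a growing model class F_n, given pairs (X_i, Y_i), i = 1, ..., n
\hat{p}_n \in \arg\max_{p \in \mathcal{F}_n} \frac{1}{n} \sum_{i=1}^{n} \log p(Y_i \mid X_i)

% Squared Hellinger distance between densities p and q
H^2(p, q) = \int \bigl( \sqrt{p(y)} - \sqrt{q(y)} \bigr)^2 \, dy

% Wasserstein-1 distance between distributions P and Q,
% where \Pi(P, Q) is the set of couplings with marginals P and Q
W_1(P, Q) = \inf_{\gamma \in \Pi(P, Q)} \mathbb{E}_{(U, V) \sim \gamma} \, \lVert U - V \rVert
```

The Hellinger distance requires densities, while the Wasserstein distance remains well defined for distributions supported on a lower-dimensional set, which is consistent with the abstract pairing each metric with a different target.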
Peer Reviews
Decision: Submitted to ICLR 2025
1. Very well written paper. 2. The method is clearly explained and the proofs are easy to follow.
1. The method seems to extend the techniques of previous work, filling in some details with approximation. 2. Question on the applicability to multiple-chart manifolds. 3. The numerical examples are very simple.
The theoretical results appear to be correct and sound. Moreover, there are detailed analyses for different setups and a matching lower bound. There are numerical results to support the performance of neural network generative models for distribution regression.
The paper is not very easy to follow in places, mostly due to the technical presentation and purely statistical discussions. I would suggest that the authors add a preliminary section rather than mixing the background introduction with the main results in the current Section 2. It is relatively difficult for me to judge the technical novelty of the paper. On the one hand, the distribution regression problem and low-dimensional generative neural networks seem to pose technical difficulties. On the other hand, the r
The proposed analysis is helpful in understanding why generative models work as they do. Additionally, the proposed error analysis is useful for anticipating the number of data points needed to achieve a specific targeted error with respect to the Wasserstein or Hellinger distances. The empirical results support the main claims of the work to some extent.
Some of the theoretical results seem hard to verify in practice. It would be interesting if, for some contrived examples, the authors could show the convergence of the distances as a function of the number of points. The theorems contain some constants that do not have a clear way of being estimated, making some of the error bounds hard to use in practice. Note that the paper does not appear to be in the right ICLR format, since the margins are considerably smaller.
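The kind of check suggested in the last review, tracking a distance as a function of the number of points on a contrived example, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' experiment: it uses two independent samples from the same one-dimensional standard normal as a stand-in for "estimator samples versus true samples", and relies on the fact that in one dimension the Wasserstein-1 distance between two equal-size empirical measures is the mean absolute difference of the sorted samples. The helper name `w1_empirical_1d` is hypothetical.

```python
import numpy as np

def w1_empirical_1d(x, y):
    """Wasserstein-1 distance between two equal-size empirical measures on the real line."""
    x_sorted, y_sorted = np.sort(x), np.sort(y)
    return float(np.mean(np.abs(x_sorted - y_sorted)))

rng = np.random.default_rng(0)
sample_sizes = [100, 1000, 10000]
n_repeats = 20  # average over repeats to smooth out sampling variability

mean_w1 = []
for n in sample_sizes:
    dists = [
        w1_empirical_1d(rng.standard_normal(n), rng.standard_normal(n))
        for _ in range(n_repeats)
    ]
    mean_w1.append(float(np.mean(dists)))

# Slope of log(distance) against log(n) estimates the empirical convergence exponent.
slope = float(np.polyfit(np.log(sample_sizes), np.log(mean_w1), 1)[0])

for n, d in zip(sample_sizes, mean_w1):
    print(f"n = {n:>6d}  mean W1 = {d:.4f}")
print(f"fitted log-log slope: {slope:.2f}")
```

In a real experiment, one of the two samples would be drawn from the fitted generative model, and the fitted slope could be compared with the exponent predicted by the theory.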
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Bayesian Methods and Mixture Models · Generative Adversarial Networks and Image Synthesis
