High-dimensional Asymptotics of VAEs: Threshold of Posterior Collapse and Dataset-Size Dependence of Rate-Distortion Curve

Yuma Ichikawa; Koji Hukushima

arXiv:2309.07663·stat.ML·July 22, 2025·2 cites

TL;DR

This paper analyzes the conditions leading to posterior collapse in high-dimensional VAEs, revealing a threshold effect of the hyperparameter beta and the influence of dataset size on the rate-distortion trade-off.

Contribution

It provides a theoretical analysis of posterior collapse thresholds and dataset-size effects in high-dimensional VAEs, offering insights into their generalization behavior.

Findings

01

Posterior collapse occurs beyond a certain beta threshold, regardless of dataset size.

02

Large datasets are necessary to achieve high-rate rate-distortion curves.

03

The analysis explains observed behaviors in real-world non-linear VAEs.

Abstract

In variational autoencoders (VAEs), the variational posterior often collapses to the prior, known as posterior collapse, which leads to poor representation learning quality. An adjustable hyperparameter beta has been introduced in VAEs to address this issue. This study sharply evaluates the conditions under which the posterior collapse occurs with respect to beta and dataset size by analyzing a minimal VAE in a high-dimensional limit. Additionally, this setting enables the evaluation of the rate-distortion curve of the VAE. Our results show that, unlike typical regularization parameters, VAEs face "inevitable posterior collapse" beyond a certain beta threshold, regardless of dataset size. Moreover, the dataset-size dependence of the derived rate-distortion curve suggests that relatively large datasets are required to achieve a rate-distortion curve with high rates. These findings…
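The two quantities the abstract trades off can be made concrete with a small sketch. It assumes a diagonal-Gaussian encoder, a standard-normal prior, and a unit-variance Gaussian decoder (so distortion is half the squared error, up to constants); the function names are hypothetical and are not taken from the paper or its repository. The rate is the KL term that beta weights, and a fully collapsed posterior (mean 0, unit variance) gives a rate of exactly zero.

```python
import numpy as np


def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)


def beta_vae_terms(x, x_hat, mu, log_var, beta):
    """Return (distortion, rate, objective) for one sample.

    Assumption: unit-variance Gaussian decoder, so the distortion is
    half the squared reconstruction error (additive constants dropped).
    """
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    distortion = 0.5 * np.sum((x - x_hat) ** 2, axis=-1)
    rate = gaussian_kl_to_standard_normal(mu, log_var)
    return distortion, rate, distortion + beta * rate


# A collapsed posterior matches the prior, so its rate is zero.
collapsed_rate = gaussian_kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])

# An informative posterior pays a positive rate, which beta scales up.
distortion, rate, objective = beta_vae_terms(
    x=[1.0, 2.0], x_hat=[1.0, 0.0], mu=[1.0], log_var=[0.0], beta=2.0
)
```

Sweeping beta and recording the (rate, distortion) pair at each trained optimum is one way to trace a rate-distortion curve of the kind the abstract refers to.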

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01·Rating 5·Confidence 4

Strengths

- To the best of my knowledge, this is the first paper to study RD curves in VAEs as a function of dataset size and data dimension. I think this is a valuable topic of study that will be of interest to the ICLR community.
- The theory in the paper, to the best of my understanding, is sound.
- The paper for the most part reads well.

Weaknesses

- There is no study of network capacity in this work. While I understand that this is a theoretical work, the authors do claim that the same results hold for more complex networks. However, prior works suggest that RD curves behave differently for different network capacities [1,2]. Could the authors comment on this?
- It is also not clear to me what the message of the paper is. It of course makes sense that when you don't have a lot of data in high dimensions, you want to

Reviewer 02·Rating 8·Confidence 3

Strengths

- The paper studies an important aspect of VAEs and how different parameters and choices can affect performance.
- The empirical findings on the relation between generalisation error, sample complexity, and the beta parameter are interesting.

Weaknesses

The paper discusses a list of different behaviours of VAEs, but they feel like rather loosely connected findings (i.e., the subsections in Section 6). The findings themselves are interesting, but it is not surprising that changing one variable, such as beta or the number of training data, leads to changes in aspects such as RD curves and posterior collapse. Therefore, I believe a more coherent story is important to connect the dots and make these findings more insightful.

Reviewer 03·Rating 6·Confidence 4

Strengths

**Technical strengths**:
- The paper sharply characterizes high-dimensional asymptotics for learning the linear VAE (Eq (5)) under the spiked covariance model (Eq (4)) with the regularized $\beta$-VAE objective (Eq (6)).
- This is used to show interesting observations about the VAE learning process in Sections 6.1 and 6.2. In particular, (1) Figure 2 shows a double-descent phenomenon with respect to the sample complexity $\alpha$, with the reconstruction error (Eq (9)) peaking at $\alpha = 1$, and (2) Fig

Weaknesses

**Technical Weaknesses**:
- The main weakness is the fact that the theoretical results are not exact, since they have been developed using the replica method, which is a heuristic to get around intractable calculations.
- The authors work in the simple setting of $k = k^\star = 1$. If I understand correctly, this means the true latent space is $1$-dimensional. It would have been nice to see the synthetic experiments with $k^\star$ varying, say in $[1, 2, 4]$. In particular, what would the trend
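For readers unfamiliar with the setting the reviewer describes, the following is a minimal sketch of sampling from a rank-one spiked covariance model with a 1-dimensional latent variable (the $k^\star = 1$ case). The paper's Eq (4) is not reproduced on this page, so the normalization below, the parameter names `signal_strength` and `noise_var`, and the definition of the sample complexity as $\alpha = n/d$ are all assumptions made for illustration.

```python
import numpy as np


def sample_spiked_data(n, d, signal_strength, noise_var=1.0, seed=0):
    """Draw n samples in d dimensions with population covariance
    signal_strength * w w^T + noise_var * I (assumed generic form)."""
    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal(d)
    w_star /= np.linalg.norm(w_star)      # unit-norm spike direction
    c = rng.standard_normal((n, 1))       # 1-D latent variable per sample
    noise = rng.standard_normal((n, d))   # isotropic background noise
    x = np.sqrt(signal_strength) * c * w_star + np.sqrt(noise_var) * noise
    return x, w_star


def sample_complexity(n, d):
    """Assumed definition: alpha = n / d."""
    return n / d


def top_eigenvector_overlap(x, w_star):
    """|<v, w_star>| where v is the leading eigenvector of the sample covariance."""
    cov = x.T @ x / x.shape[0]
    _, eigvecs = np.linalg.eigh(cov)      # eigenvalues come back in ascending order
    return abs(eigvecs[:, -1] @ w_star)


x, w_star = sample_spiked_data(n=4000, d=20, signal_strength=10.0)
alpha = sample_complexity(4000, 20)
overlap = top_eigenvector_overlap(x, w_star)
```

With many samples per dimension the leading sample eigenvector aligns closely with the spike; shrinking `n` toward `d` is the regime where the reviewer's remarks about $\alpha$ become relevant.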

Code & Models

Repositories

yuma-ichikawa/vae-replica
pytorch·Official

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks · Cancer-related molecular mechanisms research