Mutual Information Collapse Explains Disentanglement Failure in $\beta$-VAEs

Minh Vu; Xiaoliang Wan; Shuangqing Wei

arXiv:2602.09277 · stat.ML · February 11, 2026

Open Access

TL;DR

This paper reveals that in $\beta$-VAEs, increasing regularization can cause a collapse in mutual information, leading to disentanglement failure, and proposes a modified model to prevent this collapse and improve disentanglement stability.

Contribution

The paper analyzes the $\beta$-VAE, showing that mutual information collapses under strong regularization, and proposes the $\lambda\beta$-VAE, which adds an auxiliary penalty to prevent this collapse.

Findings

01

Mutual information collapses at high $\beta$, impairing disentanglement.

02

Theoretical proof of spectral contraction causing information loss.

03

Empirical results show that the $\lambda\beta$-VAE stabilizes disentanglement across $\beta$ values.

Abstract

The $\beta$-VAE is a foundational framework for unsupervised disentanglement, using $\beta$ to regulate the trade-off between latent factorization and reconstruction fidelity. Empirically, however, disentanglement performance exhibits a pervasive non-monotonic trend: benchmarks such as MIG and SAP typically peak at intermediate $\beta$ and collapse as regularization increases. We demonstrate that this collapse is a fundamental information-theoretic failure, where strong Kullback-Leibler pressure promotes marginal independence at the expense of the latent channel's semantic informativeness. By formalizing this mechanism in a linear-Gaussian setting, we prove that for $\beta > 1$, stationarity-induced dynamics trigger a spectral contraction of the encoder gain, driving latent-factor mutual information to zero. To resolve this, we introduce the $\lambda\beta$-VAE, which decouples…
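The link between a shrinking encoder gain and vanishing mutual information can be illustrated with a scalar Gaussian channel. The sketch below is not the paper's model or proof: it assumes a hypothetical one-dimensional encoder $z = a x + \epsilon$ with $x \sim \mathcal{N}(0, \sigma_x^2)$ and independent noise $\epsilon \sim \mathcal{N}(0, \sigma_\epsilon^2)$ of fixed variance, for which the standard closed form is $I(x; z) = \tfrac{1}{2}\log(1 + a^2 \sigma_x^2 / \sigma_\epsilon^2)$. The function name and parameters are chosen here for illustration only.

```python
import math


def gaussian_channel_mi(gain, signal_var=1.0, noise_var=1.0):
    """Mutual information in nats for z = gain * x + eps.

    Assumes x ~ N(0, signal_var) and eps ~ N(0, noise_var), independent.
    This is a scalar toy channel, not the encoder analyzed in the paper.
    """
    snr = (gain ** 2) * signal_var / noise_var
    # log1p keeps the result accurate when the gain (and hence the SNR) is tiny.
    return 0.5 * math.log1p(snr)


# As the gain contracts toward zero, the information carried by z vanishes.
for gain in (1.0, 0.5, 0.1, 0.01, 0.0):
    mi = gaussian_channel_mi(gain)
    print(f"gain={gain:<5} I(x; z)={mi:.6f} nats")
```

Under these assumptions the mutual information decreases monotonically with the gain magnitude and is exactly zero at zero gain, which is the scalar analogue of the collapse the abstract describes for the multivariate encoder.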

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Stochastic Gradient Optimization Techniques