Mutual Information Collapse Explains Disentanglement Failure in $\beta$-VAEs
Minh Vu, Xiaoliang Wan, Shuangqing Wei

TL;DR
This paper shows that in $\beta$-VAEs, increasing regularization can collapse mutual information and cause disentanglement to fail. It proposes a modified model that prevents this collapse and stabilizes disentanglement.
Contribution
The paper analyzes the $\beta$-VAE, showing that strong regularization collapses mutual information, and proposes the $\lambda\beta$-VAE, which adds an auxiliary penalty to prevent this collapse.
Findings
Mutual information collapses at high $\beta$, impairing disentanglement.
In a linear-Gaussian setting, the paper proves that the encoder gain contracts spectrally, driving latent-factor mutual information to zero.
Empirical results show that the $\lambda\beta$-VAE stabilizes disentanglement across $\beta$ values.
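The link between a shrinking encoder gain and vanishing mutual information can be illustrated with a toy scalar channel. The sketch below is not the paper's derivation: it assumes a one-dimensional linear-Gaussian encoder `z = gain * x + noise` with `x ~ N(0, signal_var)` and independent `noise ~ N(0, noise_var)`, for which the mutual information has the standard closed form `0.5 * log(1 + gain^2 * signal_var / noise_var)`. The function name and default variances are hypothetical choices made for illustration.

```python
import math


def gaussian_channel_mi(gain: float, signal_var: float = 1.0, noise_var: float = 1.0) -> float:
    """Mutual information in nats between x ~ N(0, signal_var) and
    z = gain * x + e, with e ~ N(0, noise_var) independent of x.

    Toy scalar model (an assumption for illustration, not the paper's setting).
    """
    snr = (gain ** 2) * signal_var / noise_var
    return 0.5 * math.log1p(snr)


# Shrink the gain toward zero and watch the information carried by z vanish.
gains = [1.0, 0.5, 0.1, 0.01, 0.0]
mi_values = [gaussian_channel_mi(g) for g in gains]

for g, mi in zip(gains, mi_values):
    print(f"gain={g:<5} I(x;z)={mi:.6f} nats")
```

In this toy model the mutual information decreases monotonically with the gain and is exactly zero when the gain is zero, which is the scalar analogue of a spectral contraction removing information from the latent channel.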
Abstract
The $\beta$-VAE is a foundational framework for unsupervised disentanglement, using $\beta$ to regulate the trade-off between latent factorization and reconstruction fidelity. Empirically, however, disentanglement performance exhibits a pervasive non-monotonic trend: benchmarks such as MIG and SAP typically peak at intermediate $\beta$ and collapse as regularization increases. We demonstrate that this collapse is a fundamental information-theoretic failure, where strong Kullback-Leibler pressure promotes marginal independence at the expense of the latent channel's semantic informativeness. By formalizing this mechanism in a linear-Gaussian setting, we prove that for , stationarity-induced dynamics trigger a spectral contraction of the encoder gain, driving latent-factor mutual information to zero. To resolve this, we introduce the $\lambda\beta$-VAE, which decouples…
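For reference, the $\beta$-VAE objective that the abstract discusses is usually written as in the first line below. The notation ($q_\phi$, $p_\theta$, $p(z)$, $p_{\mathrm{data}}$) follows the standard formulation and is an assumption here, not taken from this paper. The second line is the well-known decomposition of the data-averaged KL term, with $q_\phi(z) = \mathbb{E}_{p_{\mathrm{data}}(x)}[q_\phi(z \mid x)]$. It suggests why a larger $\beta$ penalizes the encoder mutual information $I_q(x;z)$ along with the mismatch between the aggregate posterior and the prior. The paper's own analysis concerns latent-factor mutual information and may be set up differently.

```latex
\begin{align}
\mathcal{L}_\beta(\theta,\phi;x)
  &= \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
     - \beta\, D_{\mathrm{KL}}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right) \\
\mathbb{E}_{p_{\mathrm{data}}(x)}\!\left[D_{\mathrm{KL}}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)\right]
  &= I_q(x;z) + D_{\mathrm{KL}}\!\left(q_\phi(z)\,\|\,p(z)\right)
\end{align}
```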
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Stochastic Gradient Optimization Techniques
