Variational Learning of Disentangled Representations

Yuli Slavutsky; Ozgur Beker; David Blei; Bianca Dumitrascu

arXiv:2506.17182·cs.LG·December 16, 2025

Open Access 3 Reviews

TL;DR

DISCoVeR is a variational framework that effectively separates shared and condition-specific factors in data, improving disentanglement and generalization across diverse datasets without relying on handcrafted priors.

Contribution

The paper introduces DISCoVeR, a novel variational method with a dual-latent architecture and max-min objective for disentangling shared and specific factors without strong assumptions.

Findings

01 · Achieves better disentanglement on synthetic datasets.

02 · Improves representation quality on natural images.

03 · Effective in single-cell RNA-seq data analysis.

Abstract

Disentangled representations enable models to separate factors of variation that are shared across experimental conditions from those that are condition-specific. This separation is essential in domains such as biomedical data analysis, where generalization to new treatments, patients, or species depends on isolating stable biological signals from context-dependent effects. While extensions of the variational autoencoder (VAE) framework have been proposed to address this problem, they frequently suffer from leakage between latent representations, limiting their ability to generalize to unseen conditions. Here, we introduce DISCoVeR, a new variational framework that explicitly separates condition-invariant and condition-specific factors. DISCoVeR integrates three key components: (i) a dual-latent architecture that models shared and specific factors separately; (ii) two parallel…

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 3

Strengths

(1) The research direction of this paper is promising. Formulating explicit theoretical formulations and constraints for different parts of the latent space is a reasonable approach to promote learning disentangled representations. (2) Overall, the paper is well-organized and easy to follow.

Weaknesses

(W1) The descriptions of the proposed methods are not clear enough and may contain fatal errors. For example, first, the proof of B.1 for the key Proposition 2.1 does not look correct from (17) to (18). In (17), it appears that the $-\log p(z, w \mid x, y)$ is broken down into $-\log p(w \mid x, y) - \log p(z \mid w, x, y)$, so this $-\log p(w \mid x, y)$ cancels the later $+\log p(w \mid x, y)$ in (17). However, why is $E_{q(w \mid x, y)}$ dropped when there remains a term $\log p(z \mid w, x, y)$ that still includes $w$ inside the expectation? It wo
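
For reference, the factorization discussed in (W1) is the chain rule for conditional densities. The following is a sketch in the review's notation only: equations (17) and (18) of the paper are not reproduced on this page, and the constant $c$ is a placeholder symbol introduced here for illustration.

$$
\log p(z, w \mid x, y) = \log p(w \mid x, y) + \log p(z \mid w, x, y)
$$

$$
E_{q(w \mid x, y)}[c] = c \ \text{ for any } c \text{ that does not depend on } w, \qquad \text{whereas } E_{q(w \mid x, y)}\big[\log p(z \mid w, x, y)\big] \text{ in general still depends on } q(w \mid x, y).
$$

Under this reading, the outer expectation can be dropped only for terms that are constant in $w$, which appears to be the step the reviewer is questioning.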

Reviewer 02 · Rating 4 · Confidence 4

Strengths

- The design of the optimization and DISCoVeR is grounded in the theory of variational inference and probabilistic graphical models throughout, providing rigor to the problem formulation.
- The idea of separating condition-invariant and condition-specific representations is interesting and has novelty.
- The comparison with related baselines is done both analytically and experimentally, again providing rigor to the formulation of DISCoVeR.
- Experimentation considered a variety of datasets ranging from s

Weaknesses

The decision to couple the prior of w and z is not very well justified or explained. It can be understood that doing so will require z to be informative, but the informativeness of z should already be encouraged by the reconstruction loss formulated on x from q(z|x). More importantly, it seems that it would create a conflict with the intended disentangling objective between z and w. The validity of this design should be better clarified theoretically/analytically, and ablated experimentally. Th

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. The theoretical part, including statistical derivations and optimization, is very solid.
2. The experimental part is also comprehensive and solid, with per-epoch runtime statistics for multiple baselines and full hyperparameter details; the information is thorough and should enable strong reproducibility.

Weaknesses

1. Lines 216-217: 'maximizing this lower bound on I(z;y) also maximizes I(z;y)'. Do you mean minimize? Even if it is corrected to minimizing, it is still not rigorous to say that minimizing the lower bound implies minimizing the value itself.
2. Line 211 says 'logistic regression' is used, but the appendix suggests that an MLP is also used here.
3. In the theoretical derivation, you require Qz, Qw to be convex and compact (Proposition 2.2 and standard regularity conditions). And in 193-195, you choose d-dimensional G
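
The concern in item 1 can be illustrated with a small numerical sketch. It assumes the bound in question has the standard classifier-based form I(z;y) ≥ H(y) + E[log q(y|z)]; the paper's exact bound is not shown on this page, and the toy joint distribution `P_ZY` and all function names below are hypothetical.

```python
import math

# Hypothetical toy joint distribution p(z, y): 3 latent codes (rows), 2 conditions (columns).
P_ZY = [
    [0.30, 0.05],
    [0.10, 0.15],
    [0.05, 0.35],
]

def marginals(p_zy):
    """Return the marginals p(z) and p(y) of a joint table."""
    p_z = [sum(row) for row in p_zy]
    p_y = [sum(row[j] for row in p_zy) for j in range(len(p_zy[0]))]
    return p_z, p_y

def mutual_information(p_zy):
    """Exact I(z; y) in nats for a discrete joint table."""
    p_z, p_y = marginals(p_zy)
    return sum(
        p * math.log(p / (p_z[i] * p_y[j]))
        for i, row in enumerate(p_zy)
        for j, p in enumerate(row)
        if p > 0
    )

def classifier_bound(p_zy, q_y_given_z):
    """H(y) + E_{p(z,y)}[log q(y|z)], assumed here as the lower bound on I(z; y)."""
    _, p_y = marginals(p_zy)
    entropy_y = -sum(p * math.log(p) for p in p_y if p > 0)
    expected_log_q = sum(
        p * math.log(q_y_given_z[i][j])
        for i, row in enumerate(p_zy)
        for j, p in enumerate(row)
        if p > 0
    )
    return entropy_y + expected_log_q

def true_posterior(p_zy):
    """The exact p(y|z), i.e. the classifier for which the bound is tight."""
    p_z, _ = marginals(p_zy)
    return [[p / p_z[i] for p in row] for i, row in enumerate(p_zy)]

mi = mutual_information(P_ZY)
tight = classifier_bound(P_ZY, true_posterior(P_ZY))
loose = classifier_bound(P_ZY, [[0.5, 0.5]] * 3)  # uninformative classifier
print(f"I(z;y)={mi:.4f}  bound(exact posterior)={tight:.4f}  bound(uniform)={loose:.4f}")
```

In this sketch the joint distribution, and therefore I(z;y), is held fixed while the classifier changes, yet the bound moves. This is consistent with the reviewer's point that a change in a lower bound does not by itself imply a change in the bounded quantity.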

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Single-cell and spatial transcriptomics · Domain Adaptation and Few-Shot Learning