Variational Learning of Disentangled Representations
Yuli Slavutsky, Ozgur Beker, David Blei, Bianca Dumitrascu

TL;DR
DISCoVeR is a variational framework that effectively separates shared and condition-specific factors in data, improving disentanglement and generalization across diverse datasets without relying on handcrafted priors.
Contribution
The paper introduces DISCoVeR, a novel variational method with a dual-latent architecture and max-min objective for disentangling shared and specific factors without strong assumptions.
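To make the dual-latent idea concrete, the following is a minimal NumPy sketch of one forward pass, not the authors' implementation. It assumes a shared encoder q(z|x), a condition-aware encoder q(w|x, y), and a linear decoder that consumes both latents; all dimensions, the helper names `linear` and `gaussian_encoder`, and the random untrained weights are hypothetical. The max-min objective itself is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(in_dim, out_dim):
    """Hypothetical helper: random weights and zero bias for one linear layer."""
    return rng.normal(scale=0.1, size=(in_dim, out_dim)), np.zeros(out_dim)

def gaussian_encoder(inputs, mu_layer, logvar_layer):
    """Diagonal-Gaussian encoder with a reparameterized sample."""
    mu = inputs @ mu_layer[0] + mu_layer[1]
    logvar = inputs @ logvar_layer[0] + logvar_layer[1]
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps, mu, logvar

# Assumed toy sizes; none of these come from the paper.
x_dim, n_conditions, z_dim, w_dim, batch = 8, 3, 2, 2, 5
x = rng.normal(size=(batch, x_dim))
y = np.eye(n_conditions)[rng.integers(0, n_conditions, size=batch)]  # one-hot condition

# Shared latent z is inferred from x alone: q(z | x).
z, z_mu, z_logvar = gaussian_encoder(
    x, linear(x_dim, z_dim), linear(x_dim, z_dim)
)

# Condition-specific latent w sees both x and the condition label: q(w | x, y).
xy = np.concatenate([x, y], axis=1)
w, w_mu, w_logvar = gaussian_encoder(
    xy, linear(x_dim + n_conditions, w_dim), linear(x_dim + n_conditions, w_dim)
)

# Assumed decoder: reconstruct x from the concatenation of both latents.
dec_weights, dec_bias = linear(z_dim + w_dim, x_dim)
x_hat = np.concatenate([z, w], axis=1) @ dec_weights + dec_bias
recon_error = float(np.mean((x - x_hat) ** 2))
```

Keeping y out of the z encoder is the design choice that lets z be computed for an unseen condition; whether the paper's decoder combines the latents exactly this way is an assumption.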
Findings
Achieves better disentanglement on synthetic datasets.
Improves representation quality on natural images.
Effective in single-cell RNA-seq data analysis.
Abstract
Disentangled representations enable models to separate factors of variation that are shared across experimental conditions from those that are condition-specific. This separation is essential in domains such as biomedical data analysis, where generalization to new treatments, patients, or species depends on isolating stable biological signals from context-dependent effects. While extensions of the variational autoencoder (VAE) framework have been proposed to address this problem, they frequently suffer from leakage between latent representations, limiting their ability to generalize to unseen conditions. Here, we introduce DISCoVeR, a new variational framework that explicitly separates condition-invariant and condition-specific factors. DISCoVeR integrates three key components: (i) a dual-latent architecture that models shared and specific factors separately; (ii) two parallel…
Peer Reviews
Decision·Submitted to ICLR 2026
(1) The research direction of this paper is promising. Deriving explicit theoretical formulations and constraints for different parts of the latent space is a reasonable approach to promote learning disentangled representations. (2) Overall, the paper is well-organized and easy to follow.
(W1) The descriptions of the proposed methods are not clear enough and may contain fatal errors. For example, first, the proof in B.1 of the key Proposition 2.1 does not look correct from (17) to (18). In (17), it appears that $-\log p(z, w \mid x, y)$ is broken down into $-\log p(w \mid x, y) - \log p(z \mid w, x, y)$, so this $-\log p(w \mid x, y)$ cancels the later $+\log p(w \mid x, y)$ in (17). However, why is $E_{q(w \mid x, y)}$ dropped when there remains a term $\log p(z \mid w, x, y)$ that still includes w inside the expectation? It wo
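The reviewer's objection rests on two standard facts, written out here for reference. Equations (17) and (18) of the paper are not reproduced on this page, so this only restates the identities involved; the function $g$ is a placeholder introduced for illustration.

```latex
% Chain rule for the joint posterior over the two latents:
\log p(z, w \mid x, y) = \log p(w \mid x, y) + \log p(z \mid w, x, y)

% An expectation over w can be dropped only when its argument is constant in w:
\mathbb{E}_{q(w \mid x, y)}\big[ g(z, w) \big] = g(z, w)
\quad \text{only if } g \text{ does not depend on } w
\ \ (q\text{-almost surely}).
```

Since $\log p(z \mid w, x, y)$ generally does depend on $w$, removing the expectation would need a separate justification, which is the gap the reviewer points to.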
- The design of the optimization and DisCoVR is grounded in the theory of variational inference and probabilistic graphical models throughout, providing rigor to the problem formulation. - The idea of separating condition-invariant and condition-specific representations is interesting and has novelty. - The comparison with related baselines is done both analytically and experimentally, again providing rigor to the formulation of DisCoVR. - Experimentation considered a variety of datasets ranging from s
The decision to couple the prior of w and z is not very well justified or explained. It can be understood that doing so will require z to be informative, but the informativeness of z should already be encouraged by the reconstruction loss formulated on x from q(z|x). More importantly, it seems that it would create a conflict with the intended disentangling objective between z and w. The validity of this design should be better clarified theoretically/analytically, and ablated experimentally. Th
1. The theoretical part, including statistical derivations and optimization, is very solid. 2. The experimental part is also comprehensive and solid, with per-epoch runtime statistics for multiple baselines and full hyperparameter details; the information is thorough and should enable strong reproducibility.
1. Lines 216-217: 'maximizing this lower bound on I(z;y) also maximizes I(z;y)'. Do you mean minimize? Even if corrected to minimizing, it is still not rigorous to say that minimizing the lower bound implies minimizing the value itself. 2. Line 211 says 'logistic regression' is used, but in the appendix it seems you also use an MLP here. 3. In the theoretical derivation, you require Qz, Qw to be convex and compact (Proposition 2.2 and standard regularity conditions). And in lines 193-195, you choose d-dimensional G
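The gap between a lower bound on I(z;y) and I(z;y) itself can be checked on a toy example. The sketch below assumes the bound in question has the standard classifier-based form H(y) + E[log q(y|z)], which the page does not confirm; the 2x2 joint table and the function names are made up for illustration.

```python
import numpy as np

def mutual_information(joint):
    """Exact I(z; y) in nats for a discrete joint table joint[z, y]."""
    pz = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (pz @ py)[mask])))

def classifier_bound(joint, q_y_given_z):
    """H(y) + E_{p(z,y)}[log q(y|z)], a lower bound on I(z; y) for any q."""
    py = joint.sum(axis=0)
    entropy_y = -float(np.sum(py[py > 0] * np.log(py[py > 0])))
    mask = joint > 0
    return entropy_y + float(np.sum(joint[mask] * np.log(q_y_given_z[mask])))

# Hypothetical joint distribution over a binary z and a binary y.
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])

true_posterior = joint / joint.sum(axis=1, keepdims=True)  # optimal classifier
uniform_guess = np.full((2, 2), 0.5)                       # uninformative classifier

mi = mutual_information(joint)
tight = classifier_bound(joint, true_posterior)
loose = classifier_bound(joint, uniform_guess)

print(f"I(z;y) = {mi:.4f}, bound with true posterior = {tight:.4f}, "
      f"bound with uniform classifier = {loose:.4f}")
```

With the true posterior the bound equals the mutual information, while a weak classifier drives the bound to zero even though I(z;y) is unchanged. A small bound therefore says nothing about I(z;y) unless the classifier is close to optimal, which is the rigor concern raised in point 1.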
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Single-cell and spatial transcriptomics · Domain Adaptation and Few-Shot Learning
