Disentangled representations via score-based variational autoencoders
Benjamin S. H. Lyo, Eero P. Simoncelli, Cristina Savin

TL;DR
This paper introduces SAMI, a novel unsupervised learning method that combines diffusion models and VAEs to learn interpretable, structured representations from complex data, including images and videos.
Contribution
SAMI unifies diffusion models and VAEs into a single framework, enabling the extraction of meaningful, disentangled representations without supervision.
Findings
Recovers ground truth generative factors in synthetic data
Learns factorized, semantic latent dimensions from natural images
Encodes video sequences into straighter latent trajectories
Abstract
We present the Score-based Autoencoder for Multiscale Inference (SAMI), a method for unsupervised representation learning that combines the theoretical frameworks of diffusion models and VAEs. By unifying their respective evidence lower bounds, SAMI formulates a principled objective that learns representations through score-based guidance of the underlying diffusion process. The resulting representations automatically capture meaningful structure in the data: it recovers ground truth generative factors in our synthetic dataset, learns factorized, semantic latent dimensions from complex natural images, and encodes video sequences into latent trajectories that are straighter than those of alternative encoders, despite training exclusively on static images. Furthermore, SAMI can extract useful representations from pre-trained diffusion models with minimal additional training. Finally, the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenerative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks · Advanced Neuroimaging Techniques and Applications
