Unsupervised Decomposition and Recombination with Discriminator-Driven Diffusion Models

Archer Wang, Emile Anand, Yilun Du, Marin Soljačić

arXiv:2601.22057 · cs.CV · March 19, 2026

Open Access

TL;DR

This paper introduces a discriminator-driven diffusion model that learns factorized latent representations without supervision, enabling improved compositional generation and diverse robotic video synthesis.

Contribution

The paper proposes an adversarial training approach that improves both latent factor discovery and recombination quality in diffusion models, without requiring factor-level labels.

Findings

01 Outperforms the authors' implementations of prior baselines on CelebA-HQ, Virtual KITTI, CLEVR, and Falcor3D

02 Achieves lower FID scores and better disentanglement metrics (MIG, MCC)

03 Generates diverse robotic trajectories, increasing exploration coverage

Abstract

Decomposing complex data into factorized representations can reveal reusable components and enable synthesizing new samples via component recombination. We investigate this in the context of diffusion-based models that learn factorized latent spaces without factor-level supervision. In images, factors can capture background, illumination, and object attributes; in robotic videos, they can capture reusable motion components. To improve both latent factor discovery and quality of compositional generation, we introduce an adversarial training signal via a discriminator trained to distinguish between single-source samples and those generated by recombining factors across sources. By optimizing the generator to fool this discriminator, we encourage physical and semantic consistency in the resulting recombinations. Our method outperforms implementations of prior baselines on CelebA-HQ,…
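The adversarial signal described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the loss, so the binary cross-entropy objective, the non-saturating generator loss, the slot-wise swap, and every name here (`recombine`, `discriminator_loss`, `generator_adversarial_loss`) are assumptions chosen for illustration. Latents are modeled as K factor slots of dimension d, and the discriminator outputs are stand-in probabilities rather than a trained network.

```python
import numpy as np

def recombine(z_a, z_b, rng):
    """Mix two sources by taking a random, non-trivial subset of factor slots from z_b.

    z_a, z_b: arrays of shape (K, d) with K >= 2 (K latent factors of dimension d).
    Returns the mixed latent and a boolean mask of the slots taken from z_b.
    """
    K = z_a.shape[0]
    n_swap = rng.integers(1, K)  # between 1 and K-1 slots come from z_b
    mask = np.zeros(K, dtype=bool)
    mask[rng.choice(K, size=n_swap, replace=False)] = True
    z_mix = np.where(mask[:, None], z_b, z_a)
    return z_mix, mask

def bce(prob, label, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and a constant label."""
    prob = np.clip(prob, eps, 1 - eps)
    return float(-(label * np.log(prob) + (1 - label) * np.log(1 - prob)).mean())

def discriminator_loss(p_single, p_recombined):
    """Assumed objective: single-source samples labeled 1, recombined samples labeled 0."""
    return bce(p_single, 1.0) + bce(p_recombined, 0.0)

def generator_adversarial_loss(p_recombined):
    """Assumed non-saturating loss: the generator wants recombinations scored as single-source."""
    return bce(p_recombined, 1.0)

# Usage: mix factors from two hypothetical sources with K=4 slots of dimension d=8.
rng = np.random.default_rng(0)
z_a = rng.normal(size=(4, 8))
z_b = rng.normal(size=(4, 8))
z_mix, mask = recombine(z_a, z_b, rng)
```

In a full training loop, `z_mix` would condition the diffusion decoder, and the discriminator would score the decoded sample. The generator term would then be added to the usual denoising loss with some weighting, which is also an assumption here.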

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Face recognition and analysis