DRDM: A Disentangled Representations Diffusion Model for Synthesizing Realistic Person Images

Enbo Huang; Yuan Zhang; Faliang Huang; Guangyu Zhang; Yang Liu

arXiv:2412.18797 · cs.CV · December 30, 2024

TL;DR

This paper introduces DRDM, a diffusion model that generates realistic person images with controllable poses and appearances by disentangling features and guiding the synthesis process, improving detail preservation and reducing distortions.

Contribution

The paper proposes a novel disentangled representations diffusion model with a body-part decoupling block and a parsing map-based guided sampling method for improved person image synthesis.

Findings

1. Achieves high-quality pose transfer and appearance control.

2. Reduces limb distortion and garment style deviation.

3. Demonstrates effectiveness on the DeepFashion dataset.

Abstract

Person image synthesis with controllable body poses and appearances is an essential task owing to practical needs in virtual try-on, image editing, and video production. However, existing methods struggle with missing details, limb distortion, and garment style deviation. To address these issues, we propose a Disentangled Representations Diffusion Model (DRDM) that generates photo-realistic images from source portraits in specified poses and appearances. First, a pose encoder encodes pose features into a high-dimensional space to guide the generation of person images. Second, a body-part subspace decoupling block (BSDB) disentangles features from the different body parts of a source figure and feeds them to the various layers of the noise prediction block, thereby supplying the network with rich disentangled features for…
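To make the idea of per-body-part features concrete, here is a minimal sketch of one generic way to separate features by body part: masked average pooling over a parsing map. This is not the paper's BSDB, whose details the abstract does not give. The function name `pool_part_features`, the array shapes, and the integer label convention are all assumptions chosen for illustration.

```python
import numpy as np


def pool_part_features(features, parsing_map, num_parts):
    """Return one pooled feature vector per body-part label.

    Hypothetical helper for illustration only, not the authors' BSDB.
    features:    float array of shape (C, H, W)
    parsing_map: int array of shape (H, W); each pixel holds a part label
    num_parts:   number of part labels (assumed to be 0 .. num_parts - 1)
    """
    channels = features.shape[0]
    part_vectors = np.zeros((num_parts, channels), dtype=features.dtype)
    for part in range(num_parts):
        mask = parsing_map == part  # boolean (H, W) mask for this part
        if mask.any():
            # features[:, mask] has shape (C, N); average over the N pixels
            part_vectors[part] = features[:, mask].mean(axis=1)
    return part_vectors


# Toy input: 2 channels on a 2x2 grid, with 3 parts present out of 4 labels.
features = np.arange(8, dtype=float).reshape(2, 2, 2)
parsing_map = np.array([[0, 0], [1, 2]])
part_vectors = pool_part_features(features, parsing_map, num_parts=4)
```

Each row of `part_vectors` summarizes one body part independently of the others, and parts absent from the parsing map stay as zero vectors. In a real model the pooled vectors would presumably be replaced by learned, spatially aware features.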

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image Processing and 3D Reconstruction · Digital Media Forensic Detection

Methods: Diffusion