Fusion Embedding for Pose-Guided Person Image Synthesis with Diffusion Model

Donghwna Lee; Kyungha Min; Kirok Kim; Seyoung Jeong; Jiwoo Jeong; Wooju Kim

arXiv:2412.07333 · cs.CV · December 11, 2024

Open Access

TL;DR

This paper introduces FPDM, a two-stage fusion embedding approach using diffusion models for pose-guided person image synthesis, achieving state-of-the-art results on benchmark datasets.

Contribution

Proposes a novel two-stage fusion embedding method for PGPIS leveraging pre-trained CLIP models, simplifying the model structure and improving synthesis quality.

Findings

01 Achieves SOTA performance on DeepFashion and RWTH-PHOENIX datasets.

02 Even a simplified model with only the second stage performs competitively.

03 Demonstrates the effectiveness of fusion embedding in preserving appearance and pose accuracy.

Abstract

Pose-Guided Person Image Synthesis (PGPIS) aims to synthesize high-quality person images corresponding to target poses while preserving the appearance of the source image. Recently, PGPIS methods that use diffusion models have achieved competitive performance. Most approaches involve extracting representations of the target pose and source image and learning their relationships in the generative model's training process. This approach makes it difficult to learn the semantic relationships between the input and target images and complicates the model structure needed to enhance generation results. To address these issues, we propose Fusion embedding for PGPIS using a Diffusion Model (FPDM). Inspired by the successful application of pre-trained CLIP models in text-to-image diffusion models, our method consists of two stages. The first stage involves training the fusion embedding of the…
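The abstract names pre-trained CLIP models as the inspiration for the two-stage design. For orientation only, CLIP-style training aligns two sets of embeddings with a symmetric contrastive loss, and the sketch below shows that generic objective. The excerpt above is cut off before it describes the first-stage objective, so this should not be read as the authors' method. The function names, the temperature value, and the toy vectors are all assumptions made for illustration.

```python
import math


def _normalize(vec):
    # Scale a vector to unit length (assumes the vector is non-zero).
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _diagonal_cross_entropy(logits):
    # Mean cross-entropy where the correct class for row i is column i.
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(emb_a, emb_b, temperature=0.07):
    # Hypothetical helper: emb_a[i] and emb_b[i] are assumed to be a matched pair.
    a = [_normalize(v) for v in emb_a]
    b = [_normalize(v) for v in emb_b]
    # Cosine similarities scaled by the temperature.
    logits = [[sum(x * y for x, y in zip(u, v)) / temperature for v in b] for u in a]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the a-to-b and b-to-a directions.
    return 0.5 * (_diagonal_cross_entropy(logits) + _diagonal_cross_entropy(logits_t))


# Toy data: two pairs that match exactly, then the same pairs swapped.
toy_a = [[1.0, 0.0], [0.0, 1.0]]
toy_matched = [[1.0, 0.0], [0.0, 1.0]]
toy_swapped = [[0.0, 1.0], [1.0, 0.0]]

loss_matched = symmetric_contrastive_loss(toy_a, toy_matched)
loss_swapped = symmetric_contrastive_loss(toy_a, toy_swapped)
```

In this toy setup the matched pairing yields a much smaller loss than the swapped pairing, which is the behaviour such an objective uses to pull corresponding embeddings together.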

Taxonomy

Topics: Video Surveillance and Tracking Methods · Face recognition and analysis · Advanced Image and Video Retrieval Techniques

Methods: ALIGN · Diffusion · Contrastive Language-Image Pre-training