One-Shot Learning for Pose-Guided Person Image Synthesis in the Wild

Dongqi Fan; Tao Chen; Mingjie Wang; Rui Ma; Qiang Tang; Zili Yi; Qian Wang; Liang Chang

arXiv:2409.09593 · cs.CV · September 17, 2024

TL;DR

This paper introduces OnePoseTrans, a method for pose-guided person image synthesis that uses test-time fine-tuning and a Visual Consistency Module to generate high-quality images from a single source, improving stability and generalization in wild scenarios.

Contribution

We propose OnePoseTrans, a novel approach combining test-time fine-tuning with a Visual Consistency Module to enhance pose transfer quality from a single image in wild conditions.

Findings

01. Achieves high-quality pose transfer with only one source image.

02. Customizes a model in approximately 48 seconds per test case.

03. Outperforms state-of-the-art methods in stability and generalization.

Abstract

Current Pose-Guided Person Image Synthesis (PGPIS) methods depend heavily on large amounts of labeled triplet data to train the generator in a supervised manner. However, they often falter when applied to in-the-wild samples, primarily due to the distribution gap between the training datasets and real-world test samples. While some researchers aim to enhance model generalizability through sophisticated training procedures, advanced architectures, or by creating more diverse datasets, we adopt the test-time fine-tuning paradigm to customize a pre-trained Text2Image (T2I) model. However, naively applying test-time tuning results in inconsistencies in facial identities and appearance attributes. To address this, we introduce a Visual Consistency Module (VCM), which enhances appearance consistency by combining the face, text, and image embeddings. Our approach, named OnePoseTrans, requires…
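
The abstract says the VCM combines face, text, and image embeddings, but this page does not say how. As a rough illustration only, the sketch below assumes each embedding is linearly projected to a shared width and the results are stacked as a short sequence of conditioning tokens. The function names, dimensions, and random projection weights are all hypothetical stand-ins; this is not the paper's documented architecture or the repository's code.

```python
import random


def project(vec, weights):
    """Apply a linear map; each row of `weights` has the same length as `vec`."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for a learned projection (untrained random weights)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def combine_embeddings(embeddings, projections):
    """Project each modality to a shared width and stack the results as tokens.

    Assumption: fusion by projection and stacking. The source only states that
    the three embeddings are combined, not the mechanism.
    """
    return [project(vec, projections[name]) for name, vec in embeddings.items()]


rng = random.Random(0)
shared_dim = 8  # illustrative width, not taken from the paper

# Toy embeddings with different native sizes, standing in for encoder outputs.
embeddings = {
    "face": [rng.random() for _ in range(4)],
    "text": [rng.random() for _ in range(6)],
    "image": [rng.random() for _ in range(5)],
}
projections = {
    name: random_matrix(shared_dim, len(vec), rng)
    for name, vec in embeddings.items()
}

tokens = combine_embeddings(embeddings, projections)
print(len(tokens), len(tokens[0]))
```

In a real system the projections would be learned and the stacked tokens would condition the fine-tuned T2I model; here they only show how inputs of different sizes can end up in one sequence.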

Code & Models

Repositories

Dongqi-Fan/OnePoseTrans (PyTorch, official)

Taxonomy

Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Advanced Vision and Imaging