One-Shot Learning for Pose-Guided Person Image Synthesis in the Wild
Dongqi Fan, Tao Chen, Mingjie Wang, Rui Ma, Qiang Tang, Zili Yi, Qian Wang, Liang Chang

TL;DR
This paper introduces OnePoseTrans, a method for pose-guided person image synthesis that uses test-time fine-tuning and a Visual Consistency Module to generate high-quality images from a single source image, improving stability and generalization on in-the-wild inputs.
Contribution
We propose OnePoseTrans, a novel approach combining test-time fine-tuning with a Visual Consistency Module to enhance pose transfer quality from a single image in wild conditions.
Findings
Achieves high-quality pose transfer with only one source image.
Customizes a model in approximately 48 seconds per test case.
Outperforms state-of-the-art methods in stability and generalization.
Abstract
Current Pose-Guided Person Image Synthesis (PGPIS) methods depend heavily on large amounts of labeled triplet data to train the generator in a supervised manner. However, they often falter when applied to in-the-wild samples, primarily due to the distribution gap between the training datasets and real-world test samples. While some researchers aim to enhance model generalizability through sophisticated training procedures, advanced architectures, or by creating more diverse datasets, we adopt the test-time fine-tuning paradigm to customize a pre-trained Text2Image (T2I) model. However, naively applying test-time tuning results in inconsistencies in facial identities and appearance attributes. To address this, we introduce a Visual Consistency Module (VCM), which enhances appearance consistency by combining the face, text, and image embeddings. Our approach, named OnePoseTrans, requires…
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Advanced Vision and Imaging
