Real-Time Person Image Synthesis Using a Flow Matching Model

Jiwoo Jeong; Kirok Kim; Wooju Kim; Nam-Joon Kim

arXiv:2505.03562 · cs.CV · May 7, 2025

TL;DR

This paper introduces a flow matching-based generative model that significantly improves the speed of person image synthesis conditioned on pose, enabling near-real-time performance while maintaining high image quality.

Contribution

The proposed flow matching model offers a faster, more stable, and more efficient alternative to diffusion-based methods for pose-guided person image synthesis, making real-time applications feasible.
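To make the speed argument concrete, here is a minimal, generic sketch of flow matching with a straight-line path and fixed-step Euler sampling. It is not the authors' architecture or training code: the function names, the `cond` argument (standing in for pose and source-image conditioning), and the `oracle_velocity` field (a toy replacement for a trained network) are all assumptions made for illustration.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Regression target for the linear path: d x_t / dt = x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(velocity_fn, x0, x1, t, cond):
    """Mean squared error between predicted and target velocity at x_t."""
    x_t = interpolate(x0, x1, t)
    pred = velocity_fn(x_t, t, cond)
    target = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)

def euler_sample(velocity_fn, x0, cond, num_steps):
    """Integrate dx/dt = v(x, t, cond) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(x, t, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def oracle_velocity(x, t, cond):
    """Hypothetical stand-in for a trained network.

    Exact velocity field for the toy case where the data distribution
    is the single point `cond`; a real model would be a neural network.
    """
    return [(c - xi) / (1.0 - t) for xi, c in zip(x, cond)]

if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
    target = [0.5, -1.0, 2.0, 0.0]
    for steps in (1, 4, 50):
        sample = euler_sample(oracle_velocity, noise, target, steps)
        error = max(abs(a - b) for a, b in zip(sample, target))
        print(steps, error)
```

In this toy setting the trajectories are exactly straight, so even a single Euler step lands on the target up to floating-point rounding. A learned velocity field is only approximately straight, so a real sampler typically needs more than one step, but the step count is the knob that trades accuracy for speed.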

Findings

01. Achieves near-real-time sampling speeds on the DeepFashion dataset.

02. Maintains performance comparable to state-of-the-art models.

03. Trades a slight decrease in accuracy for a more than twofold increase in speed.

Abstract

Pose-Guided Person Image Synthesis (PGPIS) generates realistic person images conditioned on a target pose and a source image. This task plays a key role in various real-world applications, such as sign language video generation, AR/VR, gaming, and live streaming. In these scenarios, real-time PGPIS is critical for providing immediate visual feedback and maintaining user immersion. However, achieving real-time performance remains a significant challenge due to the complexity of synthesizing high-fidelity images from diverse and dynamic human poses. Recent diffusion-based methods have shown impressive image quality in PGPIS, but their slow sampling speeds hinder deployment in time-sensitive applications. This latency is particularly problematic in tasks like generating sign language videos during live broadcasts, where rapid image updates are required. Therefore, developing a fast and…

Code & Models

Repositories

sonny2020-c/rpfm-official-code (PyTorch, official)

Taxonomy

Topics: Human Motion and Animation · Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings