Jointly Conditioned Diffusion Model for Multi-View Pose-Guided Person Image Synthesis

Chengyu Xie; Zhi Gong; Junchi Ren; Linkun Yu; Si Shen; Fei Shen; Xiaoyu Du

arXiv:2511.15092 · cs.CV · November 20, 2025

TL;DR

This paper introduces JCDM, a diffusion-based framework that leverages multi-view priors and cross-view cues to improve pose-guided person image synthesis, achieving high fidelity and consistency across views.

Contribution

The paper proposes a novel jointly conditioned diffusion model that effectively fuses multi-view information for improved person image synthesis.

Findings

01. Achieves state-of-the-art fidelity in image generation.
02. Ensures consistent appearance across multiple views.
03. Supports variable numbers of reference views.

Abstract

Pose-guided human image generation is limited by incomplete textures from single reference views and the absence of explicit cross-view interaction. We present the jointly conditioned diffusion model (JCDM), a diffusion framework that exploits multi-view priors. The appearance prior module (APM) infers a holistic, identity-preserving prior from incomplete references, and the joint conditional injection (JCI) mechanism fuses multi-view cues and injects shared conditioning into the denoising backbone to align identity, color, and texture across poses. JCDM supports a variable number of reference views and integrates with standard diffusion backbones through minimal, targeted architectural modifications. Experiments demonstrate state-of-the-art fidelity and cross-view consistency.
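
The abstract does not specify how multi-view cues are fused or injected, so the following is only an illustrative sketch of one common way a denoiser can be conditioned on a variable number of reference views: tokens from all views are concatenated into one shared set, and each latent token attends over that set. Every name here (`fuse_views`, `cross_attend`, `condition_on`) is hypothetical, the learned query/key/value projections of a real attention layer are omitted for brevity, and nothing below should be read as the authors' APM or JCI implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_views(view_tokens):
    """Flatten per-view token lists into one shared conditioning token list.

    Assumption: plain concatenation stands in for whatever fusion the paper uses.
    """
    return [token for view in view_tokens for token in view]


def cross_attend(latent_tokens, cond_tokens):
    """Let each latent token attend over all conditioning tokens.

    Simplification: identity projections replace learned Q/K/V weights.
    """
    dim = len(latent_tokens[0])
    output = []
    for query in latent_tokens:
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            for key in cond_tokens
        ]
        weights = softmax(scores)
        mixed = [
            sum(w * value[j] for w, value in zip(weights, cond_tokens))
            for j in range(dim)
        ]
        # Residual connection: the conditioning is added to the latent token.
        output.append([x + m for x, m in zip(query, mixed)])
    return output


def random_tokens(rng, count, dim):
    """Stand-in features; a real system would use encoder outputs."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
DIM = 8
latent = random_tokens(rng, 4, DIM)


def condition_on(num_views, tokens_per_view=3):
    """Condition the same latent tokens on `num_views` reference views."""
    views = [random_tokens(rng, tokens_per_view, DIM) for _ in range(num_views)]
    return cross_attend(latent, fuse_views(views))


out_one = condition_on(1)
out_three = condition_on(3)
print(len(out_one), len(out_one[0]))
print(len(out_three), len(out_three[0]))
```

Because the softmax normalizes over however many conditioning tokens are present, the output keeps the shape of the latent tokens whether one or several reference views are supplied, which is the property that makes a variable view count straightforward in this kind of design.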

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition · Face recognition and analysis