Scene Aware Person Image Generation through Global Contextual Conditioning

Prasun Roy, Subhankar Ghosh, Saumik Bhattacharya, Umapada Pal, Michael Blumenstein

arXiv:2206.02717 · cs.CV · February 19, 2025

TL;DR

This paper introduces a pipeline that uses a sequence of specialized neural networks to generate contextually relevant person images and insert them into existing scenes, so that each inserted person blends in with the scene semantics and with the persons already present.

Contribution

The work presents a new multi-network approach to scene-aware person image generation that maintains scene context and improves the structural accuracy of inserted persons.

Findings

01. Achieves high-resolution, photo-realistic person insertion results.
02. Preserves scene semantics and interactions with existing persons.
03. Outperforms baseline methods in both qualitative and quantitative comparisons.

Abstract

Person image generation is an intriguing yet challenging problem, and the task becomes even more difficult under constrained situations. In this work, we propose a novel pipeline to generate and insert contextually relevant person images into an existing scene while preserving the global semantics. More specifically, we aim to insert a person such that the location, pose, and scale of the inserted person blend in with the existing persons in the scene. Our method uses three individual networks in a sequential pipeline. First, we predict the potential location and the skeletal structure of the new person by conditioning a Wasserstein Generative Adversarial Network (WGAN) on the existing human skeletons present in the scene. Next, the predicted skeleton is refined through a shallow linear network to achieve higher structural accuracy in the generated image. Finally, the…
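
As a rough illustration of the first two stages, the Python sketch below (NumPy only) shows the standard Wasserstein objectives a scene-conditioned WGAN would optimize, written over critic scores, and a shallow linear refiner for predicted skeletons. It is not the authors' implementation. The 18-joint (x, y) skeleton layout, the residual form of the refiner, the names `critic_loss`, `generator_loss`, and `LinearSkeletonRefiner`, and the closed-form least-squares fit (used in place of gradient-based training) are all assumptions made for illustration. Conditioning on the existing scene skeletons is assumed to happen inside the critic that produces the scores.

```python
import numpy as np

# Assumed skeleton layout: K joints with (x, y) image coordinates each.
# K = 18 (an OpenPose-style layout) is an assumption, not from the paper.
K = 18


def critic_loss(real_scores: np.ndarray, fake_scores: np.ndarray) -> float:
    """WGAN critic objective to minimize: E[D(fake)] - E[D(real)].

    Scores are outputs of a critic D(skeleton, scene_skeletons); the scene
    conditioning is assumed to happen inside D. The Lipschitz constraint
    (weight clipping or a gradient penalty) is omitted from this sketch.
    """
    return float(np.mean(fake_scores) - np.mean(real_scores))


def generator_loss(fake_scores: np.ndarray) -> float:
    """WGAN generator objective to minimize: -E[D(fake)]."""
    return float(-np.mean(fake_scores))


class LinearSkeletonRefiner:
    """Hypothetical shallow linear refiner: y = x + W x + b on a flat skeleton."""

    def __init__(self, num_joints: int = K):
        dim = num_joints * 2
        # Zero init: an untrained refiner returns its input unchanged.
        self.W = np.zeros((dim, dim))
        self.b = np.zeros(dim)

    def __call__(self, skeleton: np.ndarray) -> np.ndarray:
        """Refine one skeleton of shape (num_joints, 2)."""
        x = skeleton.reshape(-1)
        return (x + self.W @ x + self.b).reshape(skeleton.shape)

    def fit(self, predicted: np.ndarray, target: np.ndarray) -> None:
        """Fit W and b by least squares on the residual (target - predicted).

        predicted, target: arrays of shape (N, num_joints, 2). For a single
        linear layer, minimizing mean squared error has this closed-form
        solution; it stands in for gradient-based training here.
        """
        n = predicted.shape[0]
        X = predicted.reshape(n, -1)                   # (N, D)
        R = target.reshape(n, -1) - X                  # residual targets (N, D)
        Xa = np.hstack([X, np.ones((n, 1))])           # bias column -> (N, D+1)
        coef, *_ = np.linalg.lstsq(Xa, R, rcond=None)  # (D+1, D)
        self.W = coef[:-1].T                           # (D, D)
        self.b = coef[-1]                              # (D,)


# Usage on synthetic data: targets are a known affine correction of the
# predicted skeletons, so the fitted refiner should recover it.
rng = np.random.default_rng(0)
predicted = rng.uniform(0.0, 256.0, size=(200, K, 2))
A = rng.normal(scale=0.01, size=(2 * K, 2 * K))
c = rng.normal(scale=2.0, size=2 * K)
flat = predicted.reshape(200, -1)
target = (flat + flat @ A.T + c).reshape(predicted.shape)

refiner = LinearSkeletonRefiner()
refiner.fit(predicted, target)
refined = np.stack([refiner(p) for p in predicted])
print(np.abs(refined - target).max())  # small, up to floating-point error
```

In a full pipeline, the generator's output would presumably encode both the location and the pose of the new person as joint coordinates in scene space, and the refined skeleton would then be passed to the final stage of the pipeline. The residual form keeps an untrained or weakly trained refiner close to the identity, so it can only nudge the WGAN's prediction rather than replace it.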

Taxonomy

Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis