Text2Place: Affordance-aware Text Guided Human Placement
Rishubh Parihar, Harsh Gupta, Sachidanand VS, R. Venkatesh Babu

TL;DR
This paper introduces Text2Place, a novel method for realistic human placement in scenes guided by text, which learns semantic masks and performs subject-conditioned inpainting without large-scale training, enabling diverse and realistic scene compositions.
Contribution
The work presents the first effective solution for realistic human placement in diverse scenes using text-guided semantic masks and inpainting, eliminating the need for extensive training.
Findings
Achieves highly realistic scene compositions preserving background and subject identity.
Outperforms strong baselines in realistic human placement tasks.
Enables downstream applications like scene hallucination and attribute editing.
Abstract
For a given scene, humans can easily reason about the locations and poses in which to place objects. Designing a computational model that reasons about these affordances as intuitively as humans do poses a significant challenge. This work tackles the problem of realistically inserting a human into a given background scene, a task termed Semantic Human Placement. The task is extremely challenging because of the diversity of backgrounds, the scale and pose of the generated person, and the need to preserve the person's identity. We divide the problem into two stages: (i) learning semantic masks using text guidance to localize regions in the image where humans can be placed, and (ii) subject-conditioned inpainting to place a given subject within the semantic masks while adhering to the scene affordance. For learning semantic masks, we leverage rich…
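The two-stage decomposition can be sketched as a simple pipeline. This is a minimal sketch under stated assumptions, not the authors' implementation: `predict_mask`, `inpaint`, and `subject` are hypothetical stand-ins for stage (i), stage (ii), and the reference of the subject to insert, and the final alpha blend (mask times generated pixels plus one-minus-mask times background) is an assumed compositing step. It illustrates why pixels outside the semantic mask keep the original background.

```python
from typing import Callable, List

Image = List[List[float]]  # toy grayscale image, values in [0, 1]
Mask = List[List[float]]   # soft mask: 1 = region where the person is placed


def composite(background: Image, generated: Image, mask: Mask) -> Image:
    """Alpha-blend generated pixels into the background inside the mask.

    Where mask == 0 the background pixel is copied unchanged, so the
    scene outside the placement region is preserved.
    """
    return [
        [m * g + (1.0 - m) * b for b, g, m in zip(brow, grow, mrow)]
        for brow, grow, mrow in zip(background, generated, mask)
    ]


def place_human(
    background: Image,
    prompt: str,
    subject: object,                                         # hypothetical subject reference
    predict_mask: Callable[[Image, str], Mask],              # hypothetical stage (i)
    inpaint: Callable[[Image, Mask, str, object], Image],    # hypothetical stage (ii)
) -> Image:
    # Stage (i): localize a plausible placement region from text guidance.
    mask = predict_mask(background, prompt)
    # Stage (ii): generate the subject inside that region.
    generated = inpaint(background, mask, prompt, subject)
    # Keep the background outside the mask (assumed compositing step).
    return composite(background, generated, mask)


if __name__ == "__main__":
    bg = [[0.2, 0.2], [0.2, 0.2]]
    out = place_human(
        bg,
        "a person sitting on the bench",
        subject=None,
        predict_mask=lambda img, p: [[0.0, 1.0], [0.0, 1.0]],
        inpaint=lambda img, m, p, s: [[0.9, 0.9], [0.9, 0.9]],
    )
    print(out)
```

With the toy stubs, only the right column (mask = 1) takes the generated value, while the left column keeps the background value 0.2.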
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Inpainting
