Skip-and-Play: Depth-Driven Pose-Preserved Image Generation for Any Objects
Kyungmin Jo, Jaegul Choo

TL;DR
This paper introduces Skip-and-Play (SnP), a depth-driven pose control method for diffusion-based image generation. It extends pose control to diverse objects and poses by mitigating the shape dependency that depth conditioning introduces.
Contribution
The paper proposes depth-based pose control with SnP, which enables pose-preserved image generation for objects and poses that camera-parameter and keypoint-based methods handle poorly.
Findings
SnP outperforms baseline models in pose accuracy and diversity.
SnP can generate images with different objects and prompts, demonstrating robustness.
Depth-based control with SnP enhances controllability in diffusion models.
Abstract
The emergence of diffusion models has enabled the generation of diverse, high-quality images solely from text, prompting subsequent efforts to enhance the controllability of these models. Despite this improvement, pose control remains limited to specific objects (e.g., humans) or poses (e.g., frontal view) because pose is generally controlled via camera parameters (e.g., rotation angle) or keypoints (e.g., eyes, nose). Specifically, pose control models conditioned on camera parameters generate unrealistic images for some objects, owing to the small size of the 3D datasets available for training. Keypoint-based approaches, in turn, struggle to acquire reliable keypoints for various objects (e.g., churches) or poses (e.g., back view). To address these limitations, we propose depth-based pose control, as depth maps are easily obtainable from a single depth…
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Advanced Vision and Imaging · Augmented Reality Applications
Methods: Diffusion
