SceneDesigner: Controllable Multi-Object Image Generation with 9-DoF Pose Manipulation
Zhenyuan Qin, Xincheng Shuai, Henghui Ding

TL;DR
SceneDesigner is a novel method for precise multi-object 9-DoF pose control in images, combining a new pose representation, dataset, training strategy, and inference techniques to improve controllability and image quality.
Contribution
It introduces a new 9D pose representation (the CNOCS map), a dataset, a two-stage training strategy that incorporates reinforcement learning, and inference techniques for improved multi-object pose manipulation.
Findings
Outperforms existing methods in controllability and quality.
Effective 9D pose manipulation in complex scenes.
Robust training and inference strategies for stable results.
Abstract
Controllable image generation has attracted increasing attention in recent years, enabling users to manipulate visual content such as identity and style. However, achieving simultaneous control over the 9D poses (location, size, and orientation) of multiple objects remains an open challenge. Despite recent progress, existing methods often suffer from limited controllability and degraded quality, falling short of comprehensive multi-object 9D pose control. To address these limitations, we propose SceneDesigner, a method for accurate and flexible multi-object 9-DoF pose manipulation. SceneDesigner incorporates a branched network into the pre-trained base model and leverages a new representation, the CNOCS map, which encodes 9D pose information from the camera view. This representation has strong geometric interpretability, leading to more efficient and stable training. To support…
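The abstract does not define the CNOCS map beyond saying it encodes 9D pose information from the camera view. To make the idea of a per-pixel, camera-view pose encoding concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each object is an oriented cuboid with a hypothetical 9-DoF parameterization (3 for center, 3 for size, 3 for yaw/pitch/roll) and that the map stores, for each pixel, the normalized object coordinate of the nearest visible cuboid surface, in the spirit of NOCS (Normalized Object Coordinate Space) maps, which the name suggests but the source does not confirm. `Pose9D`, `render_cnocs_like`, and the pinhole camera parameters are illustrative names.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Pose9D:
    """Hypothetical 9-DoF pose: 3 for location, 3 for size, 3 for orientation."""
    center: Vec3  # cuboid center in camera coordinates (z points forward)
    size: Vec3    # full extents along the cuboid's own x, y, z axes
    angles: Vec3  # (yaw, pitch, roll) in radians, composed as R = Ry @ Rx @ Rz


def rotation(yaw: float, pitch: float, roll: float) -> List[List[float]]:
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rx = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]
    rz = [[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]]

    def mul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

    return mul(mul(ry, rx), rz)


def to_object_frame(r: List[List[float]], v: Vec3) -> Vec3:
    # Apply R^T (the inverse rotation) to a camera-frame vector.
    return tuple(sum(r[k][i] * v[k] for k in range(3)) for i in range(3))


def ray_box(o: Vec3, d: Vec3, half: Vec3) -> Optional[float]:
    """Slab test against the box [-half, half]; returns the entry distance t > 0 or None."""
    t_near, t_far = -math.inf, math.inf
    for oi, di, hi in zip(o, d, half):
        if abs(di) < 1e-12:
            if abs(oi) > hi:  # ray parallel to this slab and outside it
                return None
            continue
        t1, t2 = (-hi - oi) / di, (hi - oi) / di
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
    if t_near > t_far or t_near <= 0:  # miss, or camera inside the box (ignored here)
        return None
    return t_near


def render_cnocs_like(poses: List[Pose9D], width: int, height: int,
                      focal: float, cx: float, cy: float):
    """Per pixel: (object index, normalized coordinate in [0, 1]^3) of the nearest box, else None."""
    frames = [(rotation(*p.angles), p) for p in poses]
    image = [[None] * width for _ in range(height)]
    for v in range(height):
        for u in range(width):
            ray = ((u + 0.5 - cx) / focal, (v + 0.5 - cy) / focal, 1.0)
            best_t = math.inf
            for idx, (r, p) in enumerate(frames):
                # Rotations preserve length, so t is comparable across objects (z-buffering).
                o = to_object_frame(r, tuple(-c for c in p.center))
                d = to_object_frame(r, ray)
                t = ray_box(o, d, tuple(s / 2 for s in p.size))
                if t is not None and t < best_t:
                    best_t = t
                    hit = [oi + t * di for oi, di in zip(o, d)]
                    image[v][u] = (idx, tuple(h / s + 0.5 for h, s in zip(hit, p.size)))
    return image
```

For example, a 2×2×2 cuboid centered 5 units in front of a 9×9 camera with `focal=10`, `cx=cy=4.5` yields `(0, (0.5, 0.5, 0.0))` at the center pixel, i.e. the middle of the cuboid's camera-facing face. Because every pixel carries object-frame coordinates, location, size, and orientation all affect the map, and occlusion between objects is resolved by keeping the nearest hit.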
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
