Neural Assets: 3D-Aware Multi-Object Scene Synthesis with Image Diffusion Models
Ziyi Wu, Yulia Rubanova, Rishabh Kabra, Drew A. Hudson, Igor Gilitschenski, Yusuf Aytar, Sjoerd van Steenkiste, Kelsey R. Allen, Thomas Kipf

TL;DR
This paper introduces Neural Assets, a method for controlling multi-object 3D poses in image diffusion models by using object-specific representations derived from reference images, enabling detailed scene editing and transfer.
Contribution
The authors propose Neural Assets, a novel approach that integrates object-specific visual and pose features into diffusion models for enhanced multi-object 3D scene synthesis.
Findings
Achieves state-of-the-art multi-object editing results
Enables fine-grained control of object poses and placements
Demonstrates transferability and recomposition of Neural Assets across scenes
Abstract
We address the problem of multi-object 3D pose control in image diffusion models. Instead of conditioning on a sequence of text tokens, we propose to use a set of per-object representations, Neural Assets, to control the 3D pose of individual objects in a scene. Neural Assets are obtained by pooling visual representations of objects from a reference image, such as a frame in a video, and are trained to reconstruct the respective objects in a different image, e.g., a later frame in the video. Importantly, we encode object visuals from the reference image while conditioning on object poses from the target frame. This enables learning disentangled appearance and pose features. Combining visual and 3D pose representations in a sequence-of-tokens format allows us to keep the text-to-image architecture of existing models, with Neural Assets in place of text tokens. By fine-tuning a…
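The token construction described in the abstract can be illustrated with a minimal sketch. It assumes a plain average pool over a 2D box in the reference feature map and a simple concatenation of the pooled appearance vector with a pose vector taken from the target frame. The names `pool_appearance` and `build_neural_assets`, the box format, and the three-number pose are hypothetical choices made for illustration; the abstract does not specify the pooling operator, the pose encoding, or how the two parts are combined.

```python
from typing import List, Sequence, Tuple

# Assumed layout (illustrative only): features[y][x] is a list of C channel values.
FeatureMap = List[List[List[float]]]
# Assumed box format: (top, left, bottom, right) with exclusive bottom/right.
Box = Tuple[int, int, int, int]


def pool_appearance(features: FeatureMap, box: Box) -> List[float]:
    """Average the feature vectors inside `box` (a stand-in for the paper's pooling)."""
    top, left, bottom, right = box
    channels = len(features[0][0])
    sums = [0.0] * channels
    count = 0
    for y in range(top, bottom):
        for x in range(left, right):
            for c in range(channels):
                sums[c] += features[y][x][c]
            count += 1
    if count == 0:
        raise ValueError("box covers no feature locations")
    return [s / count for s in sums]


def build_neural_assets(
    reference_features: FeatureMap,
    reference_boxes: Sequence[Box],
    target_poses: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Build one token per object: appearance from the reference image,
    pose from the target frame, concatenated into a single vector."""
    if len(reference_boxes) != len(target_poses):
        raise ValueError("need exactly one target pose per object")
    tokens = []
    for box, pose in zip(reference_boxes, target_poses):
        appearance = pool_appearance(reference_features, box)
        tokens.append(appearance + list(pose))
    return tokens


# Toy 2x2 feature map with 2 channels and two objects (one per row).
features = [
    [[1.0, 0.0], [3.0, 0.0]],
    [[0.0, 2.0], [0.0, 4.0]],
]
boxes = [(0, 0, 1, 2), (1, 0, 2, 2)]
poses = [(0.5, 0.5, 1.0), (0.2, 0.8, 2.0)]  # hypothetical pose vectors
tokens = build_neural_assets(features, boxes, poses)
```

The resulting list plays the role that the text-token sequence plays in a text-to-image model: one conditioning vector per object. Because appearance comes from the reference image and pose from the target frame, changing only the pose entries of a token corresponds to moving that object while keeping its appearance.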
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Processing and 3D Reconstruction · Computer Graphics and Visualization Techniques
Methods: Sparse Evolutionary Training · Diffusion
