Neural Assets: 3D-Aware Multi-Object Scene Synthesis with Image Diffusion Models

Ziyi Wu; Yulia Rubanova; Rishabh Kabra; Drew A. Hudson; Igor Gilitschenski; Yusuf Aytar; Sjoerd van Steenkiste; Kelsey R. Allen; Thomas Kipf

arXiv:2406.09292·cs.CV·October 30, 2024

TL;DR

This paper introduces Neural Assets, a method for controlling multi-object 3D poses in image diffusion models by using object-specific representations derived from reference images, enabling detailed scene editing and transfer.

Contribution

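The authors propose Neural Assets, per-object representations that combine visual features pooled from a reference image with 3D pose features. These representations replace text tokens as the conditioning input of an image diffusion model, enabling control over the 3D pose of individual objects in multi-object scene synthesis.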

Findings

01 Achieves state-of-the-art multi-object editing results

02 Enables fine-grained control of object poses and placements

03 Demonstrates transferability and recomposition of Neural Assets across scenes

Abstract

We address the problem of multi-object 3D pose control in image diffusion models. Instead of conditioning on a sequence of text tokens, we propose to use a set of per-object representations, Neural Assets, to control the 3D pose of individual objects in a scene. Neural Assets are obtained by pooling visual representations of objects from a reference image, such as a frame in a video, and are trained to reconstruct the respective objects in a different image, e.g., a later frame in the video. Importantly, we encode object visuals from the reference image while conditioning on object poses from the target frame. This enables learning disentangled appearance and pose features. Combining visual and 3D pose representations in a sequence-of-tokens format allows us to keep the text-to-image architecture of existing models, with Neural Assets in place of text tokens. By fine-tuning a…
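The abstract describes the conditioning pipeline only in prose. The NumPy sketch below illustrates its general shape under stated assumptions and is not the authors' implementation. A random array stands in for a pretrained visual encoder's feature map of the reference frame. A plain mean pool over each object's box stands in for the paper's pooling operator. Each target-frame pose is a hypothetical 16-dimensional vector (for example, flattened 2D projections of a 3D box's eight corners). Random matrices replace the learned projections. The resulting per-object tokens feed a single-head cross-attention in the slot that text-token embeddings would normally occupy.

```python
import numpy as np


def pool_object_features(feature_map, boxes):
    """Mean-pool a (H, W, C) feature map inside each box.

    Simplified stand-in for the paper's pooling step (assumption).
    `boxes` is (N, 4) as [x0, y0, x1, y1] in normalized [0, 1] coordinates
    of the reference frame.
    """
    h, w, c = feature_map.shape
    pooled = np.zeros((len(boxes), c))
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        # Integer cell bounds; always keep at least one cell per box.
        r0 = min(int(y0 * h), h - 1)
        r1 = max(int(np.ceil(y1 * h)), r0 + 1)
        c0 = min(int(x0 * w), w - 1)
        c1 = max(int(np.ceil(x1 * w)), c0 + 1)
        pooled[i] = feature_map[r0:r1, c0:c1].reshape(-1, c).mean(axis=0)
    return pooled


def build_neural_asset_tokens(ref_features, ref_boxes, target_poses,
                              w_app, w_pose, w_out):
    """One token per object: appearance from the reference frame,
    pose from the target frame. All weight matrices are hypothetical."""
    app_emb = pool_object_features(ref_features, ref_boxes) @ w_app  # (N, Da)
    pose_emb = target_poses @ w_pose                                  # (N, Dp)
    return np.concatenate([app_emb, pose_emb], axis=-1) @ w_out       # (N, D)


def cross_attention(queries, tokens, w_q, w_k, w_v):
    """Single-head cross-attention: image latents attend to object tokens,
    in the slot that text-token embeddings would normally fill."""
    q, k, v = queries @ w_q, tokens @ w_k, tokens @ w_v
    logits = q @ k.T / np.sqrt(q.shape[-1])
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v


# Toy sizes (all hypothetical): 16x16x32 feature map, 3 objects,
# 16-dim pose vectors, 64-dim tokens, 10 latent query positions.
rng = np.random.default_rng(0)
H, W, C, P, D, L = 16, 16, 32, 16, 64, 10
ref_features = rng.normal(size=(H, W, C))
ref_boxes = np.array([[0.1, 0.1, 0.4, 0.5],
                      [0.5, 0.2, 0.9, 0.6],
                      [0.3, 0.6, 0.7, 1.0]])
target_poses = rng.normal(size=(3, P))

w_app = rng.normal(size=(C, D // 2)) / np.sqrt(C)
w_pose = rng.normal(size=(P, D // 2)) / np.sqrt(P)
w_out = rng.normal(size=(D, D)) / np.sqrt(D)
tokens = build_neural_asset_tokens(ref_features, ref_boxes, target_poses,
                                   w_app, w_pose, w_out)

latents = rng.normal(size=(L, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))
out = cross_attention(latents, tokens, w_q, w_k, w_v)
print(tokens.shape, out.shape)  # (3, 64) (10, 64)
```

The design choice this sketch mirrors is the split of inputs. Appearance is read only from the reference frame and pose only from the target frame. During training, the model can therefore reconstruct the target only by combining the two, and the abstract credits this setup with learning disentangled appearance and pose features.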

Videos

Neural Assets: 3D-Aware Multi-Object Scene Synthesis with Image Diffusion Models (SlidesLive)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image Processing and 3D Reconstruction · Computer Graphics and Visualization Techniques

Methods: Sparse Evolutionary Training · Diffusion