SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass
Yanxu Meng, Haoning Wu, Ya Zhang, Weidi Xie

TL;DR
SceneGen is a framework that generates multiple 3D assets from a single scene image in one feedforward pass, with no extra optimization or asset retrieval, targeting 3D content creation for VR/AR and embodied AI.
Contribution
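The paper introduces SceneGen, a feedforward model that synthesizes multiple 3D assets with geometry and texture from a scene image and its object masks. A new feature aggregation module combines local and global scene information, and a position head predicts the assets' relative spatial positions.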
Findings
Operates without extra optimization or retrieval.
Improves performance with multi-image input.
Produces high-quality 3D assets efficiently.
Abstract
3D content generation has recently attracted significant research interest, driven by its critical applications in VR/AR and embodied AI. In this work, we tackle the challenging task of synthesizing multiple 3D assets within a single scene image. Concretely, our contributions are fourfold: (i) we present SceneGen, a novel framework that takes a scene image and corresponding object masks as input, simultaneously producing multiple 3D assets with geometry and texture. Notably, SceneGen operates with no need for extra optimization or asset retrieval; (ii) we introduce a novel feature aggregation module that integrates local and global scene information from visual and geometric encoders within the feature extraction module. Coupled with a position head, this enables the generation of 3D assets and their relative spatial positions in a single feedforward pass; (iii) we demonstrate…
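The input/output contract described in the abstract (one scene image plus per-object masks in, one asset with a relative position out per mask, all in a single pass) can be illustrated with a toy sketch. This is not the authors' model: the mean-intensity "features" and the centroid-based "position head" are stand-in assumptions chosen only to make the data flow runnable, and every name here (`AssetPrediction`, `global_feature`, `local_feature`, `position_head`, `generate_scene`) is hypothetical.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[float]]  # toy grayscale H x W image (assumption, not the paper's input format)
Mask = List[List[int]]     # binary H x W object mask


@dataclass
class AssetPrediction:
    feature: List[float]   # aggregated [local, global] toy feature
    position: List[float]  # toy relative position (row, col), each in [0, 1]


def global_feature(image: Image) -> float:
    """Toy global scene descriptor: mean intensity over the whole image."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def local_feature(image: Image, mask: Mask) -> float:
    """Toy local object descriptor: mean intensity inside the mask."""
    vals = [p for irow, mrow in zip(image, mask) for p, m in zip(irow, mrow) if m]
    if not vals:
        raise ValueError("mask selects no pixels")
    return sum(vals) / len(vals)


def position_head(mask: Mask) -> List[float]:
    """Toy position head: normalized centroid of the mask's pixel centers."""
    h, w = len(mask), len(mask[0])
    coords = [(r, c) for r in range(h) for c in range(w) if mask[r][c]]
    if not coords:
        raise ValueError("mask selects no pixels")
    n = len(coords)
    mean_r = sum(r for r, _ in coords) / n
    mean_c = sum(c for _, c in coords) / n
    return [(mean_r + 0.5) / h, (mean_c + 0.5) / w]


def generate_scene(image: Image, masks: List[Mask]) -> List[AssetPrediction]:
    """Single pass: the global feature is computed once and shared by every object."""
    g = global_feature(image)
    return [
        AssetPrediction(feature=[local_feature(image, m), g], position=position_head(m))
        for m in masks
    ]


if __name__ == "__main__":
    image = [[0.0, 1.0], [2.0, 3.0]]
    masks = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
    for pred in generate_scene(image, masks):
        print(pred)
```

The point of the sketch is structural: global context is shared across all objects while each mask contributes its own local feature and position, with no per-object optimization loop or retrieval step.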
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Advanced Vision and Imaging
