PhiP-G: Physics-Guided Text-to-3D Compositional Scene Generation

Qixuan Li; Chao Wang; Zongjin He; Yan Peng

arXiv:2502.00708 · cs.CV · February 4, 2025

TL;DR

PhiP-G introduces a physics-guided framework for text-to-3D compositional scene generation that ensures physical plausibility, captures complex relationships, and enhances efficiency using world models and LLMs.

Contribution

It presents a novel integration of layout guidance, physical constraints, and LLM-based scene analysis for improved text-to-3D scene synthesis.

Findings

01. Achieves state-of-the-art CLIP scores in scene generation.
02. Matches leading methods on T³Bench quality metrics.
03. Improves generation efficiency by a factor of 24.
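This page does not say which CLIP-score variant PhiP-G reports. One widely used definition is CLIPScore (Hessel et al., 2021), w · max(cos(E_I, E_T), 0) with w = 2.5, where E_I and E_T are CLIP image and text embeddings. The minimal sketch below assumes precomputed embeddings for several rendered views of a scene and one text embedding, and averages the clipped, scaled cosine similarity over views. The function names, the view averaging, and the choice of w are illustrative assumptions, not the paper's evaluation code.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def clip_score(view_embeddings: list[list[float]], text_embedding: list[float], w: float = 2.5) -> float:
    """Hypothetical helper: mean over rendered views of w * max(cos(image, text), 0)."""
    scores = [w * max(cosine(e, text_embedding), 0.0) for e in view_embeddings]
    return sum(scores) / len(scores)


# Toy 2-D "embeddings": one view aligned with the text, one orthogonal to it.
print(clip_score([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], w=100.0))  # → 50.0
```

Clipping at zero keeps anti-correlated views from pulling the average below zero; the scale w only changes the reporting range, so comparisons between methods are meaningful only when they use the same variant.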

Abstract

Text-to-3D asset generation has advanced significantly under the supervision of 2D diffusion priors. However, when dealing with compositional scenes, existing methods face several challenges: (1) failure to ensure that composite scene layouts comply with physical laws; (2) difficulty in accurately capturing the assets and relationships described in complex scene descriptions; and (3) limited autonomous asset generation capabilities in layout approaches that leverage large language models (LLMs). To address these limitations, we propose PhiP-G, a novel framework for compositional scene generation that seamlessly integrates generation techniques with layout guidance based on a world model. Leveraging LLM-based agents, PhiP-G analyzes the complex scene description to generate a scene graph, and integrates a multimodal 2D generation agent and a 3D Gaussian generation method for…
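The abstract states that PhiP-G turns a scene description into a scene graph and uses world-model-based layout guidance to keep compositions physically plausible, but this excerpt does not say how. As a minimal sketch, assuming each object is approximated by an axis-aligned bounding box and the only relation is "on", a scene graph and a rule-based plausibility check (no interpenetration, supported objects rest on their parent, everything else rests on the ground) might look like this. The names `Box`, `SceneGraph`, `is_supported`, and `layout_violations` are hypothetical, and this hand-written check is a stand-in for illustration, not the paper's world model.

```python
from dataclasses import dataclass, field


@dataclass
class Box:
    """Axis-aligned bounding box: min corner (x, y, z) and size (w, d, h); z is up."""
    x: float
    y: float
    z: float
    w: float
    d: float
    h: float

    def overlaps(self, other: "Box", eps: float = 1e-6) -> bool:
        # Boxes interpenetrate only if their intervals overlap on every axis;
        # eps lets boxes that merely touch (e.g. a lamp on a table) pass.
        return (self.x < other.x + other.w - eps and other.x < self.x + self.w - eps
                and self.y < other.y + other.d - eps and other.y < self.y + self.d - eps
                and self.z < other.z + other.h - eps and other.z < self.z + self.h - eps)


@dataclass
class SceneGraph:
    """Hypothetical scene graph: object nodes plus (subject, relation, object) edges."""
    nodes: dict[str, Box] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)


def is_supported(child: Box, parent: Box, tol: float = 1e-3) -> bool:
    # "on": child's bottom touches parent's top and child's footprint center lies over the parent.
    cx = child.x + child.w / 2
    cy = child.y + child.d / 2
    rests = abs(child.z - (parent.z + parent.h)) <= tol
    inside = parent.x <= cx <= parent.x + parent.w and parent.y <= cy <= parent.y + parent.d
    return rests and inside


def layout_violations(g: SceneGraph) -> list[str]:
    problems = []
    names = sorted(g.nodes)
    # Pairwise interpenetration test.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if g.nodes[a].overlaps(g.nodes[b]):
                problems.append(f"{a} intersects {b}")
    # Every "on" edge must be physically supported.
    for subj, rel, obj in g.edges:
        if rel == "on" and not is_supported(g.nodes[subj], g.nodes[obj]):
            problems.append(f"{subj} is not resting on {obj}")
    # Objects not placed on anything must sit on the ground plane z = 0.
    supported = {s for s, r, _ in g.edges if r == "on"}
    for name in names:
        if name not in supported and abs(g.nodes[name].z) > 1e-3:
            problems.append(f"{name} is floating")
    return problems


# "A lamp on a table": the lamp's bottom sits exactly on the table top.
g = SceneGraph(
    nodes={"table": Box(0, 0, 0, 2, 1, 0.75), "lamp": Box(0.8, 0.3, 0.75, 0.3, 0.3, 0.5)},
    edges=[("lamp", "on", "table")],
)
print(layout_violations(g))  # → []
```

In a full pipeline the violation list could feed back into layout refinement; here it only reports problems, and a real system would need richer relations (beside, under, facing) and non-box geometry.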

Taxonomy

Topics: Multimodal Machine Learning Applications

Methods: Diffusion · Contrastive Language-Image Pre-training