PhyScensis: Physics-Augmented LLM Agents for Complex Physical Scene Arrangement
Yian Wang, Han Yang, Minghao Guo, Xiaowen Qiu, Tsun-Hsuan Wang, Wojciech Matusik, Joshua B. Tenenbaum, Chuang Gan

TL;DR
PhyScensis is a physics-augmented language model framework that generates complex, physically plausible 3D scenes for robotic simulation, integrating an LLM, physics engine, and feedback loop for high realism and controllability.
Contribution
The paper introduces PhyScensis, a novel LLM-based framework that combines language understanding with physics simulation to produce complex, realistic physical scenes for robotics.
Findings
Outperforms prior methods in scene complexity and physical accuracy
Generates scenes with high object density and rich physical relationships
Provides controllable scene generation through probabilistic programming
Abstract
Automatically generating interactive 3D environments is crucial for scaling up robotic data collection in simulation. While prior work has primarily focused on 3D asset placement, it often overlooks the physical relationships between objects (e.g., contact, support, balance, and containment), which are essential for creating complex and realistic manipulation scenarios such as tabletop arrangements, shelf organization, or box packing. Compared to classical 3D layout generation, producing complex physical scenes introduces additional challenges: (a) higher object density and complexity (e.g., a small shelf may hold dozens of books), (b) richer supporting relationships and compact spatial layouts, and (c) the need to accurately model both spatial placement and physical properties. To address these challenges, we propose PhyScensis, an LLM agent-based framework powered by a physics engine,…
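To make the support and balance relationships named in the abstract concrete, here is a minimal sketch of a static balance check for a stack of boxes seen from the side. It is not the paper's solver or physics engine. The `Box` class, the `is_stack_stable` function, and the 2D rigid-box simplification are all assumptions made for illustration. The rule it applies is the standard one: at every interface, the combined center of mass of everything above must project inside the contact interval with the box below.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical axis-aligned box in a 2D side view (x is horizontal)."""
    x: float        # horizontal position of the box center
    width: float
    height: float   # kept for completeness; not needed by the balance test
    mass: float = 1.0


def is_stack_stable(boxes):
    """Return True if a bottom-to-top stack of boxes is statically balanced.

    Illustrative assumption: rigid boxes, enough friction to prevent sliding,
    and each box rests only on the box directly beneath it.
    """
    for i in range(len(boxes) - 1):
        below = boxes[i]
        above = boxes[i + 1:]      # everything carried by this interface
        resting = above[0]         # the box in direct contact with `below`

        # Combined center of mass of all boxes above the interface.
        total_mass = sum(b.mass for b in above)
        com_x = sum(b.x * b.mass for b in above) / total_mass

        # Contact interval: horizontal overlap of the two touching faces.
        left = max(below.x - below.width / 2, resting.x - resting.width / 2)
        right = min(below.x + below.width / 2, resting.x + resting.width / 2)

        # No contact at all, or the center of mass overhangs the contact.
        if left >= right or not (left < com_x < right):
            return False
    return True


if __name__ == "__main__":
    # Each book is shifted 0.4 to the right of the one beneath it.
    staircase = [Box(0.0, 1.0, 0.2), Box(0.4, 1.0, 0.2), Box(0.8, 1.0, 0.2)]
    print(is_stack_stable(staircase[:2]))  # two books alone
    print(is_stack_stable(staircase))      # all three books
```

In the staircase example, every adjacent pair passes the check on its own, yet the full stack fails because the two upper books together overhang the bottom one. Checking objects only pairwise misses this kind of global constraint, which is one reason dense arrangements are harder than sparse layout generation.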
Peer Reviews
Decision: ICLR 2026 Poster
1. The paper is well written and easy to follow.
2. The generation results look good.
3. The proposed method enables a certain level of controllability, such as the distance between objects and the stability of objects.
1. I think the term scene generation used here is misleading. The paper mostly focuses on “object arrangement” [1] or “layout generation” [2], where the goal is to place objects of similar sizes on a given surface (e.g., a bookshelf or a table). This is implied by all the qualitative examples. In contrast, scene generation usually refers to generating larger and more complex indoor scenes containing objects of various sizes and more diverse object relationships, which is not demonstrated in the
* The overall system, which integrates LLM-based predicates, a physics-based solver, a geometry-based spatial solver, and feedback to the LLM, is well-designed. This results in layouts that are both reasonable and physically stable.
* Physics-plausible scene generation is an interesting and important direction, particularly for large-scale scene generation.
* The experiments are thorough, including ablations and additional evaluations on downstream robotics tasks.
* It is unclear what text prompts are used in the test set for all methods. How many prompts are there, and how diverse are they?
* There is no discussion of failure cases, particularly regarding physics. What are the limitations of the current predefined predicates?
* Regarding the LLM, it is unclear how it determines object sizes and how it selects objects from the candidate object set.
- The paper is very well written, well motivated, and easy to follow.
- The methodology, although not entirely novel, is promising.
- The results and ablations show that the individual design choices result in improved generation speed and scene quality.
The main weakness of the paper is the experiments. In particular, the downstream experiment fails to showcase the advantages of the approach compared to existing scene generation pipelines in the robotics domain:
- The VQA-based evaluation is questionable. It’s not clear if this metric works well for complex 3D tabletop environments. The high variance across models suggests it may not be reliable. Comparing it with human judgments could help validate this.
- It’s unclear whether the same VQA
Taxonomy
Topics: 3D Shape Modeling and Analysis · Human Motion and Animation · Computer Graphics and Visualization Techniques
