PhyScensis: Physics-Augmented LLM Agents for Complex Physical Scene Arrangement

Yian Wang; Han Yang; Minghao Guo; Xiaowen Qiu; Tsun-Hsuan Wang; Wojciech Matusik; Joshua B. Tenenbaum; Chuang Gan

arXiv:2602.14968·cs.RO·February 17, 2026

TL;DR

PhyScensis is a physics-augmented language model framework that generates complex, physically plausible 3D scenes for robotic simulation, integrating an LLM, physics engine, and feedback loop for high realism and controllability.

Contribution

The paper introduces PhyScensis, a novel LLM-based framework that combines language understanding with physics simulation to produce complex, realistic physical scenes for robotics.

Findings

01. Outperforms prior methods in scene complexity and physical accuracy

02. Generates scenes with high object density and rich physical relationships

03. Provides controllable scene generation through probabilistic programming
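Finding 03 refers to controllability through probabilistic programming. The page does not show the paper's actual program, so the following is only a minimal sketch of the general idea: draw object positions from a prior and condition on a user-set constraint by rejection sampling. The function `sample_layout`, the 1D table, and the `min_gap` parameter are all hypothetical names chosen for illustration.

```python
import random

def sample_layout(n_objects, table_width, min_gap, rng, max_tries=10_000):
    """Rejection-sample 1D object centers (hypothetical toy model).

    Prior: each center is uniform on [0, table_width].
    Condition: neighboring centers are at least `min_gap` apart.
    """
    for _ in range(max_tries):
        xs = sorted(rng.uniform(0.0, table_width) for _ in range(n_objects))
        # Accept the sample only if every adjacent pair respects the gap.
        if all(b - a >= min_gap for a, b in zip(xs, xs[1:])):
            return xs
    raise RuntimeError("no sampled layout satisfied the constraint")

rng = random.Random(0)  # fixed seed so the sketch is reproducible
layout = sample_layout(n_objects=4, table_width=1.0, min_gap=0.15, rng=rng)
```

Raising `min_gap` gives sparser layouts and lowering it gives denser ones, which is the kind of distance control Reviewer 01 mentions. Rejection sampling is used here only because it is the simplest form of conditioning; it becomes inefficient as constraints tighten.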

Abstract

Automatically generating interactive 3D environments is crucial for scaling up robotic data collection in simulation. While prior work has primarily focused on 3D asset placement, it often overlooks the physical relationships between objects (e.g., contact, support, balance, and containment), which are essential for creating complex and realistic manipulation scenarios such as tabletop arrangements, shelf organization, or box packing. Compared to classical 3D layout generation, producing complex physical scenes introduces additional challenges: (a) higher object density and complexity (e.g., a small shelf may hold dozens of books), (b) richer supporting relationships and compact spatial layouts, and (c) the need to accurately model both spatial placement and physical properties. To address these challenges, we propose PhyScensis, an LLM agent-based framework powered by a physics engine,…
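The abstract and TL;DR describe a loop in which an LLM agent proposes an arrangement, a physics engine checks it, and feedback drives revisions. The sketch below illustrates that propose, check, and revise pattern on a toy 1D stack of boxes. It is an assumption-laden illustration, not the paper's implementation: `Box`, `unstable_levels`, `repair`, and `arrange` are hypothetical names, `repair` is a fixed rule standing in for the LLM, and the stability test is a simplified static check rather than a physics engine.

```python
from dataclasses import dataclass

@dataclass
class Box:
    name: str
    x: float      # center position along one axis
    width: float
    mass: float

def unstable_levels(stack):
    """Return one feedback string per unsupported level (bottom box first).

    Simplified static check: the combined center of mass of all boxes
    above a level must lie inside the contact interval between that
    level's box and the box resting directly on it.
    """
    feedback = []
    for i in range(len(stack) - 1):
        base, nxt, above = stack[i], stack[i + 1], stack[i + 1:]
        com = sum(b.x * b.mass for b in above) / sum(b.mass for b in above)
        lo = max(base.x - base.width / 2, nxt.x - nxt.width / 2)
        hi = min(base.x + base.width / 2, nxt.x + nxt.width / 2)
        if not (lo <= com <= hi):
            feedback.append(f"load on '{base.name}' is unsupported (com={com:.2f})")
    return feedback

def repair(stack):
    """Stand-in for the LLM revision step: pull each box halfway toward the one below."""
    fixed = [stack[0]]
    for b in stack[1:]:
        below = fixed[-1]
        fixed.append(Box(b.name, (b.x + below.x) / 2, b.width, b.mass))
    return fixed

def arrange(stack, max_rounds=10):
    """Check the proposal, revise on failure, and stop once it is stable."""
    for round_idx in range(max_rounds):
        if not unstable_levels(stack):
            return stack, round_idx
        stack = repair(stack)
    raise RuntimeError("no stable arrangement found")

proposal = [Box("a", 0.0, 1.0, 1.0), Box("b", 0.6, 1.0, 1.0), Box("c", 1.2, 1.0, 1.0)]
result, rounds = arrange(proposal)
```

In a real system the feedback strings would be returned to the language model instead of a fixed repair rule, and the stability check would be replaced by simulation in a physics engine.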

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 3

Strengths

1. The paper is well written and easy to follow.
2. The generation results look good.
3. The proposed method enables a certain level of controllability, such as the distance between objects and the stability of objects.

Weaknesses

1. I think the term scene generation used here is misleading. The paper mostly focuses on “object arrangement” [1] or “layout generation” [2], where the goal is to place objects of similar sizes on a given surface (e.g., a bookshelf or a table). This is implied by all the qualitative examples. In contrast, scene generation usually refers to generating larger and more complex indoor scenes containing objects of various sizes and more diverse object relationships, which is not demonstrated in the

Reviewer 02 · Rating 6 · Confidence 4

Strengths

* The overall system, which integrates LLM-based predicates, a physics-based solver, a geometry-based spatial solver, and feedback to the LLM, is well-designed. This results in layouts that are both reasonable and physically stable.
* Physics-plausible scene generation is an interesting and important direction, particularly for large-scale scene generation.
* The experiments are thorough, including ablations and additional evaluations on downstream robotics tasks.

Weaknesses

* It is unclear what text prompts are used in the test set for all methods. How many prompts are there, and how diverse are they?
* There is no discussion of failure cases, particularly regarding physics. What are the limitations of the current predefined predicates?
* Regarding the LLM, it is unclear how it determines object sizes and how it selects objects from the candidate object set.

Reviewer 03 · Rating 4 · Confidence 3

Strengths

- The paper is very well written, well motivated, and easy to follow.
- The methodology, although not entirely novel, is promising.
- The results and ablations show that the individual design choices result in improved generation speed and scene quality.

Weaknesses

The main weakness of the paper is the experiments. In particular, the downstream experiment fails to showcase the advantages of the approach compared to existing scene generation pipelines in the robotics domain:
- The VQA-based evaluation is questionable. It’s not clear if this metric works well for complex 3D tabletop environments. The high variance across models suggests it may not be reliable. Comparing it with human judgments could help validate this.
- It’s unclear whether the same VQA

Taxonomy

Topics: 3D Shape Modeling and Analysis · Human Motion and Animation · Computer Graphics and Visualization Techniques