STABLE: Simulation-Ready Tabletop Layout Generation via a Semantics-Physics Dual System
Zhen Luo, Yixuan Yang, Xudong Xu, Jinkun Hao, Zhaoyang Lyu, Feng Zheng, Jiangmiao Pang, Yanwei Fu

TL;DR
STABLE is a dual-system approach combining semantic reasoning and physics-based correction to generate physically plausible, task-specific tabletop scenes from instructions, improving over existing LLM-based methods.
Contribution
The paper introduces STABLE, a semantics-physics dual system that iteratively refines scene layouts so that they remain physically plausible while staying semantically aligned with the task instruction.
Findings
STABLE produces scenes that conform to task instructions.
It significantly improves physical validity over prior methods.
Experiments validate the effectiveness of the dual-system approach.
Abstract
Generating simulation-ready tabletop scenes from task instructions is an intriguing and promising research direction in the field of Embodied AI. However, existing task-to-scene generation methods rely exclusively on large language models (LLMs) to predict scene layouts, inevitably yielding object collisions or floating due to LLMs' inherent limitations in 3D spatial reasoning. In this paper, we present STABLE, a semantics-physics dual-system tailored for simulation-ready tabletop scene generation. STABLE consists of two complementary modules: (i) a Semantic Reasoner, a fine-tuned LLM trained on a structured tabletop scene dataset to generate coarse layouts from input task instructions, and (ii) a Physics Corrector, a physics-aware flow-based denoising model that outputs pose updates to refine layouts, which ensures the physical plausibility of scenes while preserving semantic alignment…
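To make the two failure modes named in the abstract (collisions and floating objects) concrete, the sketch below shows a generic iterative pose-correction loop over a coarse layout. It is not the paper's Physics Corrector, which is a learned flow-based denoising model; a hand-written heuristic stands in for it here. Everything in the sketch is an assumption for illustration: the `Box` type, the `correct` function, axis-aligned bounding boxes, a table surface at height zero, and the simplification that every object rests directly on the table (no stacking).

```python
from dataclasses import dataclass

TABLE_Z = 0.0  # assumed height of the tabletop surface


@dataclass
class Box:
    """Hypothetical axis-aligned object: center (x, y, z) and sizes (w, d, h)."""
    name: str
    x: float
    y: float
    z: float
    w: float  # extent along x
    d: float  # extent along y
    h: float  # extent along z


def overlap_1d(c1: float, s1: float, c2: float, s2: float) -> float:
    """Penetration depth of two intervals given centers and sizes (> 0 means overlap)."""
    return (s1 + s2) / 2 - abs(c1 - c2)


def correct(layout: list[Box], max_iters: int = 50, margin: float = 1e-3) -> int:
    """Apply pose updates until nothing floats, sinks, or overlaps.

    Heuristic stand-in for a learned corrector. Returns the number of
    iterations that applied at least one update.
    """
    for it in range(max_iters):
        changed = False
        # Fix floating or sunken objects by resting them on the table.
        for b in layout:
            if abs((b.z - b.h / 2) - TABLE_Z) > 1e-9:
                b.z = TABLE_Z + b.h / 2
                changed = True
        # Resolve footprint overlaps by pushing pairs apart along the
        # axis of least penetration, plus a small clearance margin.
        for i, a in enumerate(layout):
            for b in layout[i + 1:]:
                ox = overlap_1d(a.x, a.w, b.x, b.w)
                oy = overlap_1d(a.y, a.d, b.y, b.d)
                if ox > 1e-9 and oy > 1e-9:
                    if ox <= oy:
                        shift = (ox + margin) / 2
                        sign = -1.0 if a.x <= b.x else 1.0
                        a.x += sign * shift
                        b.x -= sign * shift
                    else:
                        shift = (oy + margin) / 2
                        sign = -1.0 if a.y <= b.y else 1.0
                        a.y += sign * shift
                        b.y -= sign * shift
                    changed = True
        if not changed:
            return it
    return max_iters
```

A coarse layout such as `[Box("mug", 0.0, 0.0, 0.2, 0.1, 0.1, 0.1), Box("plate", 0.05, 0.0, 0.0, 0.2, 0.2, 0.02)]` contains a floating mug and a plate that both sinks into the table and overlaps the mug; calling `correct` on it rests both objects on the surface and separates their footprints. Unlike the learned corrector described in the abstract, this heuristic has no notion of semantic alignment, so it may move objects in ways that break the intended arrangement.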
