"Set It Up": Functional Object Arrangement with Compositional Generative Models (Journal Version)

Yiqing Xu; Jiayuan Mao; Linfeng Li; Yilun Du; Tomás Lozano-Pérez; Leslie Pack Kaelbling; David Hsu

arXiv:2508.02068·cs.RO·August 8, 2025

TL;DR

This paper introduces SetItUp, a neuro-symbolic framework that uses language models and diffusion models to generate functional object arrangements from minimal instructions, improving over prior methods.

Contribution

It presents a novel two-stage approach combining large language models and diffusion models to predict goal object poses from natural language instructions and few examples.

Findings

01. Outperforms existing models in arrangement quality

02. Generates physically feasible and aesthetically pleasing setups

03. Effective across diverse task domains

Abstract

Functional object arrangement (FORM) is the task of arranging objects to fulfill a function, e.g., "set up a dining table for two". One key challenge here is that the instructions for FORM are often under-specified and do not explicitly specify the desired object goal poses. This paper presents SetItUp, a neuro-symbolic framework that learns to specify the goal poses of objects from a few training examples and a structured natural-language task specification. SetItUp uses a grounding graph, which is composed of abstract spatial relations among objects (e.g., left-of), as its intermediate representation. This decomposes the FORM problem into two stages: (i) predicting this graph among objects and (ii) predicting object poses given the grounding graph. For (i), SetItUp leverages large language models (LLMs) to induce Python programs from a task specification and a few training examples.…
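The two-stage decomposition described in the abstract can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the paper: the `Relation` class, the `dining_setup_program` function (a hand-written stand-in for the kind of program an LLM might induce), the `right-of` relation, and the `satisfies` check. Stage (ii) in SetItUp uses diffusion models to produce poses; this sketch does not generate poses at all and only checks whether a hand-picked set of 2D positions is consistent with the grounding graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One edge of a grounding graph: an abstract spatial relation among objects."""
    name: str        # e.g. "left-of"
    objects: tuple   # names of the objects the relation applies to


def dining_setup_program(num_diners):
    """Hypothetical stand-in for an induced program (stage i).

    Maps a task parameter to a grounding graph, i.e. a list of relations.
    """
    graph = []
    for i in range(num_diners):
        fork, plate, knife = f"fork_{i}", f"plate_{i}", f"knife_{i}"
        graph.append(Relation("left-of", (fork, plate)))
        graph.append(Relation("right-of", (knife, plate)))
    return graph


def satisfies(relation, poses):
    """Check one relation against 2D positions (x, y), with x growing to the right."""
    a, b = relation.objects
    if relation.name == "left-of":
        return poses[a][0] < poses[b][0]
    if relation.name == "right-of":
        return poses[a][0] > poses[b][0]
    raise ValueError(f"unknown relation: {relation.name}")


# Stage (i): build the grounding graph for a two-person setup.
graph = dining_setup_program(2)

# Stage (ii) placeholder: hand-picked positions instead of a generative model.
poses = {
    "fork_0": (0.10, 0.2), "plate_0": (0.25, 0.2), "knife_0": (0.40, 0.2),
    "fork_1": (0.60, 0.2), "plate_1": (0.75, 0.2), "knife_1": (0.90, 0.2),
}
all_ok = all(satisfies(r, poses) for r in graph)
```

The point of the intermediate graph is that the same abstract relations can be reused across tables and object sets, while the pose-level details are left to the second stage.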
