TaskGround: Structured Executable Task Inference for Full-Scene Household Reasoning

ZhiYuan Feng; Yu Deng; Ruichuan An; Zhenhua Liu; Qixiu Li; Keming Wu; Zhiying Du; Weijie Wang; Haoxiao Wang; Shuang Chen; Sicheng Xu; Yaobo Liang; Jiaolong Yang; and Baining Guo

arXiv:2605.18109·cs.AI·May 19, 2026

TL;DR

TaskGround introduces a framework for household agents to infer and execute structured tasks from complete scene data, improving success rates and efficiency in household reasoning tasks.

Contribution

The paper presents a training-free, model-agnostic method for grounding scenes and inferring task structure, which strengthens compact models on household reasoning.

Findings

01

TaskGround improves task success rates significantly.

02

TaskGround makes Qwen3.5-9B competitive with GPT-5 in scene understanding.

03

TaskGround reduces input-token cost by up to 18x.

Abstract

In real home deployments, household agents must often operate from a complete household scene and a situated household request, rather than from a clean task specification. Such requests require agents to identify task-relevant entities, recover intended task conditions, and resolve ordering constraints from the surrounding scene context. We formalize this capability as full-scene household reasoning: given a complete household scene and a situated household request, an agent must infer executable task structure before producing a grounded skill-level action sequence. This setting is challenging because complete household scenes contain substantial task-irrelevant information, making direct complete-scene prompting inefficient and error-prone. In practical deployment, this challenge is further amplified by privacy and local compute constraints, which favor compact open-weight models…
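As a rough illustration of the setting described in the abstract, the sketch below shows one way an "executable task structure" could be represented: a set of task-relevant entities, a list of intended conditions, and ordering constraints between steps. It is not the paper's implementation. Every name here (`TaskStructure`, `prune_scene`, `ordered_steps`, the toy scene, and the step labels) is a hypothetical chosen for the example, and the topological sort is simply a standard way to resolve ordering constraints.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class TaskStructure:
    """Hypothetical container for an inferred task structure."""
    entities: set[str]                            # task-relevant scene entities
    conditions: list[tuple[str, str, str]]        # (entity, relation, target)
    # Maps each step to the set of steps that must be completed before it.
    ordering: dict[str, set[str]] = field(default_factory=dict)


def prune_scene(scene: dict[str, dict], relevant: set[str]) -> dict[str, dict]:
    """Keep only the scene entries named in `relevant`."""
    return {name: attrs for name, attrs in scene.items() if name in relevant}


def ordered_steps(task: TaskStructure) -> list[str]:
    """Return the steps in an order that respects the ordering constraints."""
    return list(TopologicalSorter(task.ordering).static_order())


# Toy full scene: most entries are irrelevant to the request.
scene = {
    "mug": {"room": "kitchen", "state": "dirty"},
    "sink": {"room": "kitchen"},
    "cabinet": {"room": "kitchen"},
    "sofa": {"room": "living_room"},
    "tv": {"room": "living_room"},
}

# Assumed structure for a request such as "put the mug away clean".
task = TaskStructure(
    entities={"mug", "sink", "cabinet"},
    conditions=[("mug", "is", "clean"), ("mug", "inside", "cabinet")],
    ordering={
        "pick_mug": set(),
        "wash_mug": {"pick_mug"},
        "place_mug_in_cabinet": {"wash_mug"},
    },
)

pruned = prune_scene(scene, task.entities)
steps = ordered_steps(task)
```

In this toy case the pruned scene keeps three of the five entities, which hints at why filtering task-irrelevant context before prompting could lower input-token cost; the actual grounding procedure and the reported savings are described in the paper itself.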
