Details Make a Difference: Object State-Sensitive Neurorobotic Task Planning

Xiaowen Sun; Xufeng Zhao; Jae Hee Lee; Wenhao Lu; Matthias Kerzel; Stefan Wermter

arXiv:2406.09988 · cs.AI · October 17, 2024

TL;DR

This paper introduces an object state-sensitive planning agent for robots, leveraging pre-trained neural networks, and compares modular and monolithic models in tabletop scenarios, demonstrating the effectiveness of VLMs in state-aware task planning.

Contribution

The paper presents OSSA, a novel neural network-based agent for object state-sensitive planning, and provides a new benchmark dataset for evaluating such tasks.

Findings

1. The monolithic VLM-based model outperforms the modular model.
2. Both models can handle object state-sensitive tasks.
3. The approach advances robot task planning with pre-trained neural networks.

Abstract

The state of an object reflects its current status or condition and is important for a robot's task planning and manipulation. However, detecting an object's state and generating a state-sensitive plan for robots is challenging. Recently, pre-trained Large Language Models (LLMs) and Vision-Language Models (VLMs) have shown impressive capabilities in generating plans. However, to the best of our knowledge, there has been little investigation into whether LLMs or VLMs can also generate object state-sensitive plans. To study this, we introduce an Object State-Sensitive Agent (OSSA), a task-planning agent empowered by pre-trained neural networks. We propose two methods for OSSA: (i) a modular model consisting of a pre-trained vision processing module (dense captioning model, DCM) and a natural language processing model (LLM), and (ii) a monolithic model consisting only of a VLM. To…
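The two OSSA variants described in the abstract differ mainly in where visual information is converted into text. The following is a minimal sketch of that structural difference, assuming each pre-trained model can be treated as a plain callable. The class names, the prompt format, and the `parse_plan` helper are hypothetical illustrations, not code from the official xiao-wen-sun/ossa repository, and the model calls are meant to be replaced by real DCM, LLM, or VLM wrappers.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical aliases: an "image" is whatever the vision model accepts,
# and a plan is an ordered list of step strings.
Image = object
Plan = List[str]


def parse_plan(text: str) -> Plan:
    """Split raw model output into stripped, non-empty plan steps."""
    return [line.strip() for line in text.splitlines() if line.strip()]


@dataclass
class ModularPlanner:
    """Modular variant (sketch): a dense captioning model (DCM) describes
    the scene in text, then a text-only LLM produces the plan."""
    dense_captioner: Callable[[Image], List[str]]  # image -> region captions
    llm: Callable[[str], str]                      # prompt -> plan text

    def plan(self, image: Image, instruction: str) -> Plan:
        captions = self.dense_captioner(image)
        # The LLM never sees pixels: object states reach it only via captions.
        prompt = (
            "Scene description:\n"
            + "\n".join(f"- {c}" for c in captions)
            + f"\nTask: {instruction}\nPlan, one step per line:"
        )
        return parse_plan(self.llm(prompt))


@dataclass
class MonolithicPlanner:
    """Monolithic variant (sketch): a single VLM maps the image and the
    instruction directly to a plan."""
    vlm: Callable[[Image, str], str]               # (image, prompt) -> plan text

    def plan(self, image: Image, instruction: str) -> Plan:
        prompt = f"Task: {instruction}\nPlan, one step per line:"
        return parse_plan(self.vlm(image, prompt))


if __name__ == "__main__":
    # Toy stubs standing in for real pre-trained models (illustrative only).
    fake_dcm = lambda img: ["a cup full of coffee", "an empty cup"]
    fake_llm = lambda prompt: "pick up the empty cup\nplace it in the sink\n"
    print(ModularPlanner(fake_dcm, fake_llm).plan(None, "clear the table"))
```

In this sketch, any state detail the captioner omits (for example, whether a cup is full) never reaches the LLM, while the monolithic variant passes the image itself to the model and has no such text bottleneck.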

Code & Models

Repositories

xiao-wen-sun/ossa (PyTorch, official)

Taxonomy

Topics: EEG and Brain-Computer Interfaces