Generalizable task-oriented object grasping through LLM-guided ontology and similarity-based planning

Hao Chen; Takuya Kiyokawa; Weiwei Wan; Kensuke Harada

arXiv:2603.26412 · cs.RO · March 30, 2026

TL;DR

This paper presents a geometry-centric approach to task-oriented object grasping that generalizes across diverse objects and tasks by combining an LLM-constructed object-part-task ontology with similarity-based planning.

Contribution

It introduces an object-part-task ontology and a geometric analysis framework that make task-oriented grasping more robust and generalizable without relying on semantic visual features.

Findings

1. High accuracy in functional part selection and grasp generation.
2. Effective generalization to novel objects and tasks.
3. Validation through real-world experiments.

Abstract

Task-oriented grasping (TOG) is more challenging than simple object grasping because it requires precise identification of object parts and careful selection of grasping areas to ensure effective and robust manipulation. While recent approaches have trained large-scale vision-language models to integrate part-level object segmentation with task-aware grasp planning, their instability in part recognition and grasp inference limits their ability to generalize across diverse objects and tasks. To address this issue, we introduce a novel, geometry-centric strategy for more generalizable TOG that does not rely on semantic features from visual recognition, effectively overcoming the viewpoint sensitivity of model-based approaches. Our main proposals include: 1) an object-part-task ontology for functional part selection based on intuitive human commands, constructed using a Large Language…
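
The object-part-task ontology described in the abstract can be pictured as a lookup from an (object, task) pair to the part the gripper should hold. The sketch below is a hypothetical illustration, not the authors' implementation: the `PartChoice` type, the `select_part` and `select_part_for_novel` helpers, and every ontology entry are invented here, and the entries are hand-written rather than LLM-generated. The fallback for unseen objects swaps in plain cosine similarity over made-up three-number shape descriptors, standing in for whatever similarity measure the paper actually uses.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class PartChoice:
    grasp_part: str       # part the gripper should hold
    functional_part: str  # part that performs the task and should stay free


# Hypothetical object-part-task ontology. In the paper an LLM constructs the
# ontology; these entries are hand-written placeholders for illustration.
ONTOLOGY: Dict[Tuple[str, str], PartChoice] = {
    ("mug", "pour"): PartChoice("handle", "body"),
    ("knife", "cut"): PartChoice("handle", "blade"),
    ("knife", "handover"): PartChoice("blade", "handle"),
    ("hammer", "pound"): PartChoice("handle", "head"),
    ("hammer", "handover"): PartChoice("head", "handle"),
}

# Hypothetical shape descriptors (normalized extents) for the known objects.
DESCRIPTORS: Dict[str, Tuple[float, float, float]] = {
    "mug": (0.9, 0.9, 1.0),
    "knife": (0.1, 0.05, 1.0),
    "hammer": (0.4, 0.1, 1.0),
}


def select_part(obj: str, task: str) -> Optional[PartChoice]:
    """Look up the part choice for a known object and task; None if absent."""
    return ONTOLOGY.get((obj.lower(), task.lower()))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length descriptor vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("descriptors must be non-zero")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def select_part_for_novel(
    descriptor: Sequence[float], task: str
) -> Optional[Tuple[str, PartChoice]]:
    """Reuse the entry of the most similar known object that supports the task."""
    task = task.lower()
    candidates = [
        (cosine(descriptor, desc), name)
        for name, desc in DESCRIPTORS.items()
        if (name, task) in ONTOLOGY
    ]
    if not candidates:
        return None  # no known object supports this task
    _, best = max(candidates)  # highest similarity wins
    return best, ONTOLOGY[(best, task)]


print(select_part("Knife", "cut"))
# PartChoice(grasp_part='handle', functional_part='blade')
print(select_part_for_novel((0.12, 0.05, 1.0), "handover"))
# ('knife', PartChoice(grasp_part='blade', functional_part='handle'))
```

Here an unseen, knife-like object inherits the knife's handover entry because its descriptor is closer to the knife's than to the hammer's. A real system would need far richer geometric descriptors than these placeholders.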
