KITE: Keypoint-Conditioned Policies for Semantic Manipulation

Priya Sundaresan; Suneel Belkhale; Dorsa Sadigh; Jeannette Bohg

arXiv:2306.16605 · cs.RO · October 13, 2023 · 5 cites

TL;DR

KITE is a two-step framework for semantic manipulation: it grounds a language instruction in the visual scene through 2D image keypoints, then conditions robot execution on those keypoints. It is demonstrated on several real-world tasks.

Contribution

The paper presents a keypoint-conditioned approach that improves semantic manipulation and generalization in instruction-following robots, outperforming baselines based on pre-trained language models and end-to-end visuomotor control.

Findings

1. Achieves a success rate above 70% in real-world tasks
2. Outperforms pre-trained language models and end-to-end visuomotor control
3. Effective in diverse applications such as grasping and coffee-making

Abstract

While natural language offers a convenient shared interface for humans and robots, enabling robots to interpret and follow language commands remains a longstanding challenge in manipulation. A crucial step to realizing a performant instruction-following robot is achieving semantic manipulation, where a robot interprets language at different specificities, from high-level instructions like "Pick up the stuffed animal" to more detailed inputs like "Grab the left ear of the elephant." To tackle this, we propose Keypoints + Instructions to Execution (KITE), a two-step framework for semantic manipulation which attends to both scene semantics (distinguishing between different objects in a visual scene) and object semantics (precisely localizing different parts within an object instance). KITE first grounds an input instruction in a visual scene through 2D image keypoints, providing a highly…
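The grounding step described in the abstract turns an instruction and an image into a 2D image keypoint. The sketch below illustrates only the final part of that idea, under an assumption the abstract does not state: that a grounding model has already produced a per-pixel score map for the instruction, and that the keypoint is taken as the highest-scoring pixel. The function name `heatmap_to_keypoint` and the toy score map are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def heatmap_to_keypoint(scores: List[List[float]]) -> Tuple[int, int]:
    """Return the (x, y) pixel with the highest score (x = column, y = row).

    Assumption for illustration: `scores` is a per-pixel relevance map
    produced by some instruction-conditioned grounding model.
    Ties are resolved in favor of the first pixel in row-major order.
    """
    if not scores or not scores[0]:
        raise ValueError("score map must be non-empty")
    best_x, best_y = 0, 0
    best = scores[0][0]
    for y, row in enumerate(scores):
        for x, value in enumerate(row):
            if value > best:
                best, best_x, best_y = value, x, y
    return best_x, best_y


# Toy 3x3 score map standing in for a model output.
toy_scores = [
    [0.1, 0.2, 0.1],
    [0.1, 0.9, 0.3],
    [0.0, 0.2, 0.1],
]
keypoint = heatmap_to_keypoint(toy_scores)
print(keypoint)  # (1, 1)
```

A single pixel coordinate is a compact interface between language grounding and whatever executes the motion, which is one plausible reading of why a keypoint serves as the intermediate representation here.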

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition

Methods: OPT