MOKA: Open-World Robotic Manipulation through Mark-Based Visual Prompting
Fangchen Liu, Kuan Fang, Pieter Abbeel, Sergey Levine

TL;DR
MOKA leverages vision-language models with visual prompting to enable open-world robotic manipulation using free-form language commands, bridging perception and action through a compact affordance representation.
Contribution
The paper introduces MOKA, a novel approach that employs VLMs with visual prompts and a point-based affordance representation for zero-shot and few-shot robotic manipulation tasks.
Findings
Effective in zero-shot and few-shot settings
Performs well on tool use, deformable object manipulation, and rearrangement
Enhances robot understanding through visual prompting and in-context learning
Abstract
Open-world generalization requires robotic systems to have a profound understanding of the physical world and the user command to solve diverse and complex tasks. While the recent advancement in vision-language models (VLMs) has offered unprecedented opportunities to solve open-world problems, how to leverage their capabilities to control robots remains a grand challenge. In this paper, we introduce Marking Open-world Keypoint Affordances (MOKA), an approach that employs VLMs to solve robotic manipulation tasks specified by free-form language instructions. Central to our approach is a compact point-based representation of affordance, which bridges the VLM's predictions on observed images and the robot's actions in the physical world. By prompting the pre-trained VLM, our approach utilizes the VLM's commonsense knowledge and concept understanding acquired from broad data sources to…
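As a rough illustration of how a point-based affordance could bridge image-space predictions and physical robot motion, the sketch below stores a few 2D points selected on the image and lifts them to 3D camera-frame coordinates with a depth map and the standard pinhole camera model. It is a minimal sketch under stated assumptions, not the authors' implementation: the class `KeypointAffordance`, its fields (`grasp`, `target`, `waypoints`), the `Intrinsics` container, and the functions `deproject` and `to_motion_points` are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Pixel = Tuple[int, int]                 # (u, v): column, row in the image
Point3D = Tuple[float, float, float]    # (x, y, z) in the camera frame, meters


@dataclass
class KeypointAffordance:
    """Hypothetical container for 2D points a VLM might select on an image."""
    grasp: Optional[Pixel] = None       # where to grasp the object (assumed field)
    target: Optional[Pixel] = None      # where the motion should end (assumed field)
    waypoints: List[Pixel] = field(default_factory=list)  # intermediate points


@dataclass
class Intrinsics:
    """Pinhole camera parameters (focal lengths and principal point, in pixels)."""
    fx: float
    fy: float
    cx: float
    cy: float


def deproject(pixel: Pixel, depth_m: float, cam: Intrinsics) -> Point3D:
    """Lift one pixel with known depth to a 3D point using the pinhole model."""
    u, v = pixel
    x = (u - cam.cx) * depth_m / cam.fx
    y = (v - cam.cy) * depth_m / cam.fy
    return (x, y, depth_m)


def to_motion_points(
    aff: KeypointAffordance,
    depth: Sequence[Sequence[float]],
    cam: Intrinsics,
) -> List[Point3D]:
    """Order the selected points as grasp -> waypoints -> target and lift each to 3D."""
    ordered = [aff.grasp, *aff.waypoints, aff.target]
    # depth is indexed as depth[row][column], i.e. depth[v][u]
    return [deproject(p, depth[p[1]][p[0]], cam) for p in ordered if p is not None]


if __name__ == "__main__":
    cam = Intrinsics(fx=100.0, fy=100.0, cx=2.0, cy=2.0)
    depth = [[1.0] * 5 for _ in range(5)]   # flat synthetic depth map, 1 m everywhere
    aff = KeypointAffordance(grasp=(2, 2), target=(4, 2), waypoints=[(3, 2)])
    for point in to_motion_points(aff, depth, cam):
        print(point)
```

In a real system the resulting camera-frame points would still need to be transformed into the robot's base frame and turned into a trajectory; both steps are outside the scope of this sketch.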
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Robotics and Automated Systems
