Affordance Perception by a Knowledge-Guided Vision-Language Model with Efficient Error Correction

Gertjan Burghouts; Marianne Schaaphok; Michael van Bekkum; Wouter Meijer; Fieke Hillerström; Jelle van Mil

arXiv:2407.13368·cs.CV·July 19, 2024

TL;DR

This paper enhances robot affordance perception in open-world environments by integrating a detailed affordance knowledge base with vision-language models and human-in-the-loop corrections, enabling effective object interaction and task execution.

Contribution

It introduces a precise affordance representation, connects it to vision-language models for unseen objects, and incorporates human feedback for improved perception.

Findings

01 Effective for robot object search and manipulation

02 Improves affordance understanding in open-world scenarios

03 Demonstrated in door opening tasks

Abstract

Mobile robot platforms will increasingly be tasked with activities that involve grasping and manipulating objects in open-world environments. Affordance understanding provides a robot with the means to realise its goals and execute its tasks, e.g. to achieve autonomous navigation in unknown buildings, where it has to find doors and ways to open them. To get actionable suggestions, robots need to be able to distinguish subtle differences between objects, as these may result in different action sequences: doorknobs require grasp and twist, while handlebars require grasp and push. In this paper, we improve affordance perception for a robot in an open-world setting. Our contribution is threefold: (1) We provide an affordance representation with precise, actionable affordances; (2) We connect this knowledge base to a foundational vision-language model (VLM) and prompt the VLM for a…
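The doorknob and handlebar example in the abstract can be illustrated with a minimal sketch of an affordance lookup. This is not the paper's implementation: the dictionary layout and the names `AFFORDANCES`, `actions_for`, and `correct` are assumptions made for illustration. Only the two object-to-action mappings are taken from the abstract.

```python
from typing import Dict, List, Optional

# Hypothetical knowledge base: object part -> ordered action sequence.
# The two entries come from the abstract; the data structure is an assumption.
AFFORDANCES: Dict[str, List[str]] = {
    "doorknob": ["grasp", "twist"],
    "handlebar": ["grasp", "push"],
}


def actions_for(label: str) -> Optional[List[str]]:
    """Return the action sequence for a detected object part, or None if unknown."""
    return AFFORDANCES.get(label.strip().lower())


def correct(label: str, actions: List[str]) -> None:
    """Record a human correction by adding or overwriting an entry (hypothetical)."""
    AFFORDANCES[label.strip().lower()] = list(actions)
```

In this sketch, a label produced by a perception model would be passed to `actions_for`, and a `None` result marks an object for which no affordance is known, which is one place where a human correction could be recorded with `correct`.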

Taxonomy

Topics: Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization · Robot Manipulation and Learning

Methods: Balanced Selection