Assessing VLM-Driven Semantic-Affordance Inference for Non-Humanoid Robot Morphologies

Jess Jones; Raul Santos-Rodriguez; Sabine Hauert

arXiv:2604.19509·cs.RO·April 22, 2026

TL;DR

This paper evaluates the ability of vision-language models to infer affordances for robots with non-humanoid shapes, highlighting their strengths and limitations across diverse object categories and robot forms.

Contribution

The paper introduces a hybrid dataset combining real and synthetic scenarios and provides an empirical analysis of VLM performance on non-humanoid robotic affordance inference.

Findings

01. VLMs show promising generalisation to non-humanoid robots.

02. Performance varies significantly across object categories.

03. VLMs tend to produce few false positives but many false negatives, especially in novel tool-use scenarios.
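
To make the false-positive and false-negative distinction in the third finding concrete, the sketch below computes both error rates for binary affordance judgements. It is only an illustration: the function name `affordance_error_rates`, the `ErrorRates` container, and the label lists are hypothetical and do not come from the paper or its dataset.

```python
from dataclasses import dataclass


@dataclass
class ErrorRates:
    false_positive_rate: float  # share of true negatives predicted as afforded
    false_negative_rate: float  # share of true positives predicted as not afforded


def affordance_error_rates(truth, predicted):
    """Compare ground-truth affordance labels with model predictions.

    Both arguments are equal-length sequences of booleans, where True
    means "the robot can perform this action on this object".
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    pairs = list(zip(truth, predicted))
    false_positives = sum(1 for t, p in pairs if p and not t)
    false_negatives = sum(1 for t, p in pairs if t and not p)
    negatives = sum(1 for t in truth if not t)
    positives = sum(1 for t in truth if t)

    # Guard against division by zero when one class is absent.
    fpr = false_positives / negatives if negatives else 0.0
    fnr = false_negatives / positives if positives else 0.0
    return ErrorRates(fpr, fnr)


# Hypothetical labels, invented purely for illustration (not paper data).
truth = [True, True, True, True, False, False, False, False]
predicted = [True, False, False, True, False, False, False, False]

rates = affordance_error_rates(truth, predicted)
print(rates)
```

In this made-up example the model never claims an affordance that is absent, yet it misses half of the affordances that are present, which is the conservative error pattern the finding describes.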

Abstract

Vision-language models (VLMs) have demonstrated remarkable capabilities in understanding human-object interactions, but their application to robotic systems with non-humanoid morphologies remains largely unexplored. This work investigates whether VLMs can effectively infer affordances for robots with fundamentally different embodiments than humans, addressing a critical gap in the deployment of these models for diverse robotic applications. We introduce a novel hybrid dataset that combines annotated real-world robotic affordance-object relations with VLM-generated synthetic scenarios, and perform an empirical analysis of VLM performance across multiple object categories and robot morphologies, revealing significant variations in affordance inference capabilities. Our experiments demonstrate that while VLMs show promising generalisation to non-humanoid robot forms, their performance is…
