RoboAfford++: A Generative AI-Enhanced Dataset for Multimodal Affordance Learning in Robotic Manipulation and Navigation

Xiaoshuai Hao; Yingbo Tang; Lingfeng Zhang; Yanbiao Ma; Yunfeng Diao; Ziyu Jia; Wenbo Ding; Hangjun Ye; Long Chen

arXiv:2511.12436 · cs.RO · November 18, 2025

Open Access

TL;DR

RoboAfford++ is a large, generative AI-enhanced dataset for multimodal affordance learning in robotic manipulation and navigation. It targets the difficulty Vision-Language Models have in inferring actionable positions for physical interaction, such as functional grasping points and permissible placement regions.

Contribution

The paper introduces RoboAfford++, a comprehensive dataset with nearly 870,000 images and 2 million QA annotations, and a benchmark for evaluating affordance prediction in robotics.

Findings

01. Fine-tuning on RoboAfford++ improves VLMs' affordance reasoning.

02. Existing VLMs struggle with detailed affordance inference.

03. The dataset enables better understanding of object and spatial affordances.

Abstract

Robotic manipulation and navigation are fundamental capabilities of embodied intelligence, enabling effective robot interactions with the physical world. Achieving these capabilities requires a cohesive understanding of the environment, including object recognition to localize target objects, object affordances to identify potential interaction areas, and spatial affordances to discern optimal areas for both object placement and robot movement. While Vision-Language Models (VLMs) excel at high-level task planning and scene understanding, they often struggle to infer actionable positions for physical interaction, such as functional grasping points and permissible placement regions. This limitation stems from the lack of fine-grained annotations for object and spatial affordances in their training datasets. To tackle this challenge, we introduce RoboAfford++, a generative AI-enhanced…

Taxonomy

Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Social Robot Interaction and HRI