RoboAfford++: A Generative AI-Enhanced Dataset for Multimodal Affordance Learning in Robotic Manipulation and Navigation
Xiaoshuai Hao, Yingbo Tang, Lingfeng Zhang, Yanbiao Ma, Yunfeng Diao, Ziyu Jia, Wenbo Ding, Hangjun Ye, Long Chen

TL;DR
RoboAfford++ is a large, AI-enhanced dataset designed to improve multimodal affordance learning for robotic manipulation and navigation, addressing current limitations in scene understanding and interaction planning.
Contribution
The paper introduces RoboAfford++, a comprehensive dataset with nearly 870,000 images and 2 million QA annotations, and a benchmark for evaluating affordance prediction in robotics.
Findings
Fine-tuning on RoboAfford++ improves VLMs' affordance reasoning.
Existing VLMs struggle with detailed affordance inference.
The dataset enables better understanding of object and spatial affordances.
Abstract
Robotic manipulation and navigation are fundamental capabilities of embodied intelligence, enabling effective robot interactions with the physical world. Achieving these capabilities requires a cohesive understanding of the environment, including object recognition to localize target objects, object affordances to identify potential interaction areas and spatial affordances to discern optimal areas for both object placement and robot movement. While Vision-Language Models (VLMs) excel at high-level task planning and scene understanding, they often struggle to infer actionable positions for physical interaction, such as functional grasping points and permissible placement regions. This limitation stems from the lack of fine-grained annotations for object and spatial affordances in their training datasets. To tackle this challenge, we introduce RoboAfford++, a generative AI-enhanced…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsRobot Manipulation and Learning · Multimodal Machine Learning Applications · Social Robot Interaction and HRI
