RAAP: Retrieval-Augmented Affordance Prediction with Cross-Image Action Alignment

Qiyuan Zhuang; He-Yang Xu; Yijun Wang; Xin-Yang Zhao; Yang-Yang Li; Xiu-Shen Wei

arXiv:2603.29419 · cs.RO · April 1, 2026


TL;DR

RAAP is a framework for robot affordance prediction that combines affordance retrieval with alignment-based learning, aiming at better generalization to unseen objects and categories and at zero-shot manipulation.

Contribution

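RAAP unifies affordance retrieval with alignment-based learning and decouples static contact localization from dynamic action-direction prediction: contact points are transferred via dense correspondence, while action directions are predicted by a retrieval-augmented alignment model that consolidates multiple references.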

Findings

01 RAAP achieves consistent performance on unseen objects and categories.

02 It enables zero-shot robotic manipulation in simulation and real-world settings.

03 Trained on small datasets, RAAP outperforms existing methods in affordance prediction.

Abstract

Understanding object affordances is essential for enabling robots to perform purposeful and fine-grained interactions in diverse and unstructured environments. However, existing approaches either rely on retrieval, which is fragile due to sparsity and coverage gaps, or on large-scale models, which frequently mislocalize contact points and mispredict post-contact actions when applied to unseen categories, thereby hindering robust generalization. We introduce Retrieval-Augmented Affordance Prediction (RAAP), a framework that unifies affordance retrieval with alignment-based learning. By decoupling static contact localization and dynamic action direction, RAAP transfers contact points via dense correspondence and predicts action directions through a retrieval-augmented alignment model that consolidates multiple references with dual-weighted attention. Trained on compact subsets of DROID…
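The decoupling described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes per-pixel feature vectors are already available for a reference image and a target image, transfers a contact point by nearest-neighbor cosine matching (one simple form of dense correspondence), and replaces the learned alignment model with a fixed softmax-weighted average of reference directions. The abstract does not say what the two weights in "dual-weighted attention" are, so the `retrieval_scores` and `match_scores` inputs, the `temperature` parameter, and all function names here are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def transfer_contact(ref_feats, ref_contact, tgt_feats):
    """Static step: map a reference contact pixel onto the target image.

    ref_feats and tgt_feats are H x W grids (nested lists) of feature vectors.
    Returns (row, col, score) of the best-matching target pixel.
    """
    r, c = ref_contact
    query = ref_feats[r][c]
    best = (0, 0, -2.0)  # cosine similarity is always >= -1
    for i, row in enumerate(tgt_feats):
        for j, feat in enumerate(row):
            score = cosine(query, feat)
            if score > best[2]:
                best = (i, j, score)
    return best


def fuse_directions(directions, retrieval_scores, match_scores, temperature=0.1):
    """Dynamic step: combine action directions from several references.

    ASSUMPTION: each reference gets a weight from the product of two scores
    (retrieval similarity and correspondence confidence), passed through a
    softmax. The paper's learned dual-weighted attention may differ.
    """
    logits = [(a * b) / temperature for a, b in zip(retrieval_scores, match_scores)]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]  # shift for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(directions[0])
    fused = [sum(w * d[k] for w, d in zip(weights, directions)) for k in range(dim)]
    norm = math.sqrt(sum(v * v for v in fused))
    return [v / norm for v in fused] if norm > 0 else fused
```

In this sketch, `transfer_contact` would be called once per retrieved reference, and its returned score could serve as the `match_scores` entry for that reference when calling `fuse_directions`. Keeping the two functions separate mirrors the abstract's split between where to touch and which way to move afterward.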


Code & Models

Repositories

SEU-VIPGroup/RAAP (GitHub)
