TL;DR
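RAAP (Retrieval-Augmented Affordance Prediction) unifies affordance retrieval with alignment-based learning to predict where a robot should contact an object and in which direction to act afterward, improving generalization to unseen objects and enabling zero-shot manipulation.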
Contribution
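RAAP decouples static contact localization from dynamic action-direction prediction: contact points are transferred via dense correspondence, while action directions are predicted by a retrieval-augmented alignment model that consolidates multiple references with dual-weighted attention. This separation is intended to improve robustness and generalization.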
Findings
RAAP achieves consistent performance on unseen objects and categories.
It enables zero-shot robotic manipulation in simulation and real-world settings.
Trained on small datasets, RAAP outperforms existing methods in affordance prediction.
Abstract
Understanding object affordances is essential for enabling robots to perform purposeful and fine-grained interactions in diverse and unstructured environments. However, existing approaches either rely on retrieval, which is fragile due to sparsity and coverage gaps, or on large-scale models, which frequently mislocalize contact points and mispredict post-contact actions when applied to unseen categories, thereby hindering robust generalization. We introduce Retrieval-Augmented Affordance Prediction (RAAP), a framework that unifies affordance retrieval with alignment-based learning. By decoupling static contact localization and dynamic action direction, RAAP transfers contact points via dense correspondence and predicts action directions through a retrieval-augmented alignment model that consolidates multiple references with dual-weighted attention. Trained on compact subsets of DROID…
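To make the idea of consolidating multiple references more concrete, here is a minimal sketch of how several retrieved action directions could be fused with two sets of weights. It is an illustration only, not the paper's model: RAAP's alignment model is learned, whereas this example uses hand-written softmax weights. The function name `fuse_directions` and the two score lists (`retrieval_scores`, `alignment_scores`) are hypothetical names chosen for this sketch.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def normalize(v):
    """Scale a 3D vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero-length direction")
    return [x / n for x in v]


def fuse_directions(directions, retrieval_scores, alignment_scores):
    """Fuse K reference action directions into one unit direction.

    ASSUMPTION: each reference carries two scalar scores, one for how well
    it was retrieved and one for how well it aligns with the query scene.
    The two softmax distributions are multiplied and renormalized; this is
    a stand-in for the learned dual-weighted attention, not a reproduction.
    """
    if not (len(directions) == len(retrieval_scores) == len(alignment_scores)):
        raise ValueError("all inputs must have the same length")
    w_retrieval = softmax(retrieval_scores)
    w_alignment = softmax(alignment_scores)
    joint = [a * b for a, b in zip(w_retrieval, w_alignment)]
    total = sum(joint)
    weights = [j / total for j in joint]
    # Weighted sum of the reference directions, then re-normalized.
    fused = [sum(w * d[i] for w, d in zip(weights, directions)) for i in range(3)]
    return normalize(fused), weights


# Two references with equal scores contribute equally to the fused direction.
direction, weights = fuse_directions(
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    retrieval_scores=[0.0, 0.0],
    alignment_scores=[0.0, 0.0],
)
```

Multiplying the two weight distributions means a reference only dominates when it scores well on both criteria, which is one simple way to keep a single poorly matched reference from steering the prediction.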
