GIFT: Generalizable Interaction-aware Functional Tool Affordances without Labels
Dylan Turpin, Liquan Wang, Stavros Tsogkas, Sven Dickinson, Animesh Garg

TL;DR
GIFT is a self-supervised framework that learns visual tool affordances from physical interactions without human labels, enabling generalization to new tools and multiple tasks.
Contribution
It introduces a label-free, interaction-based method for learning visual affordances, improving generalization and performance over existing label-dependent techniques.
Findings
GIFT outperforms baselines on all tested tasks.
It matches human performance on two of three tasks with novel tools.
The method effectively predicts grasp and interaction points for diverse tasks.
Abstract
Tool use requires reasoning about the fit between an object's affordances and the demands of a task. Visual affordance learning can benefit from goal-directed interaction experience, but current techniques rely on human labels or expert demonstrations to generate this data. In this paper, we describe a method that grounds affordances in physical interactions instead, thus removing the need for human labels or expert policies. We use an efficient sampling-based method to generate successful trajectories that provide contact data, which are then used to reveal affordance representations. Our framework, GIFT, operates in two phases: first, we discover visual affordances from goal-directed interaction with a set of procedurally generated tools; second, we train a model to predict new instances of the discovered affordances on novel tools in a self-supervised fashion. In our experiments, we…
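To make the first phase described in the abstract more concrete, here is a minimal toy sketch of sampling-based contact collection. It is not the authors' implementation: the stick-shaped tool, the `reach_task` success check (a stand-in for a physics rollout), and every function name below are hypothetical, chosen for illustration only. The sketch rejection-samples pairs of grasp and interaction points, keeps the pairs that succeed, and summarizes where the successful contacts fall on the tool.

```python
import random


def sample_contacts(tool_points, is_success, n_samples=1000, seed=0):
    """Rejection-sample (grasp, interaction) contact pairs on a tool.

    tool_points: list of (x, y) candidate contact points on the tool.
    is_success:  hypothetical task check standing in for a simulated rollout.
    Returns only the contact pairs whose outcome was successful.
    """
    rng = random.Random(seed)
    successes = []
    for _ in range(n_samples):
        # Draw two distinct points: one to grasp, one to touch the target with.
        grasp, interact = rng.sample(tool_points, 2)
        if is_success(grasp, interact):
            successes.append((grasp, interact))
    return successes


def reach_task(grasp, interact, min_reach=0.5):
    # Toy criterion (assumption): the interaction point must lie at least
    # min_reach to the right of the grasp point along the x axis.
    return interact[0] - grasp[0] >= min_reach


def mean_point(points):
    """Average a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


# A straight stick sampled along the x axis from 0.0 to 1.0.
stick = [(i / 10, 0.0) for i in range(11)]

contacts = sample_contacts(stick, reach_task)

# Summarize where successful grasps and interactions concentrate on the tool.
grasp_centre = mean_point([g for g, _ in contacts])
interact_centre = mean_point([i for _, i in contacts])
```

In this toy setting the successful grasps concentrate toward one end of the stick and the successful interaction points toward the other, which is the kind of contact regularity that an affordance representation could be built from. The second phase, predicting such points on novel tools from visual input, is not sketched here.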
