Learning Precise Affordances from Egocentric Videos for Robotic Manipulation

Gen Li; Nikolaos Tsagkas; Jifei Song; Ruaridh Mon-Williams; Sethu Vijayakumar; Kun Shao; Laura Sevilla-Lara

arXiv:2408.10123 · cs.RO · September 16, 2025

TL;DR

This paper presents a system that learns precise object affordances from egocentric videos to improve robotic manipulation, addressing data scarcity, generalization, and real-world deployment challenges.

Contribution

The paper introduces an affordance learning framework that uses egocentric videos and geometric information to improve generalization and deployability in robotic tasks.

Findings

01. Outperforms state-of-the-art by 13.8% in mIoU
02. Achieves 77.1% success rate in robotic grasping
03. Effective on seen, unseen, and cluttered scenes
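
For readers unfamiliar with the mIoU metric cited above, the following is a minimal sketch of how mean intersection-over-union is commonly computed for segmentation masks. It is not the paper's evaluation code: the function name `mean_iou`, the nested-list mask format, and the choice to skip classes absent from both masks are assumptions made for illustration only.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in at least one of the two masks.

    pred and target are equally sized 2D lists of integer class labels.
    (Hypothetical helper; the paper's exact protocol is not given here.)
    """
    ious = []
    for c in range(num_classes):
        inter = 0
        union = 0
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                in_pred = p == c
                in_target = t == c
                if in_pred and in_target:
                    inter += 1
                if in_pred or in_target:
                    union += 1
        if union == 0:
            continue  # class absent from both masks, so it is skipped
        ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy 2x2 example with two classes (0 = background, 1 = affordance region).
pred = [[0, 1],
        [1, 1]]
target = [[0, 1],
          [0, 1]]
score = mean_iou(pred, target, num_classes=2)
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. Whether the reported 13.8% gain is absolute or relative is not stated in this summary.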

Abstract

Affordance, defined as the potential actions that an object offers, is crucial for embodied AI agents. For example, such knowledge directs an agent to grasp a knife by the handle for cutting or by the blade for safe handover. While existing approaches have made notable progress, affordance research still faces three key challenges: data scarcity, poor generalization, and real-world deployment. Specifically, there is a lack of large-scale affordance datasets with precise segmentation maps, existing models struggle to generalize across different domains or novel object and affordance classes, and little work demonstrates deployability in real-world scenarios. In this work, we address these issues by proposing a complete affordance learning system that (1) takes in egocentric videos and outputs precise affordance annotations without human labeling, (2) leverages geometric information and…

Taxonomy

Topics: Robot Manipulation and Learning · Robotic Mechanisms and Dynamics · Reinforcement Learning in Robotics

Methods: Linear Layer · Residual Connection · Layer Normalization · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Attention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Softmax