Self-Explainable Affordance Learning with Embodied Caption
Zhipeng Zhang, Zhimin Wei, Guolei Sun, Peng Wang, Luc Van Gool

TL;DR
This paper introduces a self-explainable affordance learning method with embodied captions, enabling robots to articulate intentions and improve understanding of action possibilities in complex scenes.
Contribution
It presents a novel model combining affordance grounding with self-explanation, along with a new dataset and metrics for visual affordance learning with language explanations.
Findings
Effective in bridging vision and language for affordance understanding
Improves robot explanation capabilities in complex scenes
Demonstrates superior performance through extensive experiments
Abstract
In visual affordance learning, previous methods have mainly used abundant images or videos that depict human behavior patterns to identify action possibility regions for object manipulation, with a variety of applications in robotic tasks. However, they face two main challenges: action ambiguity, such as whether to beat or carry a drum, and the complexity of processing intricate scenes. Moreover, timely human intervention is important for rectifying robot errors. To address these issues, we introduce Self-Explainable Affordance learning (SEA) with embodied caption. This innovation enables robots to articulate their intentions and bridge the gap between explainable vision-language caption and visual affordance learning. Due to the lack of an appropriate dataset, we unveil a pioneering dataset and metrics tailored for this task, which…
Taxonomy
Topics: Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications
