Affogato: Learning Open-Vocabulary Affordance Grounding with Automated Data Generation at Scale
Junha Lee, Eunha Park, Chunghyun Park, Dahyun Kang, Minsu Cho

TL;DR
This paper introduces Affogato, a large-scale dataset and models for open-vocabulary affordance grounding, enabling better understanding of object interactions through natural language descriptions and heatmap localization.
Contribution
The paper presents a new large-scale benchmark dataset with 150K instances for affordance grounding and develops vision-language models that leverage this dataset for improved open-vocabulary and cross-domain performance.
Findings
Models trained on Affogato achieve promising performance on existing benchmarks.
The dataset enables effective open-vocabulary affordance grounding.
Models demonstrate strong cross-domain generalization.
Abstract
Affordance grounding (localizing object regions based on natural language descriptions of interactions) is a critical challenge for enabling intelligent agents to understand and interact with their environments. However, this task remains challenging due to the need for fine-grained part-level localization, the ambiguity arising from multiple valid interaction regions, and the scarcity of large-scale datasets. In this work, we introduce Affogato, a large-scale benchmark comprising 150K instances, annotated with open-vocabulary text descriptions and corresponding 3D affordance heatmaps across a diverse set of objects and interactions. Building on this benchmark, we develop simple yet effective vision-language models that leverage pretrained part-aware vision backbones and a text-conditional heatmap decoder. Our models trained with the Affogato dataset achieve promising performance on the…
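To make the idea of a text-conditional heatmap concrete, here is a minimal sketch, not the authors' decoder. It assumes a vision backbone has already produced a grid of patch features and a text encoder has produced a single query embedding of the same dimension. The heatmap is just the temperature-scaled cosine similarity passed through a sigmoid. The function name `text_conditioned_heatmap`, the `temperature` value, and the 2D patch grid (used instead of 3D points for simplicity) are all illustrative assumptions.

```python
import numpy as np

def text_conditioned_heatmap(patch_feats, text_emb, temperature=0.07):
    """Score each patch against a text query (illustrative, hypothetical helper).

    patch_feats: array of shape (H, W, D), one feature vector per patch.
    text_emb:    array of shape (D,), embedding of the affordance description.
    Returns an (H, W) heatmap with values in [0, 1].
    """
    # L2-normalise so the dot product below is a cosine similarity.
    p = patch_feats / (np.linalg.norm(patch_feats, axis=-1, keepdims=True) + 1e-8)
    t = text_emb / (np.linalg.norm(text_emb) + 1e-8)
    sim = p @ t  # shape (H, W)
    # Sharpen with a temperature, then squash to [0, 1] with a sigmoid.
    return 1.0 / (1.0 + np.exp(-sim / temperature))

# Toy input: every patch points along axis 0 except one, which matches the query.
feats = np.zeros((4, 4, 8))
feats[..., 0] = 1.0
feats[1, 2] = 0.0
feats[1, 2, 3] = 1.0
query = np.zeros(8)
query[3] = 1.0

heatmap = text_conditioned_heatmap(feats, query)
peak = np.unravel_index(np.argmax(heatmap), heatmap.shape)
```

In this toy input the heatmap peaks at the single patch whose feature aligns with the query. A learned decoder would replace the fixed similarity with trained layers, but the input and output shapes would look much the same.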
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Intelligent Tutoring Systems and Adaptive Learning
