DAG: Unleash the Potential of Diffusion Model for Open-Vocabulary 3D Affordance Grounding

Hanqing Wang; Zhenhao Zhang; Kaiyang Ji; Mingyu Liu; Wenti Yin; Yuchao Chen; Zhirui Liu; Xiangyu Zeng; Tianxiang Gui; Hangxing Zhang

arXiv:2508.01651·cs.CV·August 5, 2025

TL;DR

This paper introduces DAG, a diffusion-based framework that leverages text-to-image diffusion models to improve 3D affordance grounding, enabling better generalization and dense affordance prediction.

Contribution

The paper proposes a method that uses the frozen internal representations of a text-to-image diffusion model to extract affordance knowledge for 3D affordance grounding, and reports outperforming existing methods.

Findings

01. Outperforms well-established methods in 3D affordance grounding

02. Exhibits strong open-world generalization

03. Enables dense affordance prediction in 3D objects

Abstract

3D object affordance grounding aims to predict the touchable regions on a 3D object, which is crucial for human-object interaction (HOI), human-robot interaction, embodied perception, and robot learning. Recent advances tackle this problem by learning from demonstration images. However, these methods fail to capture the general affordance knowledge within the image, leading to poor generalization. To address this issue, we propose using text-to-image diffusion models to extract general affordance knowledge, because we find that such models can generate semantically valid HOI images, which demonstrates that their internal representation space is highly correlated with real-world affordance concepts. Specifically, we introduce DAG, a diffusion-based 3D affordance grounding framework, which leverages the frozen internal representations of the text-to-image diffusion model and unlocks…

Taxonomy

Topics: Robot Manipulation and Learning · Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis