PAVLM: Advancing Point Cloud based Affordance Understanding Via Vision-Language Model

Shang-Ching Liu; Van Nhiem Tran; Wenkai Chen; Wei-Lun Cheng; Yen-Lin Huang; I-Bin Liao; Yung-Hui Li; Jianwei Zhang

arXiv:2410.11564·cs.RO·July 8, 2025

TL;DR

PAVLM is a novel framework that enhances 3D affordance understanding in point clouds by integrating large language models with geometric modules, improving generalization in open-world robotic tasks.

Contribution

The paper introduces PAVLM, combining pre-trained language models with geometric-guided propagation to advance 3D affordance understanding from point clouds.

Findings

01 Outperforms baseline methods on the 3D-AffordanceNet benchmark.

02 Excels in generalizing to novel open-world affordance tasks.

03 Enhances semantic understanding of physical properties in 3D objects.

Abstract

Affordance understanding, the task of identifying actionable regions on 3D objects, plays a vital role in allowing robotic systems to engage with and operate within the physical world. Although Visual Language Models (VLMs) have excelled in high-level reasoning and long-horizon planning for robotic manipulation, they still fall short in grasping the nuanced physical properties required for effective human-robot interaction. In this paper, we introduce PAVLM (Point cloud Affordance Vision-Language Model), an innovative framework that utilizes the extensive multimodal knowledge embedded in pre-trained language models to enhance 3D affordance understanding in point clouds. PAVLM integrates a geometric-guided propagation module with hidden embeddings from large language models (LLMs) to enrich visual semantics. On the language side, we prompt Llama-3.1 models to generate refined…
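
The abstract as shown here does not describe the internals of the geometric-guided propagation module. As a loosely related illustration only, the sketch below shows a common way to propagate per-point features from a sparse set of points to a dense point cloud: inverse-distance-weighted interpolation over the k nearest neighbors, in the style of PointNet++ feature propagation. This is a swapped-in technique, not PAVLM's method, and the function name `propagate_features` and the parameters `k` and `eps` are assumptions made for this example.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def propagate_features(
    sparse_xyz: List[Point],
    sparse_feats: List[List[float]],
    dense_xyz: List[Point],
    k: int = 3,
    eps: float = 1e-8,
) -> List[List[float]]:
    """Interpolate features from sparse points onto dense points.

    Hypothetical helper: inverse-distance-weighted k-NN interpolation,
    not the propagation module described in the paper.
    """
    k = min(k, len(sparse_xyz))
    dim = len(sparse_feats[0])
    dense_feats = []
    for p in dense_xyz:
        # Squared distance from this dense point to every sparse point.
        dists = [(math.dist(p, q) ** 2, i) for i, q in enumerate(sparse_xyz)]
        nearest = sorted(dists)[:k]
        # Closer neighbors receive larger weights; eps avoids division by zero.
        weights = [1.0 / (d + eps) for d, _ in nearest]
        total = sum(weights)
        feat = [
            sum(w * sparse_feats[i][c] for w, (_, i) in zip(weights, nearest)) / total
            for c in range(dim)
        ]
        dense_feats.append(feat)
    return dense_feats


# Toy usage: two sparse points with 2-D features, three dense query points.
sparse_xyz = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
sparse_feats = [[1.0, 0.0], [0.0, 1.0]]
dense_xyz = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0)]
dense_feats = propagate_features(sparse_xyz, sparse_feats, dense_xyz, k=2)
```

In this toy case the midpoint query is equidistant from both sparse points, so its interpolated feature is an even blend of the two, while queries that coincide with a sparse point essentially copy that point's feature.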

Taxonomy

Topics: 3D Surveying and Cultural Heritage · Remote Sensing and LiDAR Applications · Autonomous Vehicle Technology and Safety