Weakly-Supervised Affordance Grounding Guided by Part-Level Semantic Priors

Peiran Xu; Yadong Mu

arXiv:2505.24103 · cs.CV · June 2, 2025


TL;DR

This paper presents a novel weakly supervised approach for affordance grounding that leverages foundation models and part-level semantic priors, significantly improving localization accuracy without dense labels.

Contribution

It introduces a new training pipeline using pseudo labels from part segmentation models, along with three key enhancements for better affordance localization.

Findings

01. Achieved state-of-the-art performance on affordance grounding tasks.

02. Demonstrated the effectiveness of part-level semantic priors in weak supervision.

03. Showed significant improvement over existing methods.

Abstract

In this work, we focus on the task of weakly supervised affordance grounding, where a model is trained to identify affordance regions on objects using human-object interaction images and egocentric object images without dense labels. Previous works are mostly built upon class activation maps, which are effective for semantic segmentation but may not be suitable for locating actions and functions. Leveraging recent advanced foundation models, we develop a supervised training pipeline based on pseudo labels. The pseudo labels are generated from an off-the-shelf part segmentation model, guided by a mapping from affordance to part names. Furthermore, we introduce three key enhancements to the baseline model: a label refining stage, a fine-grained feature alignment process, and a lightweight reasoning module. These techniques harness the semantic knowledge of static objects embedded in…
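The pseudo-labeling step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the mapping entries, the function `pseudo_label`, and the hand-written part masks are all hypothetical, and the masks stand in for the output of an off-the-shelf part segmentation model. The sketch only shows the idea of turning an affordance name into a mask by taking the union of the parts mapped to it.

```python
from typing import Dict, List

Mask = List[List[int]]  # binary mask stored as rows of 0/1 values

# Hypothetical affordance-to-part mapping (assumption; the paper's actual
# mapping is not reproduced on this page).
AFFORDANCE_TO_PARTS: Dict[str, List[str]] = {
    "hold": ["handle"],
    "cut_with": ["blade"],
}


def pseudo_label(affordance: str, part_masks: Dict[str, Mask],
                 height: int, width: int) -> Mask:
    """Return the union of the masks of all parts mapped to `affordance`."""
    label = [[0] * width for _ in range(height)]
    for part in AFFORDANCE_TO_PARTS.get(affordance, []):
        mask = part_masks.get(part)
        if mask is None:  # the segmenter did not return this part
            continue
        for y in range(height):
            for x in range(width):
                label[y][x] |= mask[y][x]
    return label


# Toy 2x4 "knife": a stand-in for real part segmentation output.
toy_part_masks: Dict[str, Mask] = {
    "handle": [[1, 1, 0, 0], [1, 1, 0, 0]],
    "blade": [[0, 0, 1, 1], [0, 0, 1, 1]],
}

hold_label = pseudo_label("hold", toy_part_masks, height=2, width=4)
cut_label = pseudo_label("cut_with", toy_part_masks, height=2, width=4)
```

In this toy setup, an affordance with no mapped parts yields an all-zero mask, so a real pipeline would need to decide how to handle such samples (for example, by skipping them during training).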

Peer Reviews

Decision: ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1) The paper is clearly written and easy to follow.
2) The method is well-motivated, and the VFM-assisted pseudo-labeling should effectively address the challenges of the weakly-supervised setting.
3) The overall improvements over existing methods are quite significant.

Weaknesses

My biggest concern lies in the experimental section. In Table 2, the reasoning model appears to negatively impact the baseline, and the other two design components only provide marginal improvements.

Reviewer 02 · Rating 8 · Confidence 4

Strengths

- The problem is important and well-motivated, as affordance grounding is crucial for robotic manipulation and human-object interaction understanding.
- The proposed pseudo-labeling approach effectively leverages existing foundation models (VLPart, SAM) to provide supervision, addressing limitations of previous CAM-based methods.
- The label refinement process using exocentric images is novel and well-designed, providing a clever way to improve initial pseudo labels.
- The reasoning module helps ge

Weaknesses

The choice of CLIP as the vision encoder could be better justified, given previous work suggesting its limitations (vs. DINO, OWL-ViT, SAM). For example, the paper would be stronger with an ablation study of different visual encoders.

Reviewer 03 · Rating 6 · Confidence 3

Strengths

- Clear writing and organization.
- Well-motivated technical approach with clear problem formulation.
- This paper proposes a novel approach that uses visual foundation models and part-level semantic priors for WSAG, unleashing the power of these models for affordance learning.
- Using human occlusion cues for label refinement is an innovative insight.
- Comprehensive experimental validation and thoughtful analysis of limitations in existing methods.

Weaknesses

- Could benefit from more analysis of failure cases.
- The label refinement stage using human occlusion cues may be problematic when interactions are ambiguous or when multiple affordances exist.
- The mapping from affordance to part names is ad-hoc and manually crafted, which limits the scalability to new affordance types and more complex objects.

Code & Models

Repositories

woyut/wsag-plsp (PyTorch, official)


Taxonomy

Topics: Anomaly Detection Techniques and Applications · Model Reduction and Neural Networks · Robot Manipulation and Learning

Methods: Focus