PASG: A Closed-Loop Framework for Automated Geometric Primitive Extraction and Semantic Anchoring in Robotic Manipulation

Zhihao Zhu; Yifan Zheng; Siyu Pan; Yaohui Jin; Yao Mu

arXiv:2508.05976 · cs.CV · August 11, 2025

Open Access

TL;DR

PASG is a novel closed-loop framework that automatically extracts geometric primitives and semantically anchors them with affordances, improving robotic manipulation by bridging geometric features and task semantics.

Contribution

The paper introduces a unified approach that combines automatic primitive extraction with semantic grounding, accompanied by a new benchmark and a fine-tuned vision-language model.

Findings

01. Effective primitive detection across categories

02. Dynamic semantic-affordance coupling

03. Comparable performance to manual annotations

Abstract

The fragmentation between high-level task semantics and low-level geometric features remains a persistent challenge in robotic manipulation. While vision-language models (VLMs) have shown promise in generating affordance-aware visual representations, the lack of semantic grounding in canonical spaces and the reliance on manual annotations severely limit their ability to capture dynamic semantic-affordance relationships. To address these limitations, we propose Primitive-Aware Semantic Grounding (PASG), a closed-loop framework that introduces: (1) automatic primitive extraction through geometric feature aggregation, enabling cross-category detection of keypoints and axes; (2) VLM-driven semantic anchoring that dynamically couples geometric primitives with functional affordances and task-relevant descriptions; (3) a spatial-semantic reasoning benchmark and a fine-tuned VLM (Qwen2.5VL-PA). We demonstrate…
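The abstract does not specify how the geometric feature aggregation in (1) is carried out. As a loose illustration of what a keypoint primitive and an axis primitive can look like, the sketch below takes the centroid of a 3D point set as a keypoint and its dominant PCA direction (found by power iteration) as an axis. This is a generic stand-in chosen for illustration, not PASG's method, and the function names `centroid`, `covariance`, and `principal_axis` are hypothetical.

```python
import math

def centroid(points):
    """Mean of a list of (x, y, z) tuples; used here as a stand-in keypoint."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def covariance(points, c):
    """3x3 covariance matrix of the points around centre c."""
    n = len(points)
    cov = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[i] - c[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                cov[i][j] += d[i] * d[j] / n
    return cov

def principal_axis(points, iters=100):
    """Return (keypoint, axis): the centroid and the dominant PCA direction.

    Assumption: power iteration from a fixed start vector. It can fail if
    the start vector is exactly orthogonal to the dominant direction.
    """
    c = centroid(points)
    cov = covariance(points, c)
    s = 1.0 / math.sqrt(3.0)
    v = [s, s, s]  # unit-length start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate input, e.g. all points identical
            break
        v = [x / norm for x in w]
    return c, tuple(v)

# Toy, made-up input: an elongated set of points lying roughly along the x axis.
points = [(float(x), 0.01 * ((k % 3) - 1), 0.0) for k, x in enumerate(range(-5, 6))]
keypoint, axis = principal_axis(points)
```

For this toy input the recovered axis is close to the x direction and the keypoint sits near the origin. The sign of the axis is arbitrary, so downstream code should treat `axis` and its negation as equivalent.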

Taxonomy

Topics: Robot Manipulation and Learning · Image Processing and 3D Reconstruction · Manufacturing Process and Optimization