ReSemAct: Advancing Fine-Grained Robotic Manipulation via Semantic Structuring and Affordance Refinement

Chenyu Su; Weiwei Shang; Chen Qian; Fei Zhang; Shuang Cong

arXiv:2507.18262·cs.RO·December 30, 2025

TL;DR

ReSemAct is a unified manipulation framework that combines semantic structuring with affordance refinement. By integrating multimodal large language models and vision foundation models, it lets robots perform fine-grained manipulation more robustly in dynamic, real-world environments.

Contribution

The paper presents ReSemAct, a novel framework that combines semantic structuring and affordance refinement using foundation models for improved robotic manipulation.

Findings

01  ReSemAct achieves robust zero-shot manipulation in complex environments.

02  Semantic structuring improves the accuracy of affordance detection.

03  Refinement strategies enhance manipulation precision and adaptability.

Abstract

Fine-grained robotic manipulation requires grounding natural language into appropriate affordance targets. However, most existing methods driven by foundation models often compress rich semantics into oversimplified affordances, preventing exploitation of implicit semantic information. To address these challenges, we present ReSemAct, a novel unified manipulation framework that introduces Semantic Structuring and Affordance Refinement (SSAR), powered by the automated synergistic reasoning between Multimodal Large Language Models (MLLMs) and Vision Foundation Models (VFMs). Specifically, the Semantic Structuring module derives a unified semantic affordance description from natural language and RGB observations, organizing affordance regions, implicit functional intent, and coarse affordance anchors into a structured representation for downstream refinement. Building upon this…
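
The abstract says the Semantic Structuring module organizes affordance regions, implicit functional intent, and coarse affordance anchors into one structured representation. As a rough illustration only, that representation could be sketched as a small record type. The class name, field names, pixel-coordinate convention, helper function, and example values below are all assumptions made for this sketch; none of them come from the paper.

```python
from dataclasses import dataclass, asdict
from typing import Tuple


@dataclass(frozen=True)
class SemanticAffordance:
    """Hypothetical container; names and types are illustrative, not the paper's."""
    affordance_region: str          # which part of the object to act on
    functional_intent: str          # implicit intent inferred from the instruction
    coarse_anchor: Tuple[int, int]  # assumed rough (u, v) pixel location in the RGB image

    def __post_init__(self) -> None:
        # Reject anchors that cannot be valid pixel coordinates.
        u, v = self.coarse_anchor
        if u < 0 or v < 0:
            raise ValueError("coarse_anchor must be non-negative pixel coordinates")


def to_refinement_input(desc: SemanticAffordance) -> dict:
    """Flatten the description into a plain dict for a downstream refinement step."""
    return asdict(desc)


# Made-up example values, purely for illustration.
example = SemanticAffordance(
    affordance_region="mug handle",
    functional_intent="grasp so the mug can be lifted upright",
    coarse_anchor=(312, 188),
)
payload = to_refinement_input(example)
```

Keeping the three elements in one immutable record means a later refinement stage receives them together and cannot silently drop the intent or the anchor; how ReSemAct actually encodes them is not stated in the text available here.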

Taxonomy

Topics: Robot Manipulation and Learning · Robotic Path Planning Algorithms · Multimodal Machine Learning Applications