AnchorRefine: Synergy-Manipulation Based on Trajectory Anchor and Residual Refinement for Vision-Language-Action Models

Tingzheng Jia; Kan Guo; Lanping Qian; Yongli Hu; Daxin Tian; Guixian Qu; Chunmian Lin; Baocai Yin; and Jiapu Wang

arXiv:2604.17787 · cs.RO · April 21, 2026

TL;DR

AnchorRefine introduces a hierarchical approach for vision-language-action (VLA) policies that separates coarse trajectory planning (the anchor) from local residual refinement, improving manipulation precision and success rates.

Contribution

The paper presents a novel hierarchical framework that factorizes action modeling into trajectory anchors and residual refinement, improving manipulation accuracy.

Findings

01. Achieves up to 7.8% success rate improvement in simulation.

02. Yields up to 18% success rate increase in real-world tasks.

03. Enhances both regression-based and diffusion-based VLA models.

Abstract

Precision-critical manipulation requires both global trajectory organization and local execution correction, yet most vision-language-action (VLA) policies generate actions within a single unified space. This monolithic formulation forces macro-level transport and micro-level refinement to be optimized under the same objective, causing large motions to dominate learning while suppressing small but failure-critical corrective signals. In contrast, human manipulation is structured by global movement planning together with continuous local adjustment during execution. Motivated by this principle, we propose AnchorRefine, a hierarchical framework that factorizes VLA action modeling into trajectory anchor and residual refinement. The anchor planner predicts a coarse motion scaffold, while the refinement module corrects execution-level deviations to improve geometric and contact precision. We…
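The anchor-plus-residual factorization described in the abstract can be illustrated with a minimal sketch. The abstract does not say how anchors are represented or selected, so everything specific here is an assumption made for illustration: a small fixed set of anchor trajectories, nearest-anchor selection by squared distance, additive residuals, and the hypothetical names `nearest_anchor`, `decompose`, and `recompose`. It is not the authors' implementation.

```python
# Illustrative sketch only. The anchor set, the nearest-anchor rule, and all
# function names are assumptions, not the paper's method.

def squared_norm(values):
    """Sum of squared entries of a flat list of floats."""
    return sum(v * v for v in values)

def nearest_anchor(trajectory, anchors):
    """Index of the anchor with the smallest squared distance to the trajectory."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(anchors)), key=lambda i: sq_dist(trajectory, anchors[i]))

def decompose(trajectory, anchors):
    """Split a trajectory into (anchor index, residual relative to that anchor)."""
    idx = nearest_anchor(trajectory, anchors)
    residual = [x - y for x, y in zip(trajectory, anchors[idx])]
    return idx, residual

def recompose(idx, residual, anchors):
    """Rebuild the full trajectory as anchor + residual."""
    return [a + r for a, r in zip(anchors[idx], residual)]

# Hypothetical 1-D action chunk: two coarse motion scaffolds and one target
# trajectory that deviates slightly from the first scaffold.
anchors = [
    [0.0, 0.1, 0.2, 0.3],
    [0.0, -0.1, -0.2, -0.3],
]
target = [0.0, 0.11, 0.19, 0.32]

idx, residual = decompose(target, anchors)
rebuilt = recompose(idx, residual, anchors)

# Share of the target's squared magnitude carried by the fine correction.
residual_share = squared_norm(residual) / squared_norm(target)
```

In this toy example the residual carries well under one percent of the target's squared magnitude, which illustrates the abstract's point that under a single shared objective the large transport motion can dominate the small corrective signal. Modeling the residual separately lets it be learned at its own scale.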
