Agile-VLA: Few-Shot Industrial Pose Rectification via Implicit Affordance Anchoring

Teng Yan, Zhengyang Pei, Chengyu Shi, Yue Yu, Yikun Chen, Zilong Zhu, Zelin Fang, Kaile Guo, Zihang Wang, Peigen Tian, and Bingzhuo Zhong

arXiv:2603.22899 · cs.RO · March 25, 2026

TL;DR

Agile-VLA is a hierarchical framework that uses implicit affordance anchoring for few-shot industrial pose rectification on edge devices. It decouples perception from control so that manipulation can run in real time.

Contribution

The paper proposes a novel Implicit Affordance Anchoring mechanism and an asynchronous dual-stream architecture for fast, few-shot pose reorientation on resource-limited edge platforms.

Findings

01. Achieves robust pose rectification with only 5-shot demonstrations.

02. Decouples perception and control to match their operational frequencies.

03. Demonstrates effectiveness on a 6-DoF manipulator with irregular workpieces.
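
The second finding, running perception and control at different rates, can be illustrated with a minimal sketch. It is not the authors' implementation: it is a single-threaded simulation on an integer millisecond clock, using the 10 Hz and 50 Hz rates quoted in the abstract. The names `simulate`, `PERCEPTION_PERIOD_MS`, and `CONTROL_PERIOD_MS` are hypothetical, and a real system would presumably run the two streams in separate threads or processes that share a latest-value slot.

```python
from typing import List, Optional, Tuple

PERCEPTION_PERIOD_MS = 100  # 10 Hz perception stream (rate from the abstract)
CONTROL_PERIOD_MS = 20      # 50 Hz control stream (rate from the abstract)


def simulate(duration_ms: int) -> List[Tuple[int, Optional[int]]]:
    """Log (control_tick_ms, stamp_of_anchor_used_ms) for each control tick.

    The control loop never waits for perception. On every tick it reads the
    most recently published anchor, so one perception result is reused for
    several consecutive control steps.
    """
    latest_stamp: Optional[int] = None  # shared "latest value" slot (assumed design)
    log: List[Tuple[int, Optional[int]]] = []
    for t in range(0, duration_ms, CONTROL_PERIOD_MS):
        if t % PERCEPTION_PERIOD_MS == 0:
            latest_stamp = t  # stand-in for the slow stream publishing a new anchor
        log.append((t, latest_stamp))
    return log


if __name__ == "__main__":
    for tick, stamp in simulate(200):
        print(f"control t={tick:3d} ms uses anchor from t={stamp} ms")
```

Under these assumptions each anchor serves five control ticks, so the anchor seen by the controller is at most 80 ms old, plus any inference latency that this sketch ignores.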

Abstract

Deploying Vision-Language-Action (VLA) models on resource-constrained edge platforms encounters a fundamental conflict between high-latency semantic inference and the high-frequency control required for dynamic manipulation. To address this challenge, this paper presents Agile-VLA, a hierarchical framework designed for industrial pose reorientation tasks on edge devices such as the NVIDIA Jetson Orin Nano. The core innovation is an Implicit Affordance Anchoring mechanism that directly maps geometric visual cues, specifically centroid and rim keypoint anchors, into structured parametric action primitives, thereby substantially reducing reliance on high-latency semantic inference during closed-loop control. By decoupling perception (10 Hz) from control (50 Hz) via an asynchronous dual-stream architecture, the system effectively mitigates the frequency mismatch inherent in edge-based robot…
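
To make the idea of mapping anchors to a parametric primitive concrete, here is a small illustrative sketch. The abstract names only the inputs (a centroid and a rim keypoint) and says the output is a structured action primitive; it does not give the parametrization. Everything below is therefore an assumption: the planar 2D setting, the `RotatePrimitive` type, the `anchors_to_primitive` function, and the choice of a single yaw correction as the parameter.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class RotatePrimitive:
    """Hypothetical primitive: grasp at a point, then rotate by delta_yaw (rad)."""
    grasp_xy: Point
    delta_yaw: float


def wrap_angle(angle: float) -> float:
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def anchors_to_primitive(centroid: Point, rim: Point,
                         target_yaw: float = 0.0) -> RotatePrimitive:
    """Turn two geometric anchors into primitive parameters (assumed mapping).

    The current yaw is taken as the direction from the centroid to the rim
    keypoint; the primitive carries the shortest rotation to target_yaw.
    """
    current_yaw = math.atan2(rim[1] - centroid[1], rim[0] - centroid[0])
    return RotatePrimitive(grasp_xy=centroid,
                           delta_yaw=wrap_angle(target_yaw - current_yaw))


if __name__ == "__main__":
    primitive = anchors_to_primitive(centroid=(0.0, 0.0), rim=(0.0, 1.0))
    print(primitive)
```

The point of the sketch is that this mapping is pure geometry: once the anchors are available, computing the primitive needs no further model inference, which is consistent with the abstract's claim of reduced reliance on semantic inference inside the control loop.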

Taxonomy

Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Social Robot Interaction and HRI