SA-VLA: Spatially-Aware Flow-Matching for Vision-Language-Action Reinforcement Learning

Xu Pan; Zhenglin Wan; Xingrui Yu; Xianwei Zheng; Youkai Ke; Ming Sun; Rui Wang; Ziwei Wang; Ivor Tsang

arXiv:2602.00743·cs.RO·February 3, 2026

TL;DR

SA-VLA introduces a spatially-aware reinforcement learning framework that maintains spatial grounding during policy adaptation, leading to improved robustness and transferability in robotic manipulation tasks.

Contribution

The paper proposes a spatially-aware RL adaptation method that preserves spatial inductive bias by aligning representation learning, reward design, and exploration with task geometry.

Findings

01 Enhanced robustness in manipulation benchmarks

02 Improved zero-shot spatial generalization

03 Stable RL fine-tuning across tasks

Abstract

Vision-Language-Action (VLA) models exhibit strong generalization in robotic manipulation, yet reinforcement learning (RL) fine-tuning often degrades robustness under spatial distribution shifts. For flow-matching VLA policies, this degradation is closely associated with the erosion of spatial inductive bias during RL adaptation, as sparse rewards and spatially agnostic exploration increasingly favor short-horizon visual cues. To address this issue, we propose SA-VLA, a spatially-aware RL adaptation framework that preserves spatial grounding during policy optimization by aligning representation learning, reward design, and exploration with task geometry. SA-VLA fuses implicit spatial representations with visual tokens, provides dense rewards that reflect geometric progress, and employs SCAN, a spatially-conditioned annealed exploration strategy tailored to…
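
The abstract mentions dense rewards that reflect geometric progress but does not give their form here. As a rough illustration only, and not the paper's actual reward, one common way to build such a signal is to reward the per-step reduction in distance to a goal position. The function name `progress_reward`, the `scale` parameter, and the choice of Euclidean distance are all assumptions made for this sketch.

```python
import math


def progress_reward(prev_pos, curr_pos, goal_pos, scale=1.0):
    """Hypothetical dense reward: how much closer one step brought us to the goal.

    Assumption: positions are equal-length coordinate sequences (e.g. x, y, z)
    and progress is measured by Euclidean distance. This is an illustrative
    stand-in, not the reward defined in the SA-VLA paper.
    """
    prev_dist = math.dist(prev_pos, goal_pos)  # distance before the step
    curr_dist = math.dist(curr_pos, goal_pos)  # distance after the step
    # Positive when the step moved toward the goal, negative when it moved away.
    return scale * (prev_dist - curr_dist)


# Moving from the origin one unit toward a goal two units away.
reward = progress_reward((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0))
```

Unlike a sparse success signal, a reward of this shape gives feedback on every step, which is the general property the abstract attributes to its dense rewards.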

Code & Models

Models

SSSSphinx/SA-VLA (Hugging Face model)

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Robot Manipulation and Learning