$\pi$-StepNFT: Wider Space Needs Finer Steps in Online RL for Flow-based VLAs

Siting Wang; Xiaofeng Wang; Zheng Zhu; Minnan Pei; Xinyu Cui; Cheng Deng; Jian Zhao; Guan Huang; Haifeng Zhang; Jun Wang

arXiv:2603.02083·cs.RO·March 10, 2026

TL;DR

This paper introduces $\pi$-StepNFT, a new online RL framework for flow-based VLAs that improves exploration and generalization by using step-wise guidance without likelihoods or auxiliary networks.

Contribution

It proposes a critic- and likelihood-free method that enables finer exploration in online RL for flow-based VLAs, enhancing robustness and out-of-distribution generalization.

Findings

01. Achieves competitive few-shot robustness on LIBERO.
02. Outperforms value-based baselines on ManiSkill in OOD scenarios.
03. Eliminates the need for auxiliary value networks.

Abstract

Flow-based vision-language-action (VLA) models excel in embodied control but suffer from intractable likelihoods during multi-step sampling, hindering online reinforcement learning. We propose $\pi$-StepNFT (Step-wise Negative-aware Fine-Tuning), a critic- and likelihood-free framework that requires only a single forward pass per optimization step and eliminates auxiliary value networks. We identify that wider exploration spaces necessitate finer-grained, step-wise guidance for alignment. Empirically, $\pi$-StepNFT unlocks latent potential on LIBERO with competitive few-shot robustness. Moreover, it achieves superior generalization on ManiSkill, outperforming value-based baselines in OOD scenarios by preventing overfitting to multimodal features. This property offers a scalable solution promising for complex real-world applications.
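
To make "multi-step sampling" concrete, here is a minimal toy sketch of how a flow-based policy might generate an action: it starts from Gaussian noise and Euler-integrates a velocity field over several steps. This is not the paper's method or code. The names `toy_velocity` and `sample_action`, the closed-form straight-line velocity, and the 3-dimensional action are all assumptions made for illustration; a real VLA would replace `toy_velocity` with a learned network conditioned on observations and language.

```python
import numpy as np

def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity field.

    For a straight-line path from noise to `target`, the velocity
    at time t (with t < 1) is (target - x) / (1 - t).
    """
    return (target - x) / (1.0 - t)

def sample_action(target, num_steps=10, seed=0):
    """Sample an action by Euler-integrating the velocity field from t=0 to t=1."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(target.shape)  # initial Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # always strictly below 1, so the division above is safe
        x = x + dt * toy_velocity(x, t, target)  # one integration step
    return x

# Illustrative 3-dimensional "action" (an assumption, not from the paper).
target = np.array([0.3, -0.1, 0.5])
action = sample_action(target)
```

The final action is the result of `num_steps` chained velocity evaluations, so its probability density is not available from a single network call. That is the general reason the likelihood of such a sampler is hard to compute, and it is the setting in which a likelihood-free objective, as the abstract describes, becomes attractive.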

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning