Jump-Start Reinforcement Learning with Vision-Language-Action Regularization

Angelo Moroncelli; Roberto Zanetti; Marco Maccarini; Loris Roveda

arXiv:2604.13733 · cs.LG · April 16, 2026

TL;DR

This paper introduces VLAJS, a method that combines vision-language-action guidance with reinforcement learning to improve exploration and learning efficiency in robotic manipulation tasks, achieving significant sample savings and robust real-world performance.

Contribution

VLAJS is a novel approach that integrates sparse VLA guidance with on-policy RL. It improves exploration and credit assignment without requiring demonstrations or continuous queries to the VLA.

Findings

1. VLAJS outperforms PPO and other baselines in sample efficiency, reducing the number of required interactions by more than 50%.
2. The method enables zero-shot sim-to-real transfer on robotic manipulation tasks.
3. VLAJS demonstrates robust real-world manipulation under varied conditions.

Abstract

Reinforcement learning (RL) enables high-frequency, closed-loop control for robotic manipulation, but scaling to long-horizon tasks with sparse or imperfect rewards remains difficult due to inefficient exploration and poor credit assignment. Vision-Language-Action (VLA) models leverage large-scale multimodal pretraining to provide generalist, task-level reasoning, but current limitations hinder their direct use in fast and precise manipulation. In this paper, we propose Vision-Language-Action Jump-Starting (VLAJS), a method that bridges sparse VLA guidance with on-policy RL to improve exploration and learning efficiency. VLAJS treats VLAs as transient sources of high-level action suggestions that bias early exploration and improve credit assignment, while preserving the high-frequency, state-based control of RL. Our approach augments Proximal Policy Optimization (PPO) with a directional…
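The exact form of the directional term is cut off in the abstract above, so the sketch below does not reproduce the paper's regularizer. It only illustrates the general pattern the abstract describes. There is a standard PPO clipped surrogate, plus an auxiliary term that pulls the policy toward VLA action suggestions. That term is applied only at the sparse timesteps where a suggestion exists, and its weight decays to zero so the guidance stays transient. The cosine-alignment term, the linear decay schedule, and every function name and default value here are hypothetical choices made for illustration.

```python
import numpy as np


def ppo_clip_surrogate(ratio, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized), batch mean."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    return np.mean(np.minimum(unclipped, clipped))


def guidance_weight(step, initial_weight=1.0, decay_steps=100_000):
    """HYPOTHETICAL schedule: linearly decay the guidance weight to zero."""
    return initial_weight * max(0.0, 1.0 - step / decay_steps)


def cosine_alignment(policy_mean, vla_action, eps=1e-8):
    """HYPOTHETICAL directional term: per-sample cosine similarity, shape (N,)."""
    num = np.sum(policy_mean * vla_action, axis=-1)
    den = (np.linalg.norm(policy_mean, axis=-1)
           * np.linalg.norm(vla_action, axis=-1) + eps)
    return num / den


def regularized_loss(ratio, advantage, policy_mean, vla_action, has_vla, step,
                     clip_eps=0.2):
    """Loss to minimize: -surrogate - w(step) * mean alignment on guided samples.

    has_vla is a boolean mask marking the sparse timesteps that carry a VLA
    action suggestion; samples without one only receive the PPO term.
    """
    loss = -ppo_clip_surrogate(ratio, advantage, clip_eps)
    w = guidance_weight(step)
    if w > 0.0 and np.any(has_vla):
        align = cosine_alignment(policy_mean[has_vla], vla_action[has_vla])
        loss -= w * np.mean(align)
    return loss


# Usage: three samples, two of which carry a (hypothetical) VLA suggestion.
ratio = np.array([1.0, 1.5, 0.5])
advantage = np.array([1.0, 1.0, -1.0])
policy_mean = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
vla_action = np.array([[1.0, 0.0], [1.0, 0.0], [2.0, 2.0]])
has_vla = np.array([True, False, True])

early = regularized_loss(ratio, advantage, policy_mean, vla_action, has_vla, step=0)
late = regularized_loss(ratio, advantage, policy_mean, vla_action, has_vla, step=100_000)
```

Early in training, the guided samples lower the loss whenever the policy's mean action points the same way as the suggestion. By `decay_steps`, the term has vanished and the update reduces to plain PPO. This mirrors the "transient" role the abstract assigns to the VLA while leaving control to the state-based RL policy.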
