ReFineVLA: Reasoning-Aware Teacher-Guided Transfer Fine-Tuning

Tuan Van Vo; Tan Quang Nguyen; Khang Minh Nguyen; Duy Ho Minh Nguyen; Minh Nhat Vu

arXiv:2505.19080 · cs.RO · May 27, 2025



TL;DR

ReFineVLA is a reasoning-aware fine-tuning framework for vision-language-action (VLA) models. It aims to improve their interpretability and performance on robotic manipulation tasks by training on rationales generated by an expert teacher model.

Contribution

The paper proposes a method that augments robotic datasets with teacher-generated reasoning rationales and then fine-tunes pre-trained VLA models on the enriched data, with the goal of improving reasoning capability and task success rates.

Findings

01 Achieves a 5.0% higher success rate on manipulation tasks.

02 Sharpens attention on task-relevant objects and actions.

03 Outperforms state-of-the-art baselines across various settings.

Abstract

Vision-Language-Action (VLA) models have gained much attention from the research community thanks to their strength in translating multimodal observations and linguistic instructions into robotic actions. Despite recent advances, VLAs often overlook explicit reasoning and learn only functional input-action mappings, omitting the logical steps that are crucial for interpretability and for generalization to complex, long-horizon manipulation tasks. In this work, we propose ReFineVLA, a multimodal reasoning-aware framework that fine-tunes VLAs with teacher-guided reasons. We first augment robotic datasets with reasoning rationales generated by an expert teacher model, guiding VLA models to learn to reason about their actions. Then, we use ReFineVLA to fine-tune pre-trained VLAs with the reasoning-enriched datasets, while maintaining their inherent generalization…
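The first step described in the abstract, augmenting a robotic dataset with teacher-generated rationales, might look roughly like the sketch below. This is an illustration only, not the authors' implementation: the `Sample` record, the `augment_with_rationales` helper, and the `toy_teacher` function are all hypothetical names introduced here, and the teacher is a trivial string template standing in for whatever expert model the paper actually uses.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Sample:
    """Hypothetical record for one step of a robot demonstration."""
    instruction: str                 # language instruction for the task
    observation: str                 # stand-in for an image or sensor observation
    action: List[float]              # demonstrated robot action
    rationale: Optional[str] = None  # filled in by the teacher


# Assumed teacher interface: (instruction, observation, action) -> rationale text.
Teacher = Callable[[str, str, List[float]], str]


def augment_with_rationales(samples: List[Sample], teacher: Teacher) -> List[Sample]:
    """Return copies of the samples with a teacher-written rationale attached.

    The original samples are left untouched.
    """
    return [
        replace(s, rationale=teacher(s.instruction, s.observation, s.action))
        for s in samples
    ]


def toy_teacher(instruction: str, observation: str, action: List[float]) -> str:
    """Placeholder teacher; a real one would query an expert model."""
    return f"To '{instruction}', given {observation}, move by {action}."


if __name__ == "__main__":
    data = [Sample("pick up the cup", "cup on table", [0.1, 0.0, -0.2])]
    for s in augment_with_rationales(data, toy_teacher):
        print(s.rationale)
```

The reasoning-enriched samples would then feed the fine-tuning stage. How the rationale enters the training objective is not specified in the truncated abstract, so it is left out of this sketch.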


Taxonomy

Topics: Natural Language Processing Techniques · Model-Driven Software Engineering Techniques