Making VLMs More Robot-Friendly: Self-Critical Distillation of Low-Level Procedural Reasoning

Chan Young Park; Jillian Fisher; Marius Memmel; Dipika Khullar; Seoho Yun; Abhishek Gupta; Yejin Choi

arXiv:2507.08224·cs.RO·July 22, 2025

TL;DR

SelfReVision is a scalable self-distillation framework that improves small vision-language models for robotic planning by enabling them to critique and revise their own plans, resulting in higher-quality, execution-ready plans without external supervision.

Contribution

The paper introduces SelfReVision, a novel self-critical distillation method that enhances small VLMs for robotic procedural planning without external supervision.

Findings

1. SelfReVision improves the quality of plans generated by small VLMs.

2. Models using SelfReVision outperform much larger models in downstream tasks.

3. SelfReVision enables iterative self-improvement, boosting model performance significantly.

Abstract

Large language models (LLMs) have shown promise in robotic procedural planning, yet their human-centric reasoning often omits the low-level, grounded details needed for robotic execution. Vision-language models (VLMs) offer a path toward more perceptually grounded plans, but current methods either rely on expensive, large-scale models or are constrained to narrow simulation settings. We introduce SelfReVision, a lightweight and scalable self-improvement framework for vision-language procedural planning. SelfReVision enables small VLMs to iteratively critique, revise, and verify their own plans, without external supervision or teacher models, drawing inspiration from chain-of-thought prompting and self-instruct paradigms. Through this self-distillation loop, models generate higher-quality, execution-ready plans that can be used both at inference and for continued fine-tuning. Using models…
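
The critique, revise, and verify loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Model` interface, the `self_revise` function, the prompt tags (`PLAN`, `CRITIQUE`, `REVISE`, `VERIFY`), the stopping rule, and the `toy_model` stub are all assumptions made for illustration, and the image input a real VLM would receive is omitted.

```python
from typing import Callable

# Hypothetical interface: any callable that maps a text prompt to a text reply.
# A real VLM call would also take the scene image; it is left out of this sketch.
Model = Callable[[str], str]


def self_revise(model: Model, task: str, max_rounds: int = 3) -> str:
    """Run a critique -> revise -> verify loop and return the final plan."""
    plan = model(f"PLAN\nTask: {task}")
    for _ in range(max_rounds):
        # The same model critiques its own plan, then rewrites it.
        critique = model(f"CRITIQUE\nTask: {task}\nPlan: {plan}")
        revised = model(
            f"REVISE\nTask: {task}\nPlan: {plan}\nCritique: {critique}"
        )
        # The model also judges whether the revision is an improvement.
        verdict = model(f"VERIFY\nTask: {task}\nOld: {plan}\nNew: {revised}")
        if verdict.strip().lower() != "better":
            break  # assumed stopping rule: keep the last accepted plan
        plan = revised
    return plan


def toy_model(prompt: str) -> str:
    """Deterministic stand-in for a VLM, used only to exercise the loop."""
    kind, _, rest = prompt.partition("\n")
    fields = dict(line.split(": ", 1) for line in rest.split("\n"))
    if kind == "PLAN":
        return "1. pick up cup"
    if kind == "CRITIQUE":
        return "no grasp point or gripper action is specified"
    if kind == "REVISE":
        return "1. move gripper to cup handle; 2. close gripper; 3. lift cup"
    # VERIFY: the toy rule accepts a revision only if it adds detail (length).
    return "better" if len(fields["New"]) > len(fields["Old"]) else "same"


final_plan = self_revise(toy_model, "put the cup on the shelf")
print(final_plan)
```

Because no teacher model or human label appears anywhere in the loop, the only signal comes from the model's own critique and verdict. The abstract notes that the resulting plans can also serve as fine-tuning data; in this sketch that would amount to storing each task together with its final plan, which is likewise an assumption about the data format.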

Taxonomy

Topics: Multimodal Machine Learning Applications · AI-based Problem Solving and Planning · Reinforcement Learning in Robotics