Closed-Loop Verbal Reinforcement Learning for Task-Level Robotic Planning

Dmitrii Plotnikov; Iaroslav Kolomiets; Dmitrii Maliukov; Dmitrij Kosenkov; Daniia Zinniatullina; Artem Trandofilov; Georgii Gazaryan; Kirill Bogatikov; Timofei Kozlov; Igor Duchinskii; Mikhail Konenkov; Miguel Altamirano Cabrera; Dzmitry Tsetserukou

arXiv:2603.22169 · cs.RO · March 24, 2026

TL;DR

This paper introduces a verbal reinforcement learning framework for interpretable, task-level robotic planning that iteratively improves policies through natural-language feedback and interaction with real robots.

Contribution

The paper presents a closed-loop Verbal Reinforcement Learning (VRL) framework that refines symbolic policies using language-based feedback, enabling transparent and adaptive robotic task planning.

Findings

1. Supports explainable policy improvements
2. Enables closed-loop adaptation to failures
3. Achieves reliable real-robot deployment

Abstract

We propose a new Verbal Reinforcement Learning (VRL) framework for interpretable task-level planning in mobile robotic systems operating under execution uncertainty. The framework follows a closed-loop architecture that enables iterative policy improvement through interaction with the physical environment. In our framework, executable Behavior Trees are repeatedly refined by a Large Language Model actor using structured natural-language feedback produced by a Vision-Language Model critic that observes the physical robot and execution traces. Unlike conventional reinforcement learning, policy updates in VRL occur directly at the symbolic planning level, without gradient-based optimization. This enables transparent reasoning, explicit causal feedback, and human-interpretable policy evolution. We validate the proposed framework on a real mobile robot performing a multi-stage manipulation…
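
The abstract describes the loop only at a high level, so the following is a minimal Python sketch of how such an execute, critique, revise cycle could be structured. It is not the authors' implementation: the tuple encoding of the Behavior Tree, the toy `World` with its `PRECONDITIONS` and `EFFECTS` tables, and the rule-based `critic` and `actor` functions (standing in for the VLM critic and the LLM actor) are all hypothetical. What it does illustrate is the key property claimed above: each policy update is a readable symbolic edit of the tree, with no gradient step.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

# Hypothetical symbolic policy: a Behavior Tree encoded as nested tuples.
#   ("seq", [children])      ticks children in order, fails at the first failure
#   ("fallback", [children]) succeeds at the first child that succeeds
#   ("act", skill_name)      runs one primitive skill on the (simulated) robot
Node = Union[Tuple[str, str], Tuple[str, list]]

# Toy world model (assumption): what each skill requires and what it achieves.
PRECONDITIONS = {"grasp": {"at_table", "gripper_open"}, "place": {"holding"}}
EFFECTS = {"goto_table": "at_table", "open_gripper": "gripper_open",
           "grasp": "holding", "place": "placed"}


@dataclass
class World:
    """Stand-in for the physical robot; records an execution trace."""
    state: set = field(default_factory=set)
    trace: List[Tuple[str, bool]] = field(default_factory=list)

    def run(self, skill: str) -> bool:
        ok = PRECONDITIONS.get(skill, set()) <= self.state
        if ok:
            self.state.add(EFFECTS[skill])
        self.trace.append((skill, ok))
        return ok


def tick(node: Node, world: World) -> bool:
    """Execute the Behavior Tree once; all()/any() short-circuit like BT nodes."""
    kind, body = node
    if kind == "act":
        return world.run(body)
    if kind == "seq":
        return all(tick(child, world) for child in body)
    if kind == "fallback":
        return any(tick(child, world) for child in body)
    raise ValueError(f"unknown node type: {kind}")


def critic(world: World) -> Dict[str, str]:
    """Mock of the VLM critic: turns the trace into structured verbal feedback.
    A sequence stops at its first failure, so the final state is the state at failure."""
    for skill, ok in world.trace:
        if not ok:
            missing = sorted(PRECONDITIONS[skill] - world.state)[0]
            return {"status": "failure", "failed_skill": skill, "missing": missing,
                    "text": f"'{skill}' failed because '{missing}' did not hold."}
    return {"status": "success", "text": "Task completed."}


def actor(policy: Node, feedback: Dict[str, str]) -> Node:
    """Mock of the LLM actor: a symbolic edit of the tree, no gradients.
    Inserts the skill that achieves the missing condition before the failed skill
    (assumes the policy is a flat top-level sequence)."""
    fix = next(s for s, eff in EFFECTS.items() if eff == feedback["missing"])
    kind, children = policy
    idx = children.index(("act", feedback["failed_skill"]))
    return (kind, children[:idx] + [("act", fix)] + children[idx:])


def verbal_rl(policy: Node, max_iters: int = 5) -> Tuple[Node, List[str]]:
    """Closed loop: execute, critique, revise, until success or the budget runs out."""
    log: List[str] = []
    for _ in range(max_iters):
        world = World()  # fresh episode on the (simulated) robot
        tick(policy, world)
        feedback = critic(world)
        log.append(feedback["text"])
        if feedback["status"] == "success":
            break
        policy = actor(policy, feedback)
    return policy, log


if __name__ == "__main__":
    final, log = verbal_rl(("seq", [("act", "grasp"), ("act", "place")]))
    for line in log:
        print(line)
    print(final)
```

Starting from the naive tree `grasp → place`, the mock critic first reports that `grasp` failed because `at_table` did not hold, so the actor inserts `goto_table`; the next episode surfaces the missing `gripper_open`, the actor inserts `open_gripper`, and the third episode succeeds. Every intermediate policy remains an inspectable tree, which is the interpretability argument the abstract makes; in the real system the rule-based `critic` and `actor` would be replaced by model calls producing and consuming free-form feedback.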

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Social Robot Interaction and HRI