Reinforced Reasoning for Embodied Planning

Di Wu; Jiaxin Fan; Junzhe Zang; Guanbo Wang; Wei Yin; Wenhao Li; Bo Jin

arXiv:2505.22050 · cs.AI · July 15, 2025

TL;DR

This paper introduces a reinforcement fine-tuning framework that adds explicit reasoning to embodied planning; the resulting model outperforms similar or larger models on the Embench benchmark for interactive environments.

Contribution

It presents a novel reinforcement fine-tuning approach that incorporates reasoning into embodied planning models, improving multi-step decision-making in dynamic environments.

Findings

01 Outperforms similar or larger models on the Embench benchmark

02 Shows strong generalization to unseen environments

03 Demonstrates the effectiveness of reinforcement-driven reasoning in embodied AI

Abstract

Embodied planning requires agents to make coherent multi-step decisions based on dynamic visual observations and natural language goals. While recent vision-language models (VLMs) excel at static perception tasks, they struggle with the temporal reasoning, spatial understanding, and commonsense grounding needed for planning in interactive environments. In this work, we introduce a reinforcement fine-tuning framework that brings R1-style reasoning enhancement into embodied planning. We first distill a high-quality dataset from a powerful closed-source model and perform supervised fine-tuning (SFT) to equip the model with structured decision-making priors. We then design a rule-based reward function tailored to multi-step action quality and optimize the policy via Group Relative Policy Optimization (GRPO). Our approach is evaluated on Embench, a recent benchmark for…
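
The second training stage pairs a rule-based reward with GRPO. The sketch below is illustrative only and is not taken from the paper: `prefix_match_reward` is a hypothetical rule that scores a predicted action sequence by how much of a reference plan it reproduces before the first mismatch, and `group_relative_advantages` shows the group-wise reward normalization commonly used in GRPO, where each sampled response is scored relative to the other samples drawn for the same prompt. The action strings and the `eps` constant are assumptions.

```python
from statistics import mean, pstdev


def prefix_match_reward(predicted, reference):
    """Hypothetical rule-based reward (an assumption, not the paper's rule):
    fraction of the reference plan matched before the first mismatch."""
    if not reference:
        return 0.0
    matched = 0
    for pred_action, ref_action in zip(predicted, reference):
        if pred_action != ref_action:
            break
        matched += 1
    return matched / len(reference)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same prompt:
    subtract the group mean and divide by the group standard deviation."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reference plan and three sampled plans for one prompt.
reference_plan = ["find mug", "pick up mug", "go to sink", "put mug in sink"]
sampled_plans = [
    ["find mug", "pick up mug", "go to sink", "put mug in sink"],
    ["find mug", "pick up mug", "go to table"],
    ["open fridge"],
]

rewards = [prefix_match_reward(plan, reference_plan) for plan in sampled_plans]
advantages = group_relative_advantages(rewards)
```

Under these assumptions, plans that follow the reference further receive positive advantages and plans that diverge early receive negative ones, so no separate learned value model is needed to provide a baseline.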

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Robotic Path Planning Algorithms