Closing the Train-Test Gap in World Models for Gradient-Based Planning

Arjun Parthasarathy; Nimit Kalra; Rohun Agrawal; Yann LeCun; Oumayma Bounou; Pavel Izmailov; Micah Goldblum

arXiv:2512.09929·cs.LG·December 11, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces training techniques for world models that significantly improve gradient-based planning efficiency and performance, closing the gap with traditional methods in robotics tasks.

Contribution

It proposes train-time data synthesis methods that enhance gradient-based planning in world models, enabling faster and more effective inference.

Findings

01. Outperforms classical CEM in object manipulation tasks

02. Matches CEM performance in navigation tasks

03. Operates at 10% of the time budget of existing methods
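
For readers unfamiliar with the CEM baseline named in these findings, the following is a minimal sketch of cross-entropy-method planning. It assumes a toy additive dynamics model (`x_{t+1} = x_t + a_t`) as a stand-in for a learned world model; the function names, population sizes, and cost are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def rollout_cost(actions, x0, goal):
    """Toy stand-in for a world model (assumption): x_{t+1} = x_t + a_t.
    Cost is the squared distance between the final state and the goal."""
    x_final = x0 + actions.sum(axis=-2)  # sum actions over the horizon axis
    return np.sum((x_final - goal) ** 2, axis=-1)

def cem_plan(x0, goal, horizon=5, pop=64, elites=8, iters=20, seed=0):
    """Cross-entropy method: sample action sequences from a Gaussian,
    keep the lowest-cost samples, and refit the Gaussian to them."""
    rng = np.random.default_rng(seed)
    dim = x0.shape[0]
    mean = np.zeros((horizon, dim))
    std = np.ones((horizon, dim))
    for _ in range(iters):
        samples = mean + std * rng.standard_normal((pop, horizon, dim))
        costs = rollout_cost(samples, x0, goal)          # shape (pop,)
        elite = samples[np.argsort(costs)[:elites]]      # lowest-cost samples
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6                   # avoid collapse to zero
    return mean

x0 = np.zeros(2)
goal = np.array([1.0, -2.0])
plan = cem_plan(x0, goal)
final_cost = float(rollout_cost(plan, x0, goal))
```

Each iteration evaluates `pop` full rollouts and uses no gradient information, which is the kind of search cost that a gradient-based planner aims to avoid.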

Abstract

World models paired with model predictive control (MPC) can be trained offline on large-scale datasets of expert trajectories and enable generalization to a wide range of planning tasks at inference time. Compared to traditional MPC procedures, which rely on slow search algorithms or on iteratively solving optimization problems exactly, gradient-based planning offers a computationally efficient alternative. However, the performance of gradient-based planning has thus far lagged behind that of other approaches. In this paper, we propose improved methods for training world models that enable efficient gradient-based planning. We begin with the observation that although a world model is trained on a next-state prediction objective, it is used at test-time to instead estimate a sequence of actions. The goal of our work is to close this train-test gap. To that end, we propose train-time data…
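
The gradient-based planning described in the abstract can be illustrated with a minimal sketch: the action sequence is treated as the optimization variable, and a goal-reaching cost is differentiated through the model rollout. The sketch assumes a hypothetical linear model `x_{t+1} = A x_t + B a_t` with hand-derived gradients; the matrices, step size, and cost are illustrative assumptions, not the paper's learned world model or objective.

```python
import numpy as np

def rollout(A, B, x0, actions):
    """Roll the (assumed) linear model forward: x_{t+1} = A x_t + B a_t."""
    x = x0
    for a in actions:
        x = A @ x + B @ a
    return x

def plan_by_gradient(A, B, x0, goal, horizon=5, steps=200, lr=0.05):
    """Gradient descent on the action sequence for J = ||x_H - goal||^2."""
    actions = np.zeros((horizon, B.shape[1]))
    for _ in range(steps):
        x_final = rollout(A, B, x0, actions)
        lam = 2.0 * (x_final - goal)         # dJ/dx_H
        grad = np.zeros_like(actions)
        for t in reversed(range(horizon)):   # backpropagate through time
            grad[t] = B.T @ lam              # dJ/da_t
            lam = A.T @ lam                  # dJ/dx_t
        actions -= lr * grad
    return actions

A = 0.9 * np.eye(2)
B = 0.5 * np.eye(2)
x0 = np.array([0.5, 0.5])
goal = np.array([1.0, -2.0])
actions = plan_by_gradient(A, B, x0, goal)
x_final = rollout(A, B, x0, actions)
```

With a learned, nonlinear world model the backward loop would be replaced by automatic differentiation, and the optimizer can then drive the rollout into states the model never saw during training, which is the train-test gap the abstract refers to.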

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

The paper has a straightforward motivation and gives a clear definition of gradient-based planning. The two proposed methods are well justified, and they improve on gradient-based planning without them.

Weaknesses

This paper does not define key terms and algorithms, and its tables and figures are ambiguous. As a result, a reader with a planning background who does not specialize in planning for robotics will struggle to understand key points. Specifically:
- The interaction between CEM and other methods is not clear from Table 1.
- The CEM algorithm for planning is not defined, but it is a key comparison.
- The planning domains are not defined. Some images of them may help.

The results are only compet

Reviewer 02 · Rating 6 · Confidence 3

Strengths

- Clear identification of a practical train-test gap for world models used with gradient-based planning, and a concrete diagnosis that planning induces out-of-distribution trajectories and adversarial solutions in latent space.
- Two simple and implementable finetuning strategies that are compatible with existing latent world models. Online World Modeling leverages a simulator to correct planned rollouts. Adversarial World Modeling promotes robustness with lightweight one-step perturbations in la

Weaknesses

- Limited experimental breadth. Only three simulated tasks are considered, and all come from the same DINO World Model setup. The work would be more convincing with additional domains or harder long-horizon tasks.
- Comparisons to alternative strong planning baselines are incomplete. Prior hybrid methods that interleave CEM and gradient steps are not compared. There is no comparison to iLQR or DDP on lower-dimensional variants, and MPPI is not included despite being common in this setting.
- The

Reviewer 03 · Rating 2 · Confidence 4

Strengths

- The “train–test gap” framing is intuitive and addresses a real limitation of current world-model planning.
- Both OWM and AWM are straightforward, well-explained extensions of DAgger and adversarial training to latent dynamics.
- Demonstrates that smoother, more robust dynamics enable faster differentiable planning.
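
The reviews describe AWM only as adversarial training with one-step perturbations in latent space. As a rough illustration of that general idea, and not the paper's actual procedure, the sketch below applies a single sign-gradient (FGSM-style) perturbation to the latent input of a hypothetical linear dynamics model and then takes one training step on the perturbed input. The model form, `eps`, and `lr` are all assumptions.

```python
import numpy as np

def predict(W, z, a):
    """Assumed linear latent dynamics: z_next_hat = W [z; a]."""
    return W @ np.concatenate([z, a])

def loss(W, z, a, z_next):
    """Squared next-latent prediction error."""
    return float(np.sum((predict(W, z, a) - z_next) ** 2))

def adversarial_latent(W, z, a, z_next, eps=0.1):
    """One-step perturbation: move z along the sign of the loss gradient."""
    residual = predict(W, z, a) - z_next
    grad_z = 2.0 * W[:, : z.shape[0]].T @ residual   # dLoss/dz
    return z + eps * np.sign(grad_z)

def adversarial_step(W, z, a, z_next, eps=0.1, lr=0.01):
    """One gradient step on W using the perturbed latent as input."""
    z_adv = adversarial_latent(W, z, a, z_next, eps)
    inp = np.concatenate([z_adv, a])
    residual = W @ inp - z_next
    return W - lr * 2.0 * np.outer(residual, inp)    # dLoss/dW = 2 r inp^T

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 5))                      # 3 latent dims, 2 action dims
z, a, z_next = rng.standard_normal(3), rng.standard_normal(2), rng.standard_normal(3)
z_adv = adversarial_latent(W, z, a, z_next)
clean_loss = loss(W, z, a, z_next)
adv_loss = loss(W, z_adv, a, z_next)
W_new = adversarial_step(W, z, a, z_next)
```

For this linear model the perturbed loss is never lower than the clean loss, so the training step is taken on an input that is at least as hard as the original one.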

Weaknesses

- **Limited experimental scope:** evaluated only on a limited number of toy 2-D domains; no high-dimensional or real-robot settings, despite claims about scalability.
- **Incremental novelty:** both methods are direct adaptations of known techniques; no new architectural or theoretical contribution.
- **Missing baselines:** lacks comparison with modern hybrid planners (e.g. **TD-MPC2**).
- **Unclear wall-time reporting:** runtime results only shown for one task.
- **Writing clarity:*

Taxonomy

Topics: Robotic Path Planning Algorithms · AI-based Problem Solving and Planning · Reinforcement Learning in Robotics