The Surprising Difficulty of Search in Model-Based Reinforcement Learning

Wei-Di Chang; Mikael Henaff; Brandon Amos; Gregory Dudek; Scott Fujimoto

arXiv:2601.21306 · cs.LG · January 30, 2026

TL;DR

In model-based reinforcement learning, search can harm performance even when the learned model is highly accurate. The paper argues that mitigating distribution shift matters more than improving model or value function accuracy.

Contribution

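The paper challenges the conventional view that long-term predictions and compounding errors are the primary obstacles for model-based RL. It shows that search is not a plug-and-play replacement for a learned policy, and it identifies key techniques for enabling effective search by mitigating distribution shift.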

Findings

1. Search can harm performance even with accurate models

2. Mitigating distribution shift is more crucial than model accuracy

3. Achieved state-of-the-art results on benchmark domains

Abstract

This paper investigates search in model-based reinforcement learning (RL). Conventional wisdom holds that long-term predictions and compounding errors are the primary obstacles for model-based RL. We challenge this view, showing that search is not a plug-and-play replacement for a learned policy. Surprisingly, we find that search can harm performance even when the model is highly accurate. Instead, we show that mitigating distribution shift matters more than improving model or value function accuracy. Building on this insight, we identify key techniques for enabling effective search, achieving state-of-the-art performance across multiple popular benchmark domains.
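
To make "search as a replacement for a learned policy" concrete, the following is a minimal sketch of one generic form of search (random shooting with a value bootstrap). It is not the authors' method. The functions `policy_action`, `model_step`, and `value_estimate` are hypothetical stand-ins for a learned policy, dynamics model, and value function, and the toy dynamics and all hyperparameters are assumptions chosen only for illustration.

```python
import numpy as np

def policy_action(state):
    # Hypothetical stand-in for a learned policy: move toward the origin.
    return np.clip(-state, -1.0, 1.0)

def model_step(state, action):
    # Hypothetical stand-in for a learned dynamics and reward model.
    next_state = state + 0.1 * action
    reward = -float(np.sum(next_state ** 2))
    return next_state, reward

def value_estimate(state):
    # Hypothetical stand-in for a learned value function (bootstrap at the horizon).
    return -float(np.sum(state ** 2))

def search_action(state, rng, num_candidates=64, horizon=5, gamma=0.99, noise=0.3):
    """Random-shooting search: roll out noisy policy actions in the model
    and return the first action of the highest-scoring rollout."""
    best_return, best_first_action = -np.inf, None
    for _ in range(num_candidates):
        s, total, discount, first = state.copy(), 0.0, 1.0, None
        for t in range(horizon):
            perturbation = noise * rng.standard_normal(s.shape)
            a = np.clip(policy_action(s) + perturbation, -1.0, 1.0)
            if t == 0:
                first = a
            s, r = model_step(s, a)
            total += discount * r
            discount *= gamma
        # Add the discounted value estimate of the final imagined state.
        total += discount * value_estimate(s)
        if total > best_return:
            best_return, best_first_action = total, first
    return best_first_action

rng = np.random.default_rng(0)
state = np.array([0.5, -0.2])
print("policy action:", policy_action(state))
print("search action:", search_action(state, rng))
```

In this sketch the action chosen by search can differ from the policy's own action, so the model and value function are queried on actions the policy would not have produced. That is one plausible place for the distribution shift discussed in the abstract to arise, though the paper itself should be consulted for the actual analysis.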

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Generative Adversarial Networks and Image Synthesis