On the Hardness of Reinforcement Learning with Transition Look-Ahead

Corentin Pla; Hugo Richard; Marc Abeille; Nadav Merlis; Vianney Perchet

arXiv:2510.19372 · stat.ML · March 31, 2026

TL;DR

This paper studies the computational complexity of planning in reinforcement learning with transition look-ahead. It shows that optimal planning with one-step look-ahead can be solved in polynomial time, while planning with two or more steps of look-ahead is NP-hard.

Contribution

The paper establishes a precise complexity boundary: a novel linear programming formulation solves optimal planning with one-step look-ahead in polynomial time, whereas planning with two or more steps of look-ahead is NP-hard.

Findings

1. Optimal planning with one-step look-ahead can be solved in polynomial time.
2. Planning with two or more steps of look-ahead is NP-hard.
3. Together, these results delineate the boundary between tractable and intractable cases of planning with transition look-ahead.

Abstract

We study reinforcement learning (RL) with transition look-ahead, where the agent may observe which states would be visited upon playing any sequence of $\ell$ actions before deciding its course of action. While such predictive information can drastically improve the achievable performance, we show that using this information optimally comes at a potentially prohibitive computational cost. Specifically, we prove that optimal planning with one-step look-ahead ($\ell = 1$) can be solved in polynomial time through a novel linear programming formulation. In contrast, for $\ell \geq 2$, the problem becomes NP-hard. Our results delineate a precise boundary between tractable and intractable cases for the problem of planning with transition look-ahead in reinforcement learning.
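
To make the one-step setting concrete, here is a minimal Python sketch under illustrative assumptions that are not taken from the paper: a finite-horizon tabular MDP with deterministic rewards $r(s,a)$, where the look-ahead next states $s'_a \sim P(\cdot \mid s, a)$ are drawn independently across actions and time steps. Under these assumptions, backward induction gives $V_h(s) = \mathbb{E}\left[\max_a \left(r(s,a) + V_{h+1}(s'_a)\right)\right]$, compared with $\max_a \mathbb{E}\left[r(s,a) + V_{h+1}(s'_a)\right]$ without look-ahead. All function names (`lookahead_values`, `standard_values`, `expected_max_independent`, `brute_force_expected_max`) are hypothetical. This is plain dynamic programming for $\ell = 1$, not the paper's linear programming formulation, and it does not address $\ell \geq 2$.

```python
from itertools import product
from typing import Dict, List, Sequence

# Hypothetical types for a small tabular MDP (illustrative, not from the paper).
Dist = Dict[float, float]              # value -> probability
Transitions = List[List[List[float]]]  # P[s][a][s_next]
Rewards = List[List[float]]            # r[s][a]


def expected_max_independent(dists: Sequence[Dist]) -> float:
    """E[max_a X_a] for independent discrete X_a, via the CDF of the maximum."""
    support = sorted({v for d in dists for v in d})
    expectation, prev_cdf = 0.0, 0.0
    for t in support:
        # Assumed independence: P(max_a X_a <= t) = prod_a P(X_a <= t).
        cdf = 1.0
        for d in dists:
            cdf *= sum(p for v, p in d.items() if v <= t)
        expectation += t * (cdf - prev_cdf)
        prev_cdf = cdf
    return expectation


def brute_force_expected_max(dists: Sequence[Dist]) -> float:
    """Same quantity by enumerating all joint outcomes (exponential in |A|)."""
    total = 0.0
    for combo in product(*(list(d.items()) for d in dists)):
        prob = 1.0
        for _, p in combo:
            prob *= p
        total += prob * max(v for v, _ in combo)
    return total


def lookahead_values(P: Transitions, r: Rewards, horizon: int) -> List[float]:
    """Backward induction when the agent sees s'_a for every action a before acting."""
    n_states = len(P)
    v_next = [0.0] * n_states  # terminal values V_H = 0
    for _ in range(horizon):
        v = []
        for s in range(n_states):
            dists = []
            for a, probs in enumerate(P[s]):
                # Distribution of r(s, a) + V_{h+1}(s'_a) for this action.
                d: Dist = {}
                for s_next, p in enumerate(probs):
                    if p > 0:
                        val = r[s][a] + v_next[s_next]
                        d[val] = d.get(val, 0.0) + p
                dists.append(d)
            v.append(expected_max_independent(dists))
        v_next = v
    return v_next


def standard_values(P: Transitions, r: Rewards, horizon: int) -> List[float]:
    """Usual backward induction without look-ahead, for comparison."""
    n_states = len(P)
    v_next = [0.0] * n_states
    for _ in range(horizon):
        v_next = [
            max(r[s][a] + sum(p * v_next[s2] for s2, p in enumerate(probs))
                for a, probs in enumerate(P[s]))
            for s in range(n_states)
        ]
    return v_next


# Toy instance: from state 0, each action reaches the rewarding state 1 w.p. 1/2.
P = [[[0.5, 0.5], [0.5, 0.5]], [[0.0, 1.0], [0.0, 1.0]]]
r = [[0.0, 0.0], [1.0, 1.0]]
print(lookahead_values(P, r, horizon=2))  # [0.75, 2.0]
print(standard_values(P, r, horizon=2))   # [0.5, 2.0]
```

In the toy instance, observing both candidate next states lets the agent choose an action that reaches state 1 whenever either action would, which raises the value of state 0 from 0.5 to 0.75. The expectation of the maximum is computed from the product of per-action CDFs rather than by enumerating all $|S|^{|A|}$ joint outcomes; `brute_force_expected_max` performs that enumeration only as a cross-check.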
