On the Hardness of Reinforcement Learning with Transition Look-Ahead
Corentin Pla, Hugo Richard, Marc Abeille, Nadav Merlis, Vianney Perchet

TL;DR
This paper investigates the computational complexity of reinforcement learning with transition look-ahead, showing that optimal planning with one-step look-ahead is solvable in polynomial time, while planning with two or more steps of look-ahead is NP-hard.
Contribution
It establishes a clear complexity boundary: optimal planning with one-step look-ahead is solvable in polynomial time via a novel linear programming formulation, whereas planning with two or more steps of look-ahead is NP-hard.
Findings
Optimal planning with one-step look-ahead is polynomial-time solvable.
Planning with two or more steps of look-ahead is NP-hard.
Together, these results mark the boundary between tractable and intractable cases of planning with transition look-ahead.
Abstract
We study reinforcement learning (RL) with transition look-ahead, where the agent may observe which states would be visited upon playing any sequence of $\ell$ actions before deciding its course of action. While such predictive information can drastically improve the achievable performance, we show that using this information optimally comes at a potentially prohibitive computational cost. Specifically, we prove that optimal planning with one-step look-ahead ($\ell = 1$) can be solved in polynomial time through a novel linear programming formulation. In contrast, for $\ell \geq 2$, the problem becomes NP-hard. Our results delineate a precise boundary between tractable and intractable cases for the problem of planning with transition look-ahead in reinforcement learning.
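To make the one-step look-ahead setting concrete, here is a minimal sketch on a toy tabular MDP. It is not the linear program from the paper. It is a brute-force finite-horizon recursion under several assumptions made only for illustration: next states are drawn independently for each action, the agent sees all of them before acting, and the names `lookahead_values`, `standard_values`, `P`, and `R` are hypothetical. The recursion enumerates every joint outcome, so its cost grows exponentially with the number of actions, which suggests why a naive approach is not obviously polynomial.

```python
import itertools


def standard_values(P, R, horizon):
    """Finite-horizon values without look-ahead: V(s) = max_a [R + E V(s')]."""
    n_states, n_actions = len(P), len(P[0])
    V = [0.0] * n_states
    for _ in range(horizon):
        V = [
            max(
                R[s][a] + sum(P[s][a][s2] * V[s2] for s2 in range(n_states))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
    return V


def lookahead_values(P, R, horizon):
    """Finite-horizon values with one-step look-ahead (illustrative assumption:
    the next state of each action is sampled independently and revealed
    before the agent picks an action): V(s) = E[max_a (R + V(s'_a))]."""
    n_states, n_actions = len(P), len(P[0])
    V = [0.0] * n_states
    for _ in range(horizon):
        new_V = []
        for s in range(n_states):
            total = 0.0
            # Enumerate every joint outcome (s'_0, ..., s'_{A-1}): S**A tuples.
            for outcome in itertools.product(range(n_states), repeat=n_actions):
                prob = 1.0
                for a, s2 in enumerate(outcome):
                    prob *= P[s][a][s2]
                if prob == 0.0:
                    continue
                # The agent picks the best action for this revealed outcome.
                best = max(R[s][a] + V[outcome[a]] for a in range(n_actions))
                total += prob * best
            new_V.append(total)
        V = new_V
    return V


# Toy MDP: two states, two actions, every transition is a fair coin flip,
# and the reward depends only on the current state (0 in state 0, 1 in state 1).
P = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
R = [[0.0, 0.0], [1.0, 1.0]]

no_lookahead = standard_values(P, R, horizon=2)
with_lookahead = lookahead_values(P, R, horizon=2)
print(no_lookahead, with_lookahead)
```

In this toy instance the look-ahead agent does strictly better from both states, because it can pick whichever action is revealed to lead to the rewarding state; the expectation of a maximum is never smaller than the maximum of expectations.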
