Chebyshev Policies and the Mountain Car Problem: Reinforcement Learning for Low-Dimensional Control Tasks
Stefan Huber, Hannes Unger, Georg Schäfer, Jakob Rehrl

TL;DR
This paper analytically solves the Mountain Car problem, introduces Chebyshev policies as efficient RL alternatives, and demonstrates their superior performance and simplicity across multiple control tasks.
Contribution
The paper provides the first optimal control solution for Mountain Car and introduces Chebyshev policies as a universal, efficient, and explainable alternative to neural networks in RL.
Findings
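An optimal control solution for Mountain Car is derived analytically, 36 years after the problem was posed
Chebyshev policies reduce regret by a factor of 4.18 and need 277 times fewer parameters than neural nets
Chebyshev policies outperform neural nets trained with PPO, ARS and REINFORCE on further RL tasks, including a real-world nonlinear motion control testbed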
Abstract
We analytically solve the Mountain Car problem, a canonical benchmark in RL, and derive an optimal control solution, closing a gap after 36 years. This enables us to reveal two surprising insights: The optimal control is quite simple, yet modern RL agents display a large gap to optimality. Motivated by the analysis of the optimal control, we introduce Chebyshev policies as a universal (i.e. dense) class of RL policies from first principles. They can be trained as drop-in replacements of neural nets, reducing the regret by a factor of 4.18, while requiring 277 times fewer parameters, fostering sample efficiency, explainability and realtime capability. Chebyshev policies are evaluated on further RL tasks, including a real-world nonlinear motion control testbed. They consistently improve performance over neural nets with PPO, ARS and REINFORCE. Our results demonstrate how Chebyshev…
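The abstract does not spell out how a Chebyshev policy is parametrized, so the following is only a minimal sketch under assumptions, not the authors' implementation. It assumes a deterministic policy for a two-dimensional state (position, velocity), written as a tensor-product Chebyshev series whose output is clipped to [-1, 1]. The state bounds are the ones commonly used for the continuous Mountain Car environment in Gymnasium and are an assumption here; the function names and the coefficient layout are hypothetical.

```python
from typing import List, Sequence

# Assumed state bounds (common continuous Mountain Car values, not from the paper).
POS_MIN, POS_MAX = -1.2, 0.6
VEL_MIN, VEL_MAX = -0.07, 0.07


def to_unit_interval(value: float, low: float, high: float) -> float:
    """Affinely map [low, high] onto [-1, 1], the natural Chebyshev domain."""
    return 2.0 * (value - low) / (high - low) - 1.0


def chebyshev_basis(x: float, degree: int) -> List[float]:
    """Return [T_0(x), ..., T_degree(x)] using T_{k+1} = 2x T_k - T_{k-1}."""
    values = [1.0, x]
    for _ in range(2, degree + 1):
        values.append(2.0 * x * values[-1] - values[-2])
    return values[: degree + 1]


def chebyshev_policy(
    position: float, velocity: float, coeffs: Sequence[Sequence[float]]
) -> float:
    """Hypothetical policy: clip(sum_ij coeffs[i][j] * T_i(p) * T_j(v), -1, 1).

    coeffs[i][j] weights the product of the i-th polynomial in the normalized
    position p and the j-th polynomial in the normalized velocity v.
    """
    deg_p = len(coeffs) - 1
    deg_v = len(coeffs[0]) - 1
    t_pos = chebyshev_basis(to_unit_interval(position, POS_MIN, POS_MAX), deg_p)
    t_vel = chebyshev_basis(to_unit_interval(velocity, VEL_MIN, VEL_MAX), deg_v)
    raw = sum(
        coeffs[i][j] * t_pos[i] * t_vel[j]
        for i in range(deg_p + 1)
        for j in range(deg_v + 1)
    )
    return max(-1.0, min(1.0, raw))


# Example: a two-parameter policy that pushes in the direction of motion.
example_coeffs = [[0.0, 1.0]]  # action = T_1(v) = normalized velocity
action = chebyshev_policy(position=-0.5, velocity=0.035, coeffs=example_coeffs)
```

In this sketch the trainable parameters are just the entries of `coeffs`, so a policy of degree (m, n) has (m + 1)(n + 1) parameters. Any training method named in the abstract (PPO, ARS, REINFORCE) would adjust those coefficients in place of neural network weights; how the paper actually does this is not shown here.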
