On the Convergence of Reinforcement Learning in Nonlinear Continuous State Space Problems
Raman Goyal, Suman Chakravorty, Ran Wang, Mohamed Naveed Gul Mohamed

TL;DR
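This paper studies reinforcement learning for nonlinear stochastic systems. It shows that the variance of the learned solution grows factorial-exponentially with the order of the approximation, which restricts accurate solutions to local feedback laws, and it proposes exploiting the perturbation structure of deterministic optimal control to compute those local solutions accurately.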
Contribution
It identifies the 'Curse of Variance' in RL for nonlinear systems and introduces a perturbation structure to obtain accurate local solutions.
Findings
Variance in RL solutions grows factorial-exponentially with approximation order.
Global solutions are infeasible due to explosive variance growth.
Perturbation structure enables accurate local control solutions.
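The first two findings can be made concrete with a toy calculation. The sketch below is not the paper's derivation; it only assumes zero-mean Gaussian noise `w` with standard deviation `sigma` (both names are illustrative) and uses the standard Gaussian moment formula to compute the variance of the monomial `w**k`. That variance is dominated by `(2k-1)!! * sigma**(2k)`, a factorial factor times an exponential factor, which suggests why sample-based estimates of higher-order terms become noisy so quickly.

```python
from math import prod

def double_factorial(n):
    """Return n!! for odd n >= 1 (and 1 for n <= 0)."""
    return prod(range(n, 0, -2)) if n > 0 else 1

def gaussian_moment(k, sigma=1.0):
    """E[w^k] for w ~ N(0, sigma^2): 0 for odd k, (k-1)!! * sigma^k for even k."""
    return 0.0 if k % 2 else double_factorial(k - 1) * sigma**k

def monomial_variance(k, sigma=1.0):
    """Var(w^k) = E[w^(2k)] - (E[w^k])^2."""
    return gaussian_moment(2 * k, sigma) - gaussian_moment(k, sigma) ** 2

# Variance of w^k for orders k = 1..6 with unit-variance noise.
variances = [monomial_variance(k) for k in range(1, 7)]
print(variances)
```

Even at unit noise level the variance climbs by roughly an order of magnitude per additional order, so a fixed sample budget buys rapidly less accuracy as the approximation order increases.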
Abstract
We consider the problem of Reinforcement Learning (RL) for nonlinear stochastic dynamical systems. We show that in the RL setting, there is an inherent "Curse of Variance" in addition to Bellman's infamous "Curse of Dimensionality". In particular, we show that the variance in the solution grows factorial-exponentially in the order of the approximation. A fundamental consequence is that, in order to control the explosive variance growth and thus ensure accuracy, one cannot search for anything other than "local" feedback solutions in RL. We further show that the deterministic optimal control has a perturbation structure, in that the higher order terms do not affect the calculation of lower order terms, which can be utilized in RL to get accurate local solutions.
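The perturbation structure mentioned at the end of the abstract can be illustrated on a toy problem. The sketch below is an assumption-laden example, not the paper's setting: it takes a scalar, continuous-time, infinite-horizon deterministic system dx/dt = a*x + c*x^2 + b*u with cost integrand q*x^2 + r*u^2, and expands the value function as J(x) = p2*x^2 + p3*x^3 + ... (all symbols are hypothetical). Matching powers of x in the Hamilton-Jacobi-Bellman equation gives a Riccati equation for `p2` that does not involve `p3`, followed by a linear equation for `p3` given `p2`.

```python
import math

def value_coefficients(a, b, c, q, r):
    """Quadratic and cubic value-function coefficients for the toy problem.

    HJB after minimizing over u: 0 = q*x^2 + J_x*(a*x + c*x^2) - b^2*J_x^2/(4*r).
    """
    # Order x^2 (Riccati): q + 2*a*p2 - (b^2/r)*p2^2 = 0, stabilizing root.
    p2 = r * (a + math.sqrt(a**2 + q * b**2 / r)) / b**2
    # Closed-loop linear rate; negative for the stabilizing root.
    a_cl = a - b**2 * p2 / r
    # Order x^3: 2*c*p2 + 3*a_cl*p3 = 0, linear in p3 once p2 is known.
    p3 = -2.0 * c * p2 / (3.0 * a_cl)
    return p2, p3

# Changing the nonlinearity c alters p3 but leaves p2 untouched.
p2_weak, p3_weak = value_coefficients(a=-1.0, b=1.0, c=0.5, q=3.0, r=1.0)
p2_strong, p3_strong = value_coefficients(a=-1.0, b=1.0, c=2.0, q=3.0, r=1.0)
print(p2_weak, p3_weak, p2_strong, p3_strong)
```

In this toy case the equations are triangular: the lower-order coefficient is fixed before the higher-order one is computed, so truncating the expansion does not corrupt the terms that are kept. The corresponding feedback law is u = -(b/r) * (p2*x + 1.5*p3*x^2).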
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Thermodynamics and Statistical Mechanics
