On the Convergence of Reinforcement Learning in Nonlinear Continuous State Space Problems

Raman Goyal; Suman Chakravorty; Ran Wang; Mohamed Naveed Gul Mohamed

arXiv:2011.10829 · cs.LG · July 30, 2021

TL;DR

This paper investigates the challenges of applying reinforcement learning to nonlinear stochastic systems. It shows that the variance of the learned solution grows factorial-exponentially with the order of the approximation, which limits RL to local feedback solutions. It also proposes a perturbation-based approach to obtain accurate local solutions.

Contribution

It identifies a 'Curse of Variance' in RL for nonlinear systems. It also shows that the perturbation structure of deterministic optimal control can be exploited to obtain accurate local solutions.

Findings

1. Variance in RL solutions grows factorial-exponentially with the order of the approximation.
2. Global solutions are infeasible because of this explosive variance growth, so only local feedback solutions can be learned accurately.
3. The perturbation structure of deterministic optimal control enables accurate local solutions.
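The paper derives the factorial-exponential bound itself, and that derivation is not reproduced here. The plain-Python sketch below is only a hypothetical toy that shows one mechanism behind the trend.

Suppose a function is probed with noisy samples at perturbations d in [-ε, ε] around a nominal point, and a degree-K polynomial is fitted by least squares. In that case, the variance of the k-th coefficient estimate carries a factor ε^(-2k). The helper names `invert` and `coefficient_variances`, the evenly spaced design, and the i.i.d. Gaussian-style noise model are all assumptions made for illustration.

```python
# Toy illustration (an assumption, not the paper's analysis): exact variance of
# least-squares polynomial coefficients fitted to noisy samples y = f(d) + noise,
# with perturbations d restricted to a small interval [-eps, eps].


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting for small dense matrices."""
    n = len(matrix)
    # Build the augmented matrix [A | I].
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def coefficient_variances(order, eps, noise_std, num_samples):
    """Var(theta_k_hat) for k = 0..order under ordinary least squares on a fixed,
    evenly spaced design d_i in [-eps, eps] with i.i.d. noise of std noise_std."""
    # Normalized design u_i = d_i / eps in [-1, 1]. Column k of the raw design
    # matrix equals eps**k times column k of the normalized design matrix.
    us = [-1.0 + 2.0 * i / (num_samples - 1) for i in range(num_samples)]
    gram = [[sum(u ** (j + k) for u in us) for k in range(order + 1)]
            for j in range(order + 1)]
    gram_inv = invert(gram)
    # Cov(theta_hat) = sigma^2 (Phi^T Phi)^{-1}; undoing the column scaling
    # divides the k-th diagonal entry by eps**(2k).
    return [noise_std ** 2 * gram_inv[k][k] / eps ** (2 * k)
            for k in range(order + 1)]


if __name__ == "__main__":
    variances = coefficient_variances(order=5, eps=0.1, noise_std=0.01,
                                      num_samples=200)
    for k, var in enumerate(variances):
        print(f"order {k}: Var = {var:.3e}")
```

With ε = 0.1, the ε^(-2k) factor alone multiplies the variance by 100 for each additional order. The growing ill-conditioning of the monomial basis adds to this. So, in this toy, only the low-order (local) terms can be estimated reliably from a fixed sample budget.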

Abstract

We consider the problem of Reinforcement Learning (RL) for nonlinear stochastic dynamical systems. We show that in the RL setting there is an inherent "Curse of Variance" in addition to Bellman's infamous "Curse of Dimensionality". In particular, we show that the variance in the solution grows factorial-exponentially in the order of the approximation. A fundamental consequence is that, to control this explosive variance growth and thus ensure accuracy, RL must be restricted to the search for "local" feedback solutions. We further show that deterministic optimal control has a perturbation structure, in that the higher-order terms do not affect the calculation of the lower-order terms. This structure can be utilized in RL to obtain accurate local solutions.
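The perturbation structure described above can be illustrated in a standard model-based setting. The sketch below is a hypothetical illustration, not the authors' RL algorithm. It assumes a known scalar model x_{t+1} = x_t + Δt(α sin x_t + u_t) with a quadratic cost, and every constant is made up.

It proceeds in two steps:

- **Zeroth-order term.** It computes the open-loop nominal control by adjoint-based gradient descent.
- **First-order term.** It computes time-varying linear feedback gains K_t using a DDP-style backward pass. This is a standard technique swapped in here for illustration. The pass uses only the nominal trajectory and derivatives up to second order.

Neither step needs any higher-order term of the feedback law. The names `nominal_control` and `linear_feedback_gains` are hypothetical.

```python
import math

# Hypothetical scalar system (an assumption for illustration only):
#   x_{t+1} = x_t + DT * (ALPHA * sin(x_t) + u_t)
# Cost: sum_t 0.5 * (Q * x_t^2 + R * u_t^2) + 0.5 * QF * x_T^2
DT, ALPHA, Q, R, QF, T = 0.1, 0.5, 1.0, 1.0, 1.0, 20
F_U = DT  # df/du; f is affine in u, so f_uu = f_ux = 0


def step(x, u):
    return x + DT * (ALPHA * math.sin(x) + u)


def f_x(x):  # df/dx
    return 1.0 + DT * ALPHA * math.cos(x)


def f_xx(x):  # d^2 f / dx^2
    return -DT * ALPHA * math.sin(x)


def rollout(x0, us):
    xs = [x0]
    for u in us:
        xs.append(step(xs[-1], u))
    return xs


def nominal_control(x0, lr=0.1, tol=1e-11, max_iters=100_000):
    """Zeroth order: open-loop optimal controls via adjoint gradient descent.
    No feedback (higher-order) information is used."""
    us = [0.0] * T
    for _ in range(max_iters):
        xs = rollout(x0, us)
        lam = QF * xs[T]  # costate at the final time
        grad = [0.0] * T
        for t in reversed(range(T)):
            grad[t] = R * us[t] + F_U * lam
            lam = Q * xs[t] + f_x(xs[t]) * lam
        if max(abs(g) for g in grad) < tol:
            break
        us = [u - lr * g for u, g in zip(us, grad)]
    return us


def linear_feedback_gains(x0, us):
    """First order: gains K_t from a second-order (DDP-style) expansion of the
    value function around the nominal; third-order terms never appear."""
    xs = rollout(x0, us)
    v_x, v_xx = QF * xs[T], QF
    gains = [0.0] * T
    for t in reversed(range(T)):
        x = xs[t]
        q_x = Q * x + f_x(x) * v_x
        q_u = R * us[t] + F_U * v_x  # approximately zero at the optimal nominal
        q_xx = Q + f_x(x) ** 2 * v_xx + v_x * f_xx(x)
        q_uu = R + F_U ** 2 * v_xx
        q_ux = F_U * f_x(x) * v_xx
        gains[t] = -q_ux / q_uu
        v_x = q_x - q_ux * q_u / q_uu
        v_xx = q_xx - q_ux ** 2 / q_uu
    return gains


if __name__ == "__main__":
    x0, h = 1.0, 1e-3
    K = linear_feedback_gains(x0, nominal_control(x0))
    # K_0 should match d u_0* / d x_0 obtained by re-solving the open-loop problem.
    fd = (nominal_control(x0 + h)[0] - nominal_control(x0 - h)[0]) / (2 * h)
    print(K[0], fd)
```

Under these assumptions, K_0 should agree with a central finite difference of the re-optimized first control with respect to x_0. The `__main__` block prints both numbers for comparison. A model-free RL method would have to estimate the same quantities from sampled rollouts instead of the known derivatives used here.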


Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Thermodynamics and Statistical Mechanics