On the Expressivity of Neural Networks for Deep Reinforcement Learning
Kefan Dong, Yuping Luo, Tengyu Ma

TL;DR
This paper investigates the expressive limitations of neural networks in deep reinforcement learning. It shows that many MDPs have optimal policies and Q-functions that are more complex than their dynamics, argues that model-based planning is preferable for such MDPs, and proposes a bootstrapping planner that improves policy performance.
Contribution
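The paper provides theoretical and empirical evidence that optimal Q-functions and policies can be much harder for neural networks to represent than the dynamics, and introduces BOOTS, a simple multi-step model-based bootstrapping planner that turns a weak Q-function into a stronger policy.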
Findings
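Model-based planning approximates the optimal policy better than a neural network policy parameterization in MDPs whose optimal policies are more complex than their dynamics.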
Applying BOOTS improves performance on MuJoCo tasks.
Optimal policies and Q-functions can be much more complex than the dynamics, even in a one-dimensional continuous state space.
Abstract
We compare model-free reinforcement learning with model-based approaches through the lens of the expressive power of neural networks for policies, Q-functions, and dynamics. We show, theoretically and empirically, that even for a one-dimensional continuous state space, there are many MDPs whose optimal Q-functions and policies are much more complex than the dynamics. We hypothesize that many real-world MDPs have a similar property. For these MDPs, model-based planning is a favorable algorithm, because the resulting policies can approximate the optimal policy significantly better than a neural network parameterization can, and both model-free and model-based policy optimization rely on policy parameterization. Motivated by the theory, we apply a simple multi-step model-based bootstrapping planner (BOOTS) to bootstrap a weak Q-function into a stronger policy. Empirical results show…
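The idea of bootstrapping a weak Q-function into a stronger policy through multi-step planning can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a small discrete action set, a known deterministic dynamics function, and exhaustive search over k-step plans, and every name in it (`boots_action`, `chain_dynamics`, `chain_reward`, `weak_q`, `ACTIONS`) is hypothetical.

```python
from itertools import product


def boots_action(state, actions, dynamics, reward, q_fn, k, gamma=0.99):
    """Return the first action of the best k-step plan, scored by q_fn at the end.

    All arguments are illustrative assumptions: dynamics(s, a) returns the
    next state, reward(s, a) returns a float, and q_fn(s, a) is a (possibly
    weak) estimate of the optimal Q-function.
    """
    if k == 0:
        # No lookahead: act greedily with respect to the Q-function.
        return max(actions, key=lambda a: q_fn(state, a))

    best_value, best_first = float("-inf"), None
    # Exhaustively enumerate every action sequence of length k.
    for plan in product(actions, repeat=k):
        s, ret = state, 0.0
        for t, a in enumerate(plan):
            ret += (gamma ** t) * reward(s, a)  # discounted model reward
            s = dynamics(s, a)                  # roll the model forward
        # Bootstrap the tail of the return from the Q-function.
        ret += (gamma ** k) * max(q_fn(s, a) for a in actions)
        if ret > best_value:
            best_value, best_first = ret, plan[0]
    return best_first


# Hypothetical toy chain MDP: integer states, reward for stepping onto state 3.
ACTIONS = (-1, 1)


def chain_dynamics(s, a):
    return s + a


def chain_reward(s, a):
    return 1.0 if s + a == 3 else 0.0


def weak_q(s, a):
    # A deliberately uninformative Q-function.
    return 0.0
```

In this toy chain, starting from state 1, the uninformative `weak_q` alone cannot tell the two actions apart, while two steps of lookahead through the model find the rewarding plan and select action `1`. The exhaustive search grows exponentially in `k`, so a practical planner for continuous control would need a different optimizer over action sequences.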
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms
Methods: Test
