Linear Function Approximation as a Computationally Efficient Method to Solve Classical Reinforcement Learning Challenges
Hari Srikanth

TL;DR
This paper demonstrates that Natural Policy Gradient methods with linear value function approximation can outperform neural network-based approaches like TRPO and PPO in low-dimensional environments, offering faster training with comparable or better results.
Contribution
The paper introduces a linear function approximation approach for Natural Policy Gradient algorithms, showing it can be more efficient than neural network methods in certain reinforcement learning tasks.
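As a rough illustration of what linear value function approximation can look like, the sketch below fits a weight vector to Monte Carlo returns by ridge-regularized least squares. It is not taken from the paper: the function names, the ridge term, and the choice of least squares as the fitting method are all assumptions made for this example.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Compute the discounted return G_t for every step of one episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    # Walk backwards so each return reuses the one that follows it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def fit_linear_value(features, returns, ridge=1e-3):
    """Fit V(s) ~= w . phi(s) by ridge-regularized least squares.

    `features` is a (T, d) array of state features phi(s_t); `ridge` is a
    hypothetical regularizer added here only for numerical stability.
    """
    phi = np.asarray(features, dtype=float)
    targets = np.asarray(returns, dtype=float)
    # Solve (Phi^T Phi + ridge * I) w = Phi^T G for the weights w.
    gram = phi.T @ phi + ridge * np.eye(phi.shape[1])
    return np.linalg.solve(gram, phi.T @ targets)


def predict_value(weights, features):
    """Evaluate the linear value estimate for a batch of feature vectors."""
    return np.asarray(features, dtype=float) @ weights
```

A fit of this kind is a single linear solve in the feature dimension, which is one plausible reason a linear critic could train faster than a neural network critic on small problems.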
Findings
Linear NPG outperforms TRPO and PPO in Cart Pole and Acrobot benchmarks.
Linear value approximation trains faster than the neural network-based baselines.
Linear NPG achieves comparable or superior performance to neural network methods.
Abstract
Neural Network based approximations of the Value function make up the core of leading Policy Based methods such as Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO). While this adds significant value when dealing with very complex environments, we note that in environments with sufficiently small state and action spaces, a computationally expensive Neural Network architecture offers only marginal improvement over simpler Value approximation methods. We present an implementation of Natural Actor Critic algorithms with actor updates through Natural Policy Gradient methods. This paper proposes that Natural Policy Gradient (NPG) methods, with Linear Function Approximation as the paradigm for value approximation, may surpass the performance and speed of Neural Network based models such as TRPO and PPO within these environments. Over Reinforcement Learning benchmarks Cart Pole…
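The actor update mentioned in the abstract can be sketched as a single natural policy gradient step, in which the ordinary policy gradient is preconditioned by the inverse of an estimated Fisher information matrix. The code below is a minimal sketch under stated assumptions, not the paper's implementation: the softmax-linear policy, the damping term, the step size, and all function names are hypothetical choices made for illustration.

```python
import numpy as np


def softmax_policy(theta, state_features):
    """Action probabilities of an assumed softmax policy with linear logits.

    `theta` has shape (n_actions, n_features).
    """
    logits = np.asarray(theta, dtype=float) @ np.asarray(state_features, dtype=float)
    logits = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


def score(theta, state_features, action):
    """Gradient of log pi(action | state) with respect to theta, flattened."""
    probs = softmax_policy(theta, state_features)
    one_hot = np.zeros(len(probs))
    one_hot[action] = 1.0
    return np.outer(one_hot - probs, state_features).ravel()


def natural_gradient_step(theta, states, actions, advantages,
                          step_size=0.05, damping=1e-3):
    """Return theta after one damped natural policy gradient step."""
    theta = np.asarray(theta, dtype=float)
    adv = np.asarray(advantages, dtype=float)
    scores = np.array([score(theta, s, a) for s, a in zip(states, actions)])
    n = len(adv)
    # Vanilla policy gradient estimate: mean of score * advantage.
    grad = scores.T @ adv / n
    # Empirical Fisher matrix: mean outer product of scores, plus damping
    # (an assumption here) so the linear solve stays well conditioned.
    fisher = scores.T @ scores / n + damping * np.eye(scores.shape[1])
    direction = np.linalg.solve(fisher, grad)
    return theta + step_size * direction.reshape(theta.shape)
```

In this sketch the advantages would come from a critic, for example returns minus a linear value estimate. TRPO can be viewed as adding a KL-constrained line search on top of a step of this kind.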
Taxonomy
Topics: Evolutionary Algorithms and Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Trust Region Policy Optimization · Entropy Regularization · Proximal Policy Optimization
