Deep Reinforcement Learning via L-BFGS Optimization
Jacob Rafati, Roummel F. Marcia

TL;DR
This paper introduces an L-BFGS-based optimization method for deep reinforcement learning, offering faster convergence and better generalization than traditional SGD methods, demonstrated on classic ATARI games.
Contribution
The paper proposes a novel L-BFGS quasi-Newton optimization approach for deep RL, bridging the gap between first- and second-order methods with formal convergence analysis.
Findings
Robust convergence observed in experiments.
Faster training times compared to SGD.
No need for an experience replay mechanism.
Abstract
Reinforcement Learning (RL) algorithms allow artificial agents to improve their action selections so as to increase rewarding experiences in their environments. Deep Reinforcement Learning algorithms require solving a nonconvex and nonlinear unconstrained optimization problem. Methods for solving the optimization problems in deep RL are restricted to the class of first-order algorithms, such as stochastic gradient descent (SGD). The major drawback of the SGD methods is that they have the undesirable effect of not escaping saddle points and their performance can be seriously obstructed by ill-conditioning. Furthermore, SGD methods require exhaustive trial and error to fine-tune many learning parameters. Using second derivative information can result in improved convergence properties, but computing the Hessian matrix for large-scale problems is not practical. Quasi-Newton methods require…
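To make the quasi-Newton idea in the abstract concrete, here is a minimal sketch of the standard L-BFGS two-loop recursion in Python with NumPy. It is not the authors' implementation: the names `two_loop_direction` and `lbfgs_minimize`, the backtracking Armijo line search, the memory size, and the ill-conditioned quadratic test problem are all assumptions chosen for illustration. The sketch shows how curvature is approximated from a short history of parameter and gradient differences, without ever forming the Hessian.

```python
import numpy as np
from collections import deque


def two_loop_direction(grad, s_hist, y_hist):
    """Return the search direction -H @ grad via the L-BFGS two-loop recursion.

    s_hist and y_hist hold recent parameter and gradient differences,
    ordered oldest to newest. With an empty history this is just -grad.
    """
    q = grad.copy()
    alphas = []
    # First loop: newest pair to oldest pair.
    for s, y in zip(reversed(s_hist), reversed(y_hist)):
        rho = 1.0 / y.dot(s)
        a = rho * s.dot(q)
        alphas.append((a, rho))
        q -= a * y
    # Scale by the usual initial Hessian estimate gamma = s'y / y'y.
    if s_hist:
        s, y = s_hist[-1], y_hist[-1]
        q *= s.dot(y) / y.dot(y)
    # Second loop: oldest pair to newest pair.
    for (a, rho), s, y in zip(reversed(alphas), s_hist, y_hist):
        b = rho * y.dot(q)
        q += (a - b) * s
    return -q


def lbfgs_minimize(f, grad_f, x0, memory=10, max_iter=200, tol=1e-6):
    """Illustrative L-BFGS loop with a backtracking (Armijo) line search."""
    x = np.asarray(x0, dtype=float).copy()
    s_hist, y_hist = deque(maxlen=memory), deque(maxlen=memory)
    g = grad_f(x)
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:
            return x, k
        d = two_loop_direction(g, s_hist, y_hist)
        t, fx, slope = 1.0, f(x), g.dot(d)
        # Halve the step until the sufficient-decrease condition holds.
        while f(x + t * d) > fx + 1e-4 * t * slope and t > 1e-12:
            t *= 0.5
        x_new = x + t * d
        g_new = grad_f(x_new)
        s, y = x_new - x, g_new - g
        # Keep only pairs with positive curvature so H stays positive definite.
        if y.dot(s) > 1e-12:
            s_hist.append(s)
            y_hist.append(y)
        x, g = x_new, g_new
    return x, max_iter


# Hypothetical test problem: an ill-conditioned quadratic
# f(x) = 0.5 * x' A x - b' x with A = diag(1, 10, 100, 1000).
A_diag = np.array([1.0, 10.0, 100.0, 1000.0])
b = np.ones(4)


def f(x):
    return 0.5 * np.sum(A_diag * x * x) - b.dot(x)


def grad_f(x):
    return A_diag * x - b


x_star, iters = lbfgs_minimize(f, grad_f, np.zeros(4))
print("solution:", x_star, "iterations:", iters)
```

The exact minimizer of this quadratic is `b / A_diag`, so the printed solution can be checked against it. A deep RL setting would replace `f` and `grad_f` with a loss and its gradient estimated from sampled experience, which this sketch does not attempt.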
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques
Methods: Stochastic Gradient Descent
