Deep Reinforcement Learning via L-BFGS Optimization

Jacob Rafati; Roummel F. Marcia

arXiv:1811.02693 · cs.LG · April 18, 2019 · 5 cites

Open Access

TL;DR

This paper introduces an L-BFGS-based optimization method for deep reinforcement learning, offering faster convergence and better generalization than traditional SGD methods, as demonstrated on classic ATARI games.

Contribution

The paper proposes a novel L-BFGS quasi-Newton optimization approach for deep RL, bridging the gap between first- and second-order methods, with a formal convergence analysis.

Findings

01 Robust convergence observed in experiments.

02 Faster training times compared to SGD.

03 No need for an experience replay mechanism.

Abstract

Reinforcement Learning (RL) algorithms allow artificial agents to improve their action selections so as to increase rewarding experiences in their environments. Deep Reinforcement Learning algorithms require solving a nonconvex and nonlinear unconstrained optimization problem. Methods for solving the optimization problems in deep RL are restricted to the class of first-order algorithms, such as stochastic gradient descent (SGD). The major drawbacks of SGD methods are that they can fail to escape saddle points and that their performance can be seriously degraded by ill-conditioning. Furthermore, SGD methods require exhaustive trial and error to fine-tune many learning parameters. Using second-derivative information can result in improved convergence properties, but computing the Hessian matrix for large-scale problems is not practical. Quasi-Newton methods require…
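To make the quasi-Newton idea in the abstract concrete, here is a minimal sketch of the textbook L-BFGS two-loop recursion, which builds a curvature-aware search direction from a short history of gradient differences without ever forming the Hessian. It is not the authors' implementation: the function names, the memory size, the backtracking (Armijo) line search, and the ill-conditioned toy quadratic are all assumptions chosen for illustration.

```python
import numpy as np


def lbfgs_direction(grad, s_list, y_list):
    """Two-loop recursion: approximate -H^{-1} @ grad from stored (s, y) pairs."""
    q = np.array(grad, dtype=float)
    history = []
    # First loop: walk the stored pairs from newest to oldest.
    for s, y in zip(reversed(s_list), reversed(y_list)):
        rho = 1.0 / np.dot(y, s)
        alpha = rho * np.dot(s, q)
        q -= alpha * y
        history.append((rho, alpha))
    # Scale by gamma = s^T y / y^T y (a common initial inverse-Hessian guess).
    gamma = 1.0
    if s_list:
        gamma = np.dot(s_list[-1], y_list[-1]) / np.dot(y_list[-1], y_list[-1])
    r = gamma * q
    # Second loop: walk the stored pairs from oldest to newest.
    for (s, y), (rho, alpha) in zip(zip(s_list, y_list), reversed(history)):
        beta = rho * np.dot(y, r)
        r += (alpha - beta) * s
    return -r


def minimize_lbfgs(f, grad_f, x0, memory=5, max_iter=200, tol=1e-6):
    """Illustrative L-BFGS loop with a simple backtracking line search."""
    x = np.array(x0, dtype=float)
    g = grad_f(x)
    s_list, y_list = [], []
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = lbfgs_direction(g, s_list, y_list)
        # Backtracking: halve the step until the Armijo condition holds.
        step, fx, slope = 1.0, f(x), np.dot(g, p)
        while f(x + step * p) > fx + 1e-4 * step * slope and step > 1e-12:
            step *= 0.5
        x_new = x + step * p
        g_new = grad_f(x_new)
        s, y = x_new - x, g_new - g
        # Store the pair only if the curvature condition y^T s > 0 holds.
        if np.dot(y, s) > 1e-10:
            s_list.append(s)
            y_list.append(y)
            if len(s_list) > memory:
                s_list.pop(0)
                y_list.pop(0)
        x, g = x_new, g_new
    return x


# Toy ill-conditioned quadratic: f(x) = 0.5 x^T A x - b^T x, minimizer A^{-1} b.
A = np.diag([1.0, 10.0, 100.0])
b = np.ones(3)


def f(x):
    return 0.5 * x @ A @ x - b @ x


def grad_f(x):
    return A @ x - b


x_star = minimize_lbfgs(f, grad_f, np.zeros(3))
print(x_star)
```

The memory cost is two vectors per stored pair, which is why the limited-memory variant is the one usually considered for networks with many parameters. A deep RL setting would replace the toy `f` and `grad_f` with a loss and gradient computed on batches of experience, which introduces noise that this sketch does not handle.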

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques

Methods: Stochastic Gradient Descent