Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening

Frank S. He; Yang Liu; Alexander G. Schwing; Jian Peng

arXiv:1611.01606 · cs.LG · November 8, 2016 · 47 cites

TL;DR

This paper introduces a new reinforcement learning training algorithm that combines deep Q-learning with constrained optimization, significantly reducing training time and improving performance across multiple Atari games.

Contribution

It presents a novel algorithm that tightens optimality constraints to accelerate deep reinforcement learning training and enhance reward propagation.

Findings

01 Reduces training time in Atari games
02 Improves accuracy over baseline methods
03 Demonstrates practical efficiency gains

Abstract

We propose a novel training algorithm for reinforcement learning which combines the strength of deep Q-learning with a constrained optimization approach to tighten optimality and encourage faster reward propagation. Our novel technique makes deep reinforcement learning more practical by drastically reducing the training time. We evaluate the performance of our approach on the 49 games of the challenging Arcade Learning Environment, and report significant improvements in both training time and accuracy.
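The abstract does not spell out the constraints, so the following is only a minimal sketch of one way bound constraints could be attached to a Q-learning loss; it is not the authors' implementation. It assumes a lower bound built from a discounted sum of observed rewards plus a bootstrapped value, an upper bound supplied by the caller, and a quadratic penalty for violating either. The function names, the `penalty_weight` parameter, and the numbers are hypothetical.

```python
def lower_bound(rewards, bootstrap_value, gamma):
    """Discounted sum of observed rewards plus a discounted bootstrap value.

    Assumption: `rewards` are the rewards seen on the steps after (s, a), and
    `bootstrap_value` stands in for max_a Q(s', a) at the state reached after
    those steps.
    """
    total = 0.0
    for i, r in enumerate(rewards):
        total += (gamma ** i) * r
    return total + (gamma ** len(rewards)) * bootstrap_value


def tightened_loss(q_value, td_target, lower, upper, penalty_weight):
    """Squared TD error plus quadratic penalties for leaving [lower, upper].

    Assumption: this penalty form is one common way to relax a constrained
    problem; the weight is a hypothetical hyperparameter.
    """
    td_error = (q_value - td_target) ** 2
    below = max(lower - q_value, 0.0) ** 2   # Q fell under the lower bound
    above = max(q_value - upper, 0.0) ** 2   # Q rose over the upper bound
    return td_error + penalty_weight * (below + above)


# Hypothetical numbers, chosen only to exercise the functions.
lb = lower_bound([1.0, 0.0, 2.0], bootstrap_value=4.0, gamma=0.5)
loss = tightened_loss(q_value=1.0, td_target=1.5, lower=lb, upper=10.0,
                      penalty_weight=4.0)
```

In this sketch a reward observed several steps ahead enters the loss for the current state directly through the lower bound, rather than only after many one-step updates, which is one reading of the "faster reward propagation" the abstract mentions.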

Code & Models

Repositories

suyoung-lee/Episodic-Backward-Update

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Advanced Bandit Algorithms Research

Methods: Q-Learning