Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening
Frank S. He, Yang Liu, Alexander G. Schwing, Jian Peng

TL;DR
This paper introduces a reinforcement learning training algorithm that combines deep Q-learning with constrained optimization, reducing training time and improving accuracy on the 49 Atari games of the Arcade Learning Environment.
Contribution
It presents a novel algorithm that tightens optimality constraints to accelerate deep reinforcement learning training and enhance reward propagation.
Findings
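Reduces training time across the 49 Atari games of the Arcade Learning Environment
Improves accuracy over baseline methods on the same benchmark
Makes deep reinforcement learning more practical by cutting training cost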
Abstract
We propose a novel training algorithm for reinforcement learning which combines the strength of deep Q-learning with a constrained optimization approach to tighten optimality and encourage faster reward propagation. Our novel technique makes deep reinforcement learning more practical by drastically reducing the training time. We evaluate the performance of our approach on the 49 games of the challenging Arcade Learning Environment, and report significant improvements in both training time and accuracy.
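The abstract does not spell out the constraints, so the following is only an illustrative sketch of how a Q-learning loss might be combined with bound constraints through a quadratic penalty. It assumes that a k-step discounted return plus a bootstrapped value gives a lower bound on Q(s_j, a_j), and that an upper bound is supplied from elsewhere. The names `lower_bound`, `penalized_loss`, `next_values`, and `lam` are hypothetical and are not taken from the paper.

```python
def lower_bound(rewards, next_values, gamma):
    """Tightest k-step lower bound on Q(s_j, a_j) (illustrative assumption).

    rewards[i]     : reward r_{j+i} observed i steps after taking a_j in s_j
    next_values[k] : current estimate of max_a Q(s_{j+k+1}, a)
    gamma          : discount factor
    """
    best = float("-inf")
    partial_return = 0.0
    for k, (r, v) in enumerate(zip(rewards, next_values)):
        # Accumulate the discounted reward sum up to step k.
        partial_return += (gamma ** k) * r
        # Bootstrap from the state reached after k + 1 steps.
        best = max(best, partial_return + (gamma ** (k + 1)) * v)
    return best


def penalized_loss(q, target, lower, upper, lam):
    """Squared Q-learning error plus quadratic penalties for bound violations."""
    violation_low = max(lower - q, 0.0)   # q fell below the lower bound
    violation_high = max(q - upper, 0.0)  # q rose above the upper bound
    return (q - target) ** 2 + lam * (violation_low ** 2 + violation_high ** 2)


# A reward that arrives three steps later raises the lower bound immediately,
# so the penalty pushes Q(s_j, a_j) up without waiting for one-step backups.
bound = lower_bound([0.0, 0.0, 1.0], [0.0, 0.0, 0.0], gamma=0.5)
loss = penalized_loss(q=0.0, target=0.0, lower=bound, upper=float("inf"), lam=4.0)
```

In this toy case the one-step target carries no signal (it equals the current estimate), yet the penalty term is nonzero, which is one way to read the abstract's claim of faster reward propagation.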
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Advanced Bandit Algorithms Research
Methods: Q-Learning
