Towards Characterizing Divergence in Deep Q-Learning

Joshua Achiam; Ethan Knight; Pieter Abbeel

arXiv:1903.08894 · cs.LG · March 22, 2019 · 61 cites

TL;DR

This paper analyzes divergence in Deep Q-Learning under the deadly triad (bootstrapping, off-policy learning, and function approximation) using a linear approximation to the Q-value updates, and introduces a stable algorithm that performs well on continuous control benchmarks without the usual stabilizing tricks.

Contribution

The paper gives a simple analysis of divergence in DQL, based on a linear approximation to the Q-value updates, and proposes a stable algorithm that matches or exceeds state-of-the-art results without the conventional stabilization techniques.

Findings

1. A linear approximation to the Q-value updates gives insight into the conditions under which divergence occurs.

2. The proposed algorithm achieves stable deep Q-learning without the conventional stabilizing tricks.

3. Performance matches or exceeds the state of the art on MuJoCo benchmarks.

Abstract

Deep Q-Learning (DQL), a family of temporal difference algorithms for control, employs three techniques collectively known as the 'deadly triad' in reinforcement learning: bootstrapping, off-policy learning, and function approximation. Prior work has demonstrated that together these can lead to divergence in Q-learning algorithms, but the conditions under which divergence occurs are not well understood. In this note, we give a simple analysis based on a linear approximation to the Q-value updates, which we believe provides insight into divergence under the deadly triad. The central point in our analysis is to consider when the leading-order approximation to the deep-Q update is or is not a contraction in the sup norm. Based on this analysis, we develop an algorithm which permits stable deep Q-learning for continuous control without any of the tricks conventionally used (such as target…
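The sup-norm contraction question raised in the abstract can be illustrated on a toy problem. The sketch below is not the paper's algorithm. It assumes the leading-order update has the form Q' = Q + α K D (TQ − Q), where K is a matrix of gradient inner products between state features, D is a diagonal matrix of sampling probabilities, and T is the Bellman operator. That form, the helper names (`update_matrix`, `sup_norm`), and the two-state chain with a single action (the classic example with features 1 and 2, sampled only at the first state) are all assumptions chosen for illustration. With a single action the update is linear, Q' = M Q with M = I + α K D (γP − I), so the update is a sup-norm contraction exactly when the largest absolute row sum of M is below 1.

```python
# Illustrative sketch only: a hypothetical two-state, single-action chain.
# Assumed leading-order update: Q' = Q + alpha * K @ D @ (gamma * P @ Q - Q),
# i.e. Q' = M @ Q with M = I + alpha * K @ D @ (gamma * P - I) (rewards are 0).

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def update_matrix(K, rho, P, gamma, alpha):
    """Build M = I + alpha * K * diag(rho) * (gamma * P - I)."""
    n = len(P)
    B = [[rho[i] * (gamma * P[i][j] - (1.0 if i == j else 0.0)) for j in range(n)]
         for i in range(n)]
    KB = matmul(K, B)
    return [[(1.0 if i == j else 0.0) + alpha * KB[i][j] for j in range(n)]
            for i in range(n)]

def sup_norm(M):
    """Induced sup norm of a matrix: the largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in M)

GAMMA, ALPHA = 0.9, 0.1
P = [[0.0, 1.0], [0.0, 1.0]]  # state 0 -> state 1, state 1 -> state 1

# Tabular case: K is the identity and both states are sampled.
K_tab = [[1.0, 0.0], [0.0, 1.0]]
M_tab = update_matrix(K_tab, [0.5, 0.5], P, GAMMA, ALPHA)

# Linear features phi = (1, 2), so K = phi phi^T; only state 0 is sampled.
phi = [1.0, 2.0]
K_lin = [[a * b for b in phi] for a in phi]
M_lin = update_matrix(K_lin, [1.0, 0.0], P, GAMMA, ALPHA)

norm_tab = sup_norm(M_tab)  # about 0.995: a contraction
norm_lin = sup_norm(M_lin)  # about 1.38: not a contraction

# Iterating the non-contractive update from Q = phi makes the values grow.
q = [1.0, 2.0]
for _ in range(100):
    q = [sum(m * x for m, x in zip(row, q)) for row in M_lin]

print(norm_tab, norm_lin, q)
```

In the tabular case every row of M sums to 1 − αρ(1 − γ), which is below 1, so repeated updates converge. With the shared feature, updating the first state also moves the second state's value twice as far, and the Q-values grow by a factor of 1 + α(2γ − 1) per step.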

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Face and Expression Recognition

Methods: Q-Learning