Towards Characterizing Divergence in Deep Q-Learning
Joshua Achiam, Ethan Knight, Pieter Abbeel

TL;DR
This paper analyzes divergence in Deep Q-Learning under the deadly triad (bootstrapping, off-policy learning, and function approximation) using a linear approximation to the Q-value updates, and introduces a stable algorithm that performs well on continuous control benchmarks without the commonly used stabilizing tricks.
Contribution
It gives a simple analysis of divergence in DQL based on a linear approximation to the Q-value updates, and proposes a stable algorithm that matches or outperforms state-of-the-art results without conventional stabilization techniques.
Findings
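A linear approximation to the Q-value updates gives insight into when divergence occurs: the key question is whether the leading-order update is a contraction in the sup norm.
The proposed algorithm achieves stable deep Q-learning for continuous control without the conventional stabilizing tricks.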
Performance matches or exceeds state-of-the-art on MuJoCo benchmarks.
Abstract
Deep Q-Learning (DQL), a family of temporal difference algorithms for control, employs three techniques collectively known as the 'deadly triad' in reinforcement learning: bootstrapping, off-policy learning, and function approximation. Prior work has demonstrated that together these can lead to divergence in Q-learning algorithms, but the conditions under which divergence occurs are not well understood. In this note, we give a simple analysis based on a linear approximation to the Q-value updates, which we believe provides insight into divergence under the deadly triad. The central point in our analysis is to consider when the leading-order approximation to the deep-Q update is or is not a contraction in the sup norm. Based on this analysis, we develop an algorithm which permits stable deep Q-learning for continuous control without any of the tricks conventionally used (such as target…
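The way the three ingredients named in the abstract can combine to produce divergence can be seen in a small, well-known two-state example. The sketch below is illustrative only and is not taken from the paper: the feature values, the single transition, and the names simulate_two_state and growth_factor are all assumptions chosen for the demonstration. It uses a linear value function with one weight, a bootstrapped target, and updates applied at only one state (an off-policy update distribution).

```python
# Illustrative sketch only: a classic two-state example, not the paper's algorithm.
# Assumptions (hypothetical setup): linear values Q(s) = w * phi(s) with
# phi(s1) = 1 and phi(s2) = 2, a deterministic transition s1 -> s2 with
# reward 0, and updates applied only at s1.

def simulate_two_state(gamma, alpha, steps, w0=1.0):
    """Return the weight trajectory under semi-gradient TD updates at s1 only."""
    phi_s1, phi_s2 = 1.0, 2.0
    w = w0
    trajectory = [w]
    for _ in range(steps):
        target = 0.0 + gamma * w * phi_s2   # bootstrapped target from s2
        td_error = target - w * phi_s1      # error measured at s1
        w = w + alpha * td_error * phi_s1   # semi-gradient step
        trajectory.append(w)
    return trajectory


def growth_factor(gamma, alpha):
    """Per-step multiplier on w implied by the update above."""
    return 1.0 + alpha * (2.0 * gamma - 1.0)


if __name__ == "__main__":
    for gamma in (0.4, 0.99):
        traj = simulate_two_state(gamma, alpha=0.1, steps=100)
        print(gamma, growth_factor(gamma, 0.1), abs(traj[-1]))
```

In this setup each update multiplies w, and therefore every representable value, by the same factor. The update shrinks distances between value functions only when the magnitude of that factor is below 1. For a small positive step size, this fails once gamma exceeds 0.5, and the weight then grows without bound. This is a toy instance of the contraction question the abstract raises, not a reproduction of the paper's analysis.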
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Face and Expression Recognition
Methods: Q-Learning
