Making Deep Q-learning methods robust to time discretization

Corentin Tallec; Léonard Blier; Yann Ollivier

arXiv:1901.09732 · cs.LG · January 30, 2019 · 33 cites

TL;DR

This paper addresses the sensitivity of Deep Q-learning methods to time discretization in near-continuous-time environments and proposes a new approach whose performance holds up across a wide range of time steps.

Contribution

It identifies sensitivity to time discretization as a critical weakness of DRL and introduces an off-policy algorithm that remains robust across different time discretizations.

Findings

01

Q-learning collapses with small time steps

02

Q-learning does not exist in continuous time

03

Proposed method achieves robustness across time discretizations
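To make findings 01 and 02 concrete, here is a toy calculation of our own (not taken from the paper). We assume a hypothetical 1-D system dx/dt = a with reward rate -x², a fixed stabilizing policy a = -x, and a discretization with step dt in which each step pays reward·dt and the per-step discount is gamma0**dt. In this setting V stays finite as dt shrinks, while the gap between the Q-values of two different actions shrinks roughly linearly in dt, so in the continuous-time limit Q collapses onto V and no longer ranks actions. All function names below are illustrative.

```python
import math

# Hypothetical toy system, for illustration only:
#   dx/dt = a, reward rate r(x) = -x**2, fixed policy pi(x) = -x.
# Discretized with step dt: x' = x + a*dt, per-step reward r(x)*dt,
# per-step discount gamma = gamma0**dt (gamma0 = discount per unit time).

def value(x, dt, gamma0):
    """Exact V(x) of pi: sum_k gamma**k * (-x_k**2) * dt, with x_k = x*(1-dt)**k (needs 0 < dt < 1)."""
    gamma = gamma0 ** dt
    return -dt * x ** 2 / (1.0 - gamma * (1.0 - dt) ** 2)

def q_value(x, a, dt, gamma0):
    """Take action a for one step, then follow pi."""
    gamma = gamma0 ** dt
    return -x ** 2 * dt + gamma * value(x + a * dt, dt, gamma0)

def action_gap(x, dt, gamma0):
    """Q(x, +1) - Q(x, -1): how strongly Q distinguishes the two actions."""
    return q_value(x, 1.0, dt, gamma0) - q_value(x, -1.0, dt, gamma0)

def limit_gap_over_dt(x, gamma0):
    """Analytic dt -> 0 limit of action_gap / dt for this toy system."""
    return -4.0 * x / (2.0 - math.log(gamma0))

if __name__ == "__main__":
    x, gamma0 = 1.0, 0.5
    for dt in (0.1, 0.01, 0.001):
        gap = action_gap(x, dt, gamma0)
        print(f"dt={dt:<6} V={value(x, dt, gamma0):+.4f} "
              f"gap={gap:+.6f} gap/dt={gap / dt:+.4f}")
    print(f"limit of gap/dt: {limit_gap_over_dt(x, gamma0):+.4f}")
```

The gap divided by dt converges to a finite value, which suggests that a quantity like (Q - V)/dt, rather than Q itself, is what retains action information as dt goes to 0.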

Abstract

Despite remarkable successes, Deep Reinforcement Learning (DRL) is not robust to hyperparameterization, implementation details, or small environment changes (Henderson et al., 2017; Zhang et al., 2018). Overcoming such sensitivity is key to making DRL applicable to real-world problems. In this paper, we identify sensitivity to time discretization in near-continuous-time environments as a critical factor; this covers, e.g., changing the number of frames per second, or the action frequency of the controller. Empirically, we find that Q-learning-based approaches such as Deep Q-learning (Mnih et al., 2015) and Deep Deterministic Policy Gradient (Lillicrap et al., 2015) collapse with small time steps. Formally, we prove that Q-learning does not exist in continuous time. We detail a principled way to build an off-policy RL algorithm that yields similar performance over a wide range of time…
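The abstract frames the problem as changing the frame rate or the controller's action frequency. One simple way a per-step hyperparameter silently becomes discretization-dependent is the discount factor. The sketch below is our own illustration, not the paper's algorithm; it assumes reward accrues at a constant rate per unit time, so each step pays reward_rate * dt. Keeping a fixed per-step gamma (here 0.99) makes both the discount horizon in physical time and the return scale with dt, whereas setting gamma = gamma0**dt (gamma0 being a hypothetical discount per unit time) keeps them nearly constant.

```python
import math

def rescaled_gamma(gamma_per_unit_time, dt):
    """Per-step discount that keeps the discount horizon fixed in physical time."""
    return gamma_per_unit_time ** dt

def horizon_in_time(gamma_per_step, dt):
    """Time constant tau such that gamma_per_step ** (t / dt) == exp(-t / tau)."""
    return -dt / math.log(gamma_per_step)

def return_constant_reward(reward_rate, gamma_per_step, dt):
    """Infinite-horizon discounted return when every step pays reward_rate * dt."""
    return reward_rate * dt / (1.0 - gamma_per_step)

if __name__ == "__main__":
    for dt in (0.1, 0.01, 0.001):
        naive = 0.99                      # same per-step gamma at every frame rate
        scaled = rescaled_gamma(0.5, dt)  # gamma of 0.5 per unit of time
        print(f"dt={dt:<6} "
              f"naive: horizon={horizon_in_time(naive, dt):.4f} "
              f"return={return_constant_reward(1.0, naive, dt):.4f} | "
              f"scaled: horizon={horizon_in_time(scaled, dt):.4f} "
              f"return={return_constant_reward(1.0, scaled, dt):.4f}")
```

With the naive choice, a 100x smaller time step shrinks the horizon and the return by 100x; with the rescaled discount both stay close to 1/ln 2 (about 1.44) in these units. Rescaling the discount alone does not fix the vanishing action gap described in the findings; it only removes one obvious source of dt-dependence.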

Code & Models

Repositories

ctallec/continuous-rl (PyTorch)

Taxonomy

Topics: Reinforcement Learning in Robotics · Neural Networks and Applications · Advanced Control Systems Optimization

Methods: Q-Learning