Making Deep Q-learning methods robust to time discretization
Corentin Tallec, Léonard Blier, Yann Ollivier

TL;DR
This paper addresses the sensitivity of Deep Q-learning methods to time discretization in near continuous-time environments, proposing a new approach that maintains performance across various time steps.
Contribution
The paper identifies time discretization as a critical source of sensitivity in Deep Reinforcement Learning (DRL) and introduces a robust off-policy algorithm that performs well across different time discretizations.
Findings
Q-learning collapses with small time steps
Q-learning does not exist in continuous time
Proposed method achieves robustness across time discretizations
Abstract
Despite remarkable successes, Deep Reinforcement Learning (DRL) is not robust to hyperparameterization, implementation details, or small environment changes (Henderson et al. 2017, Zhang et al. 2018). Overcoming such sensitivity is key to making DRL applicable to real world problems. In this paper, we identify sensitivity to time discretization in near continuous-time environments as a critical factor; this covers, e.g., changing the number of frames per second, or the action frequency of the controller. Empirically, we find that Q-learning-based approaches such as Deep Q-learning (Mnih et al., 2015) and Deep Deterministic Policy Gradient (Lillicrap et al., 2015) collapse with small time steps. Formally, we prove that Q-learning does not exist in continuous time. We detail a principled way to build an off-policy RL algorithm that yields similar performances over a wide range of time…
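To give some intuition for why Q-learning struggles with small time steps, here is a minimal sketch on a toy problem that is not taken from the paper. It assumes a 1D system ds/dt = a, a reward rate of -s^2, a fixed policy a = -s, and a discount of GAMMA per unit of physical time. All names and values (GAMMA, value, q_value, action_gap) are illustrative assumptions. Under these assumptions the value function has a closed form, so the gap between the Q-values of two actions can be computed exactly for any step size dt.

```python
import math

GAMMA = 0.9  # assumed discount per unit of physical time


def value(s, dt):
    """Exact discounted return of the toy policy a = -s with step size dt.

    Dynamics: s_{k+1} = s_k * (1 - dt); reward per step: -s_k**2 * dt;
    discount per step: GAMMA**dt. The sum is a geometric series.
    """
    return -s**2 * dt / (1.0 - GAMMA**dt * (1.0 - dt) ** 2)


def q_value(s, a, dt):
    """Apply action a for one step of length dt, then follow the policy."""
    return -s**2 * dt + GAMMA**dt * value(s + a * dt, dt)


def action_gap(dt, s=1.0):
    """Difference between the Q-values of actions +1 and -1 in state s."""
    return q_value(s, +1.0, dt) - q_value(s, -1.0, dt)


if __name__ == "__main__":
    for dt in (0.1, 0.01, 0.001):
        gap = action_gap(dt)
        print(f"dt={dt:<6} gap={gap:.6f} gap/dt={gap / dt:.4f}")
```

In this toy, the gap between the two actions shrinks in proportion to dt, so for small time steps the Q-function carries almost no information about which action is better, while the ratio gap/dt stays of order one. This is only an illustration under the stated assumptions, not the paper's proof or algorithm.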
Taxonomy
Topics: Reinforcement Learning in Robotics · Neural Networks and Applications · Advanced Control Systems Optimization
Methods: Q-Learning
