Dynamic Update-to-Data Ratio: Minimizing World Model Overfitting
Nicolai Dorka, Tim Welschehold, Wolfram Burgard

TL;DR
This paper introduces a dynamic method for adjusting the update-to-data ratio in reinforcement learning, improving model performance and robustness without extensive hyperparameter tuning.
Contribution
The paper proposes a novel dynamic UTD ratio adjustment technique based on over- and underfitting detection, applicable to continually evolving datasets in reinforcement learning.
Findings
Improved balance between under- and overestimation in RL models.
Performance competitive with an extensive hyperparameter search.
Enhanced robustness and reduced need for hyperparameter tuning.
Abstract
Early stopping based on the validation set performance is a popular approach to find the right balance between under- and overfitting in the context of supervised learning. However, in reinforcement learning, even for supervised sub-problems such as world model learning, early stopping is not applicable as the dataset is continually evolving. As a solution, we propose a new general method that dynamically adjusts the update-to-data (UTD) ratio during training based on under- and overfitting detection on a small subset of the continuously collected experience not used for training. We apply our method to DreamerV2, a state-of-the-art model-based reinforcement learning algorithm, and evaluate it on the DeepMind Control Suite and the Atari 100k benchmark. The results demonstrate that one can better balance under- and overestimation by adjusting the UTD ratio with our approach compared to…
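The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the multiplicative `factor`, the clamping bounds, and the simple "validation loss above training loss means overfitting" rule are all assumptions chosen for illustration. The sketch holds out a small share of incoming experience for validation and then nudges the UTD ratio down when the world model appears to overfit and up otherwise.

```python
import random


def split_experience(transitions, val_fraction=0.1, seed=0):
    """Route a small random share of collected transitions to a validation set.

    Hypothetical helper: val_fraction and the per-transition random split
    are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    train, val = [], []
    for t in transitions:
        (val if rng.random() < val_fraction else train).append(t)
    return train, val


def adjust_utd(utd, train_loss, val_loss, factor=1.3, utd_min=0.01, utd_max=1.0):
    """Return an updated update-to-data ratio (assumed rule, for illustration).

    If the held-out loss exceeds the training loss, treat the model as
    overfitting and perform fewer updates per collected sample; otherwise
    treat it as underfitting and perform more.
    """
    if val_loss > train_loss:
        utd = utd / factor  # overfitting: train less per unit of new data
    else:
        utd = utd * factor  # underfitting: train more per unit of new data
    # Keep the ratio inside a fixed range so it cannot drift without bound.
    return min(max(utd, utd_min), utd_max)
```

In a training loop, `adjust_utd` would be called periodically with the current world-model losses on the training and validation data, and the returned ratio would determine how many gradient updates to run per environment step until the next adjustment.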
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Data Stream Mining Techniques
Methods: Early Stopping
