Dynamic Update-to-Data Ratio: Minimizing World Model Overfitting

Nicolai Dorka; Tim Welschehold; Wolfram Burgard

arXiv:2303.10144 · cs.LG · March 20, 2023 · 1 citation

TL;DR

This paper introduces a dynamic method for adjusting the update-to-data ratio in reinforcement learning, improving model performance and robustness without extensive hyperparameter tuning.

Contribution

The paper proposes a novel technique that dynamically adjusts the update-to-data (UTD) ratio based on detecting under- and overfitting. This makes it applicable to the continually evolving datasets of reinforcement learning, where early stopping is not.
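The adjustment rule can be pictured as a small controller that reacts to a held-out loss. What follows is a minimal sketch under stated assumptions, not the paper's algorithm. It assumes the detector compares consecutive held-out losses, reading a falling loss as underfitting and anything else as overfitting. It also assumes the ratio moves by a hypothetical multiplicative factor of 1.3 within hypothetical bounds. `DynamicUTD` and its parameters are illustrative names and are not taken from the nicolinho/dutd repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DynamicUTD:
    """Hypothetical controller that adapts the UTD ratio from held-out loss trends.

    Assumption: the ratio is gradient updates per environment step and is changed
    by a multiplicative factor; the defaults below are illustrative, not from the paper.
    """

    utd: float = 1.0        # current updates per environment step (assumed start value)
    factor: float = 1.3     # hypothetical multiplicative step size
    min_utd: float = 0.01   # hypothetical lower bound
    max_utd: float = 10.0   # hypothetical upper bound
    prev_val_loss: Optional[float] = None

    def step(self, val_loss: float) -> float:
        """Record a new held-out loss, adjust the ratio, and return it."""
        if self.prev_val_loss is not None:
            if val_loss < self.prev_val_loss:
                # Held-out loss still falling: read as underfitting, so update more per sample.
                self.utd = min(self.utd * self.factor, self.max_utd)
            else:
                # Held-out loss flat or rising: read as overfitting, so update less per sample.
                self.utd = max(self.utd / self.factor, self.min_utd)
        self.prev_val_loss = val_loss
        return self.utd


# Usage example with made-up held-out losses.
ctrl = DynamicUTD()
ratios = [ctrl.step(loss) for loss in [2.0, 1.8, 1.9]]
print(ratios)
```

With the made-up losses 2.0, 1.8 and 1.9, the ratio stays at 1.0 on the first call because there is nothing to compare against. It then rises to 1.3 and falls back to 1.0. A multiplicative rule keeps adjustments proportional whether the current ratio is large or small.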

Findings

01. Improved balance between under- and overestimation in RL models.

02. Performance competitive with that of an extensive hyperparameter search.

03. Enhanced robustness and a reduced need for hyperparameter tuning.

Abstract

Early stopping based on the validation set performance is a popular approach to find the right balance between under- and overfitting in the context of supervised learning. However, in reinforcement learning, even for supervised sub-problems such as world model learning, early stopping is not applicable as the dataset is continually evolving. As a solution, we propose a new general method that dynamically adjusts the update-to-data (UTD) ratio during training based on under- and overfitting detection on a small subset of the continuously collected experience not used for training. We apply our method to DreamerV2, a state-of-the-art model-based reinforcement learning algorithm, and evaluate it on the DeepMind Control Suite and the Atari 100k benchmark. The results demonstrate that one can better balance under- and overestimation by adjusting the UTD ratio with our approach compared to…
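The abstract's central ingredient is a small subset of the continuously collected experience that is never trained on and serves only for under- and overfitting detection. The sketch below shows one way to wire this up. It assumes per-episode routing with a hypothetical 10% holdout rate. It also adds a hypothetical scheduler that converts a possibly fractional UTD ratio into whole gradient updates. `route_episodes` and `UpdateScheduler` are illustrative names, not code from DreamerV2 or the nicolinho/dutd repository.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def route_episodes(
    episodes: Sequence[T], holdout_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Split incoming episodes into a training set and a small held-out set.

    Assumption: routing happens per episode with a hypothetical 10% holdout
    rate; held-out episodes are only used to measure validation loss.
    """
    rng = random.Random(seed)
    train: List[T] = []
    held_out: List[T] = []
    for episode in episodes:
        # Each episode lands in exactly one of the two buffers.
        if rng.random() < holdout_fraction:
            held_out.append(episode)
        else:
            train.append(episode)
    return train, held_out


class UpdateScheduler:
    """Hypothetical helper turning a fractional UTD ratio into whole update counts."""

    def __init__(self) -> None:
        self.credit = 0.0  # accumulated, not-yet-performed gradient updates

    def updates_for(self, env_steps: int, utd: float) -> int:
        """Return how many updates to run after collecting `env_steps` steps."""
        self.credit += env_steps * utd
        n_updates = int(self.credit)
        self.credit -= n_updates  # carry the fractional remainder forward
        return n_updates


# Usage example: 20 dummy episodes, and a ratio of one update per four env steps.
train, held_out = route_episodes(list(range(20)))
sched = UpdateScheduler()
counts = [sched.updates_for(env_steps=1, utd=0.25) for _ in range(8)]
print(len(train), len(held_out), counts)
```

The UTD ratio can fall below one, so the scheduler carries the fractional remainder between calls instead of rounding each time. A ratio of 0.25 therefore yields one update on every fourth environment step.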

Code & Models

Repositories

nicolinho/dutd (TensorFlow, official implementation)

Videos

Dynamic Update-to-Data Ratio: Minimizing World Model Overfitting (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Data Stream Mining Techniques

Methods: Early Stopping