Regularized Q-learning through Robust Averaging

Peter Schmitt-Förster; Tobias Sutter

arXiv:2405.02201·math.OC·May 30, 2024

TL;DR

This paper introduces 2RA Q-learning, a Q-learning variant that uses a distributionally robust estimator to control estimation bias. It is shown to converge in the tabular case and often performs better than existing Q-learning methods in numerical experiments.

Contribution

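The paper presents a distributionally robust estimator for the maximum expected value term in Q-learning. The estimator gives explicit control over the estimation bias and admits a closed-form solution, so the per-iteration cost stays comparable to Watkins' Q-learning. The paper also proves convergence in the tabular case and reports favorable numerical results.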

Findings

1. 2RA Q-learning converges to the optimal policy in the tabular case.

2. The distributionally robust estimator allows the level of estimation bias to be controlled explicitly.

3. Numerical experiments indicate that 2RA Q-learning often performs better than existing methods.

Abstract

We propose a new Q-learning variant, called 2RA Q-learning, that addresses some weaknesses of existing Q-learning methods in a principled manner. One such weakness is an underlying estimation bias which cannot be controlled and often results in poor performance. We propose a distributionally robust estimator for the maximum expected value term, which allows us to precisely control the level of estimation bias introduced. The distributionally robust estimator admits a closed-form solution such that the proposed algorithm has a computational cost per iteration comparable to Watkins' Q-learning. For the tabular case, we show that 2RA Q-learning converges to the optimal policy and analyze its asymptotic mean-squared error. Lastly, we conduct numerical experiments for various settings, which corroborate our theoretical findings and indicate that 2RA Q-learning often performs better than…
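
The estimation bias mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the 2RA estimator, whose closed form is not given on this page. It only shows, assuming i.i.d. standard Gaussian rewards with equal true means, why the plain maximum of sample means (the kind of estimate Watkins' Q-learning relies on for the maximum expected value term) tends to overestimate. All function names and parameter values are hypothetical.

```python
import random
import statistics


def max_of_sample_means(num_actions, samples_per_action, rng):
    """Estimate max_a E[X_a] by the maximum of per-action sample means.

    Assumption for illustration: every action has true mean 0,
    so the true maximum expected value is exactly 0.
    """
    means = []
    for _ in range(num_actions):
        draws = [rng.gauss(0.0, 1.0) for _ in range(samples_per_action)]
        means.append(statistics.fmean(draws))
    return max(means)


def estimate_bias(num_actions=10, samples_per_action=5, trials=2000, seed=0):
    """Average the estimator over many trials.

    Because the true value is 0, the returned average is an
    empirical estimate of the estimator's bias.
    """
    rng = random.Random(seed)
    estimates = [
        max_of_sample_means(num_actions, samples_per_action, rng)
        for _ in range(trials)
    ]
    return statistics.fmean(estimates)


if __name__ == "__main__":
    print("bias with 1 action:  ", estimate_bias(num_actions=1))
    print("bias with 10 actions:", estimate_bias(num_actions=10))
```

With a single action the estimate is approximately unbiased. With several actions of equal true mean, the average estimate is positive even though the true maximum is 0, which is the overestimation that a bias-controlling estimator is meant to address.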

Code & Models

Repositories

2raq/code (Official)

Taxonomy

Topics: Face and Expression Recognition · Machine Learning and ELM

Methods: Q-Learning