Distributionally Robust Deep Q-Learning

Chung I Lu; Julian Sester; Aijia Zhang

arXiv:2505.19058 · cs.LG · May 27, 2025

TL;DR

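This paper introduces a distributionally robust deep Q-learning algorithm for continuous state spaces. It handles model uncertainty in the state transitions by optimizing against the worst-case transition in a ball around a reference probability measure, with the Bellman operator dualized and regularized via the Sinkhorn distance.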

Contribution

The paper develops a deep Q-learning method for the non-tabular, continuous-state setting that builds distributional robustness into the Bellman operator via the Sinkhorn distance, so that the learned policy is optimized for the worst-case state transition.
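
For orientation, the worst-case objective behind this contribution can be written in generic robust-MDP notation. The symbols below ($\hat{P}$ for the reference transition kernel, $\mathcal{B}_\varepsilon$ for the ball of radius $\varepsilon$ around it, $r$ for the reward, $\gamma$ for the discount factor, $A$ for the action set) are assumptions chosen for illustration and may not match the paper's own notation.

```latex
Q^*(x,a) \;=\; \inf_{P \in \mathcal{B}_\varepsilon(\hat{P}(x,a))}
  \mathbb{E}_{X' \sim P}\Big[\, r(x,a,X') + \gamma \sup_{a' \in A} Q^*(X',a') \,\Big]
```

The infimum over the ball is what makes this Bellman equation non-linear in the transition law. According to the abstract, this is the part that is dualised and regularised with the Sinkhorn distance.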

Findings

1. Effective in portfolio optimization with S&P 500 data
2. Modifies Deep Q-Network for worst-case transition optimization
3. Demonstrates tractability and robustness of the approach

Abstract

We propose a novel distributionally robust $Q$-learning algorithm for the non-tabular case accounting for continuous state spaces where the state transition of the underlying Markov decision process is subject to model uncertainty. The uncertainty is taken into account by considering the worst-case transition from a ball around a reference probability measure. To determine the optimal policy under the worst-case state transition, we solve the associated non-linear Bellman equation by dualising and regularising the Bellman operator with the Sinkhorn distance, which is then parameterised with deep neural networks. This approach allows us to modify the Deep Q-Network algorithm to optimise for the worst-case state transition. We illustrate the tractability and effectiveness of our approach through several applications, including a portfolio optimisation task based on S&P 500 data.
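
To make the idea of a dualised worst-case target concrete, here is a minimal sketch in plain Python. It assumes a log-sum-exp (soft-min) dual of the kind that entropic regularisation typically produces, and it picks the dual variable by grid search. This is not the paper's exact formula: the function names `soft_min` and `robust_target`, the parameters `eps`, `delta`, and `lambdas`, and the way samples are drawn around the reference transition are all hypothetical.

```python
import math


def soft_min(values, temperature):
    """Return -t * log(mean(exp(-v / t))), a smooth value between min and mean."""
    # Shift by the minimum before exponentiating (log-sum-exp trick) for stability.
    v_min = min(values)
    total = sum(math.exp(-(v - v_min) / temperature) for v in values)
    return v_min - temperature * math.log(total / len(values))


def robust_target(rewards, next_q_max, gamma, eps, delta, lambdas):
    """Worst-case TD target for one (state, action) pair via a dual grid search.

    rewards, next_q_max: samples of the reward and of max_a' Q(x', a'), drawn
        around the reference transition (hypothetical sampling scheme).
    eps: radius of the ambiguity ball (assumed name).
    delta: regularisation strength (assumed name).
    lambdas: candidate values for the dual variable, all strictly positive.
    """
    returns = [r + gamma * q for r, q in zip(rewards, next_q_max)]
    best = -math.inf
    for lam in lambdas:
        # Dual objective: penalty for the ball radius plus a soft-min of the returns.
        candidate = -lam * eps + soft_min(returns, lam * delta)
        best = max(best, candidate)
    return best


if __name__ == "__main__":
    target = robust_target(
        rewards=[1.0, 2.0, 3.0],
        next_q_max=[0.0, 0.0, 0.0],
        gamma=0.9,
        eps=0.1,
        delta=1.0,
        lambdas=[0.01, 0.1, 1.0, 10.0],
    )
    print(target)
```

In a DQN-style training loop, a value of this kind would stand in for the usual target `r + gamma * max Q(x', a')`. The result always lies between the smallest and the average sampled return, and it shrinks as `eps` grows, which is the pessimism one expects from a worst-case objective.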

Code & Models

Repositories

luchungi/sinkhorn_rdqn (PyTorch, official)
