Pitfall of Optimism: Distributional Reinforcement Learning by Randomizing Risk Criterion

Taehyun Cho; Seungyub Han; Heesoo Lee; Kyungjae Lee; Jungwoo Lee

arXiv:2310.16546 · cs.LG · December 6, 2023 · 1 citation

TL;DR

This paper introduces a novel distributional reinforcement learning method that randomizes risk criteria to prevent biased exploration, ensuring convergence and outperforming existing algorithms in diverse environments.

Contribution

The paper proposes a new distributional RL algorithm that randomly distorts the risk measure used for action selection, so that exploration is not biased toward one risk attitude, and proves its convergence and optimality under a weaker contraction property.
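To make "distorting the risk measure" concrete, the sketch below assumes a quantile representation of the return distribution (N equally spaced quantile estimates per action) and treats a risk measure as a weighted sum of those quantiles. Uniform weights give the ordinary mean; moving weight onto the lowest quantiles gives a CVaR-style, risk-averse criterion. The function names `distorted_expectation` and `cvar_weights` are hypothetical illustrations, not the paper's code or notation.

```python
import numpy as np


def distorted_expectation(quantiles, weights):
    # Risk measure as a weighted sum of N sorted quantile estimates.
    # Uniform weights (1/N each) give the mean; weight on low quantiles is
    # risk-averse, weight on high quantiles is risk-seeking.
    q = np.asarray(quantiles, dtype=float)
    w = np.asarray(weights, dtype=float)
    if q.shape != w.shape:
        raise ValueError("quantiles and weights must have the same shape")
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be a probability vector")
    return float(w @ q)


def cvar_weights(n, alpha):
    # CVaR_alpha on N equally spaced quantiles: uniform mass on the lowest
    # ceil(alpha * N) quantiles (approximate if alpha * N is not an integer).
    k = max(1, int(np.ceil(alpha * n)))
    w = np.zeros(n)
    w[:k] = 1.0 / k
    return w


theta = [-1.0, 0.0, 1.0, 2.0]  # toy quantile estimates for one action
mean_value = distorted_expectation(theta, np.full(4, 0.25))      # 0.5
cvar_value = distorted_expectation(theta, cvar_weights(4, 0.5))  # -0.5
```

Under this view, a perturbation of the weight vector is one simple way to change the risk criterion while keeping the same learned distribution; how the paper parameterizes its perturbation is not specified here.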

Findings

01 Outperforms existing distributional RL algorithms on 55 Atari games

02 Proves convergence and optimality under a weaker contraction property

03 Avoids the biased exploration caused by variance-based optimism

Abstract

Distributional reinforcement learning algorithms have attempted to use estimated uncertainty for exploration, for example through optimism in the face of uncertainty. However, using the estimated variance for optimistic exploration may bias data collection and hinder convergence or performance. In this paper, we present a novel distributional reinforcement learning algorithm that selects actions by randomizing the risk criterion, avoiding a one-sided tendency in risk attitude. We provide a perturbed distributional Bellman optimality operator by distorting the risk measure, and we prove the convergence and optimality of the proposed method under a weaker contraction property. Our theoretical results show that the proposed method does not fall into biased exploration and is guaranteed to converge to an optimal return. Finally, we empirically show that our method outperforms other existing…
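The contrast between variance-based optimism and a randomized risk criterion can be illustrated on a toy two-action problem with fixed quantile estimates. This is a minimal sketch under stated assumptions: the optimistic rule is taken to be "mean plus c times standard deviation", and the randomized rule draws one weight vector over quantile levels per step from a symmetric Dirichlet distribution (shared by all actions). The Dirichlet choice and the names `optimistic_action` and `randomized_risk_action` are hypothetical; they are not the paper's perturbation scheme.

```python
import numpy as np


def optimistic_action(quantiles, c=1.0):
    # Variance-based optimism: greedy with respect to mean + c * std.
    # quantiles has shape (num_actions, num_quantiles).
    q = np.asarray(quantiles, dtype=float)
    return int(np.argmax(q.mean(axis=1) + c * q.std(axis=1)))


def randomized_risk_action(quantiles, rng, concentration=1.0):
    # Hypothetical randomized risk criterion: draw ONE weight vector over
    # quantile levels per step (the same distortion for every action) and act
    # greedily on the distorted expectation. The symmetric Dirichlet has
    # uniform mean weights, so on average no risk attitude is favoured.
    q = np.asarray(quantiles, dtype=float)
    w = rng.dirichlet(np.full(q.shape[1], concentration))
    return int(np.argmax(q @ w))


# Toy estimates: action 0 has the higher mean (1.0) and a small spread;
# action 1 has a lower mean (0.75) but a much larger spread.
Q = np.array([[0.9, 1.0, 1.0, 1.1],
              [-2.0, 0.0, 1.0, 4.0]])

rng = np.random.default_rng(0)
greedy_mean = int(np.argmax(Q.mean(axis=1)))  # 0
optimistic = optimistic_action(Q)              # 1 on every step
randomized = [randomized_risk_action(Q, rng) for _ in range(1000)]
counts = np.bincount(randomized, minlength=2)
```

With these fixed estimates the optimistic rule selects action 1 on every step, even though action 0 has the higher mean, which is the one-sided tendency the abstract warns about. The randomized rule selects both actions over many steps, since its risk attitude changes from step to step rather than always leaning toward high variance.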

Videos

Pitfall of Optimism: Distributional Reinforcement Learning by Randomizing Risk Criterion (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adaptive Dynamic Programming Control