Non-stationary Risk-sensitive Reinforcement Learning: Near-optimal Dynamic Regret, Adaptive Detection, and Separation Design

Yuhao Ding; Ming Jin; Javad Lavaei

arXiv:2211.10815 · cs.LG · November 22, 2022

TL;DR

This paper develops algorithms for risk-sensitive reinforcement learning in non-stationary environments, providing near-optimal regret bounds and adaptive detection methods without prior knowledge of environment variation.

Contribution

It introduces two restart-based algorithms (Restart-RSMB and Restart-RSQ) with dynamic regret guarantees, and a meta-algorithm that adaptively detects non-stationarity in risk-sensitive RL.
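The source describes Restart-RSMB and Restart-RSQ only as "restart-based". The sketch below illustrates the generic restart idea under stated assumptions: a base learner is periodically rebuilt from scratch so that statistics gathered under outdated rewards and transitions are discarded. The `Learner` interface, `run_with_restarts`, `make_learner`, and the fixed `window` are all hypothetical; the paper derives its restart schedule from the known variation budget, and that schedule is not reproduced here.

```python
from typing import Callable, List, Protocol


class Learner(Protocol):
    """Hypothetical interface: any episodic learner that plays one episode."""

    def run_episode(self, k: int) -> float: ...


def run_with_restarts(
    make_learner: Callable[[], Learner], num_episodes: int, window: int
) -> List[float]:
    """Play `num_episodes` episodes, rebuilding the learner every `window` episodes.

    Restarting throws away statistics collected under older, possibly
    outdated rewards and transitions. `window` is a free parameter here;
    it is NOT the paper's tuned schedule.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    returns: List[float] = []
    learner = make_learner()
    for k in range(num_episodes):
        if k > 0 and k % window == 0:
            learner = make_learner()  # restart: forget all past data
        returns.append(learner.run_episode(k))
    return returns


class EpisodeCounter:
    """Toy learner whose 'return' is the number of episodes since its last reset."""

    def __init__(self) -> None:
        self.n = 0

    def run_episode(self, k: int) -> float:
        self.n += 1
        return float(self.n)


print(run_with_restarts(EpisodeCounter, num_episodes=7, window=3))
# [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0]
```

The toy counter makes the restart points visible: its output drops back to 1.0 at episodes 3 and 6. A shorter window adapts faster to drift but wastes more data, which is the trade-off a variation-budget-based schedule is meant to balance.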

Findings

01. The proposed algorithms achieve near-optimal dynamic regret bounds.

02. A dynamic regret lower bound confirms the near-optimality of the algorithms.

03. An adaptive algorithm detects non-stationarity without prior knowledge of the variation budget.
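For reference, dynamic regret in episodic non-stationary RL is commonly measured against the optimal policy of each episode rather than a single fixed comparator. A sketch in assumed notation (K episodes of horizon H, initial state s_1^k, risk parameter β, per-episode reward r_h^k), not copied from the paper, with values taken under the entropic risk objective:

$$
\text{D-Regret}(K) = \sum_{k=1}^{K} \Big( V_1^{\star,k}(s_1^k) - V_1^{\pi^k,k}(s_1^k) \Big),
\qquad
V_1^{\pi,k}(s) = \frac{1}{\beta} \log \mathbb{E}^{\pi}_{k}\Big[ \exp\Big( \beta \sum_{h=1}^{H} r_h^k(s_h, a_h) \Big) \,\Big|\, s_1 = s \Big].
$$

Because the comparator V_1^{\star,k} changes with k, an algorithm must track the drifting MDP to keep this sum sublinear, which is why the variation budget enters the bounds.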

Abstract

We study risk-sensitive reinforcement learning (RL) based on an entropic risk measure in episodic non-stationary Markov decision processes (MDPs). Both the reward functions and the state transition kernels are unknown and allowed to vary arbitrarily over time with a budget on their cumulative variations. When this variation budget is known a priori, we propose two restart-based algorithms, namely Restart-RSMB and Restart-RSQ, and establish their dynamic regrets. Based on these results, we further present a meta-algorithm that does not require any prior knowledge of the variation budget and can adaptively detect the non-stationarity on the exponential value functions. A dynamic regret lower bound is then established for non-stationary risk-sensitive RL to certify the near-optimality of the proposed algorithms. Our results also show that the risk control and the handling of the…
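The entropic risk measure maps a random return X to (1/β) log E[exp(βX)]. The sketch below computes it for a finite return distribution; the function name `entropic_risk` and the list-of-outcomes interface are illustrative assumptions, not code from the paper. For rewards, Jensen's inequality gives a value above the mean when β > 0 (risk-seeking) and below it when β < 0 (risk-averse), while β → 0 recovers the plain expectation.

```python
import math
from typing import Optional, Sequence


def entropic_risk(
    returns: Sequence[float], beta: float, probs: Optional[Sequence[float]] = None
) -> float:
    """Entropic risk (1/beta) * log E[exp(beta * X)] of a finite return distribution.

    `returns` are the possible values of X and `probs` their probabilities
    (uniform if omitted). beta = 0 is treated as the limit, i.e. the mean.
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    if probs is None:
        probs = [1.0 / len(returns)] * len(returns)
    if beta == 0.0:
        return sum(p * x for p, x in zip(probs, returns))
    # Log-sum-exp shift keeps exp() from overflowing for large |beta * x|.
    m = max(beta * x for x in returns)
    s = sum(p * math.exp(beta * x - m) for p, x in zip(probs, returns))
    return (m + math.log(s)) / beta


outcomes = [0.0, 1.0]
print(entropic_risk(outcomes, beta=0.0))   # mean, 0.5
print(entropic_risk(outcomes, beta=1.0))   # above 0.5: risk-seeking
print(entropic_risk(outcomes, beta=-1.0))  # below 0.5: risk-averse
```

Working with the exponentiated quantity exp(βX) is what makes "exponential value functions" a natural object to monitor: it turns the risk-sensitive objective into an expectation that can be estimated and compared across episodes.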

Videos

Non-stationary Risk-sensitive Reinforcement Learning: Near-optimal Dynamic Regret, Adaptive Detection, and Separation Design (Underline)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Neural and Behavioral Psychology Studies