Non-stationary Risk-sensitive Reinforcement Learning: Near-optimal Dynamic Regret, Adaptive Detection, and Separation Design
Yuhao Ding, Ming Jin, Javad Lavaei

TL;DR
This paper develops algorithms for risk-sensitive reinforcement learning in non-stationary environments, providing near-optimal regret bounds and adaptive detection methods without prior knowledge of environment variation.
Contribution
It introduces restart-based algorithms with regret guarantees and a meta-algorithm for adaptive non-stationarity detection in risk-sensitive RL.
Findings
The proposed restart-based algorithms achieve near-optimal dynamic regret bounds when the variation budget is known.
A dynamic regret lower bound confirms the near-optimality of the algorithms.
An adaptive meta-algorithm detects non-stationarity without prior knowledge of the variation budget.
Abstract
We study risk-sensitive reinforcement learning (RL) based on an entropic risk measure in episodic non-stationary Markov decision processes (MDPs). Both the reward functions and the state transition kernels are unknown and allowed to vary arbitrarily over time with a budget on their cumulative variations. When this variation budget is known a priori, we propose two restart-based algorithms, namely Restart-RSMB and Restart-RSQ, and establish their dynamic regrets. Based on these results, we further present a meta-algorithm that does not require any prior knowledge of the variation budget and can adaptively detect the non-stationarity on the exponential value functions. A dynamic regret lower bound is then established for non-stationary risk-sensitive RL to certify the near-optimality of the proposed algorithms. Our results also show that the risk control and the handling of the…
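For readers unfamiliar with the entropic risk measure mentioned in the abstract, here is a minimal sketch of its standard empirical form, (1/beta) * log E[exp(beta * X)], applied to a list of sampled returns. The function name `entropic_risk`, the sample values, and the sign convention (beta < 0 risk-averse, beta > 0 risk-seeking for rewards) are illustrative assumptions, not taken from the paper, and this is not an implementation of Restart-RSMB or Restart-RSQ.

```python
import math


def entropic_risk(samples, beta):
    """Empirical entropic risk: (1/beta) * log(mean(exp(beta * x))).

    `beta` is a nonzero risk parameter (hypothetical name); the limit
    beta -> 0 recovers the ordinary mean and is not handled here.
    """
    if beta == 0:
        raise ValueError("beta must be nonzero")
    scaled = [beta * x for x in samples]
    # Shift by the maximum before exponentiating (log-sum-exp trick)
    # so that large |beta * x| values do not overflow.
    m = max(scaled)
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return log_mean_exp / beta


# Illustrative returns from three hypothetical episodes.
returns = [0.0, 1.0, 2.0]
mean_return = sum(returns) / len(returns)
risk_averse = entropic_risk(returns, beta=-1.0)   # weights low returns more
risk_seeking = entropic_risk(returns, beta=1.0)   # weights high returns more
```

By Jensen's inequality the risk-averse value lies below the mean return and the risk-seeking value lies above it, which is the sense in which the parameter controls risk sensitivity.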
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Neural and Behavioral Psychology Studies
