Transition-based versus State-based Reward Functions for MDPs with Value-at-Risk
Shuai Ma, Jia Yuan Yu

TL;DR
This paper investigates how different reward function representations affect the estimation of Value-at-Risk in Markov decision processes, revealing that simplified reward functions can alter VaR outcomes and proposing methods for accurate estimation.
Contribution
It demonstrates the impact of transition-based versus state-based reward functions on VaR in MDPs and introduces a transformation algorithm for accurate VaR estimation in complex reward settings.
Findings
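Replacing a transition-based reward function with a state-based one that has the same expectation can change the VaR.
Spectral theory and the central limit theorem (CLT) can be used to estimate the VaR function in long-horizon MDPs.
A transformation algorithm preserves the total reward distribution, so the VaR can still be estimated when the reward is transition-based.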
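The first and third findings can be illustrated on a toy problem. The sketch below is not taken from the paper. It assumes a hypothetical two-state Markov chain (an MDP under a fixed policy) with made-up numbers, takes VaR to be the lower alpha-quantile of the undiscounted total reward, and uses a standard state-augmentation construction on pairs (s, s'), which may differ from the authors' transformation algorithm. The names `total_reward_dist` and `var` are illustrative only.

```python
from collections import defaultdict

# Hypothetical two-state Markov chain under a fixed policy (illustrative numbers).
P = {0: {0: 0.5, 1: 0.5},
     1: {0: 0.5, 1: 0.5}}
R = {(0, 0): 0.0, (0, 1): 2.0,   # transition-based reward r(s, s')
     (1, 0): 0.0, (1, 1): 2.0}


def total_reward_dist(init, trans, reward, horizon):
    """Exact distribution of the undiscounted total reward over `horizon` steps.

    init: {state: prob}; trans: {state: {next_state: prob}};
    reward(s, s2): reward collected on the step s -> s2.
    """
    dist = defaultdict(float)           # (state, accumulated reward) -> prob
    for s, p in init.items():
        dist[(s, 0.0)] += p
    for _ in range(horizon):
        nxt = defaultdict(float)
        for (s, g), prob in dist.items():
            for s2, p in trans[s].items():
                nxt[(s2, g + reward(s, s2))] += prob * p
        dist = nxt
    out = defaultdict(float)            # marginalize out the final state
    for (_, g), prob in dist.items():
        out[g] += prob
    return dict(out)


def var(dist, alpha):
    """Assumed convention: smallest x with P(total reward <= x) >= alpha."""
    cum = 0.0
    for x in sorted(dist):
        cum += dist[x]
        if cum >= alpha:
            return x
    return max(dist)


s0, T, alpha = 0, 2, 0.1

# 1) True transition-based reward.
d_trans = total_reward_dist({s0: 1.0}, P, lambda s, s2: R[(s, s2)], T)

# 2) Simplified state-based reward r(s) = E[r(s, s') | s]: same mean, different spread.
r_state = {s: sum(p * R[(s, s2)] for s2, p in P[s].items()) for s in P}
d_state = total_reward_dist({s0: 1.0}, P, lambda s, s2: r_state[s], T)

# 3) Augmented chain on pairs y = (s, s'); the reward depends on y alone.
aug_init = {(s0, s2): p for s2, p in P[s0].items()}
aug_trans = {(s, s2): {(s2, s3): p for s3, p in P[s2].items()}
             for s in P for s2 in P[s]}
d_aug = total_reward_dist(aug_init, aug_trans, lambda y, y2: R[y], T)

print(var(d_trans, alpha), var(d_state, alpha), var(d_aug, alpha))
# prints: 0.0 2.0 0.0
```

All three chains have an expected total reward of 2, yet the simplified state-based reward collapses the distribution to a point mass and moves the 0.1-quantile from 0 to 2. The augmented chain is state-based in form but reproduces the original distribution exactly, so its VaR is unchanged.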
Abstract
In reinforcement learning, a reward function defined on the current state and action is widely used. When the objective concerns only the expectation of the (discounted) total reward, it works perfectly. However, if the objective involves the total reward distribution, the result will be wrong. This paper studies Value-at-Risk (VaR) problems in short- and long-horizon Markov decision processes (MDPs) with two reward functions that share the same expectations. First, we show that with a VaR objective, when the real reward function is transition-based (with respect to action and both current and next states), the simplified (state-based, with respect to action and current state only) reward function will change the VaR. Second, for long-horizon MDPs, we estimate the VaR function with the aid of spectral theory and the central limit theorem. Third, since the estimation method is for a Markov…
Taxonomy
Topics: Reinforcement Learning in Robotics · Fault Detection and Control Systems · Advanced Control Systems Optimization
