Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards
Falcon Z. Dai, Matthew R. Walter

TL;DR
This paper introduces the maximum expected hitting cost (MEHC), a new complexity measure for MDPs that refines diameter-based bounds and analyzes how reward shaping affects this measure and the informativeness of rewards.
Contribution
It defines MEHC as a new complexity measure, shows its role in tightening regret bounds, and analyzes the impact of reward shaping on MEHC and reward informativeness.
Findings
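MEHC replaces diameter in the upper bound on the optimal value span of an extended MDP, tightening it by accounting for the reward structure.
In a large class of MDPs with finite MEHC and unsaturated optimal average rewards, potential-based reward shaping can reduce or increase MEHC by at most a factor of two.
Substituting MEHC for diameter refines the regret upper bounds of several UCRL2-like algorithms.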
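For orientation, the two quantities being compared can be written side by side. The notation here is our own reading and is not given on this page: h_{s \to s'}(\pi) denotes the (random) hitting time of s' from s under policy \pi, r_t the reward at step t, and rewards are assumed to lie in [0, r_max].

```latex
% Diameter [JOA10]: worst-case, over state pairs, of the best expected hitting time.
D(M) := \max_{s \neq s'} \min_{\pi} \mathbb{E}\left[ h_{s \to s'}(\pi) \right]

% MEHC (assumed form): same structure, but each step costs r_max - r_t instead of 1.
\kappa(M) := \max_{s, s'} \min_{\pi} \mathbb{E}\left[ \sum_{t=0}^{h_{s \to s'}(\pi) - 1} \left( r_{\max} - r_t \right) \right]

% Since 0 \le r_max - r_t \le r_max at every step, the cost never exceeds r_max per step:
\kappa(M) \le r_{\max} \, D(M)
```

Under this reading, MEHC is small whenever high-reward paths connect states, even if those paths are long.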
Abstract
We propose a new complexity measure for Markov decision processes (MDPs), the maximum expected hitting cost (MEHC). This measure tightens the closely related notion of diameter [JOA10] by accounting for the reward structure. We show that this parameter replaces diameter in the upper bound on the optimal value span of an extended MDP, thus refining the associated upper bounds on the regret of several UCRL2-like algorithms. Furthermore, we show that potential-based reward shaping [NHR99] can induce equivalent reward functions with varying informativeness, as measured by MEHC. We further establish that shaping can reduce or increase MEHC by at most a factor of two in a large class of MDPs with finite MEHC and unsaturated optimal average rewards.
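As a rough illustration of how MEHC can be smaller than the diameter-based quantity, the sketch below computes both on a toy two-state MDP. It assumes MEHC is the worst case, over state pairs, of the minimum expected cost to reach the target when each step costs r_max minus the expected reward; that definition, the toy MDP, and all function names are our assumptions, not code from the paper. The minimum over policies is approximated with plain value iteration on the hitting-cost problem.

```python
# Hypothetical sketch: diameter and MEHC of a tiny MDP via value iteration.
# P[s][a] is a list of next-state probabilities; R[s][a] is the expected
# reward in [0, r_max]. States may have different numbers of actions.

def min_expected_hitting_cost(P, cost, target, tol=1e-12, max_iters=100_000):
    """Minimum expected total cost to first reach `target` from every state."""
    n = len(P)
    V = [0.0] * n  # starting from zero, iterates increase toward the optimum
    for _ in range(max_iters):
        new_V = [
            0.0 if s == target else min(
                cost[s][a] + sum(p * V[s2] for s2, p in enumerate(P[s][a]))
                for a in range(len(P[s]))
            )
            for s in range(n)
        ]
        if max(abs(x - y) for x, y in zip(new_V, V)) < tol:
            return new_V
        V = new_V
    return V


def worst_case_hitting_cost(P, cost):
    """Maximum over (start, target) pairs of the minimum expected hitting cost."""
    return max(
        max(min_expected_hitting_cost(P, cost, target))
        for target in range(len(P))
    )


def diameter(P):
    # Unit cost per step turns hitting cost into expected hitting time.
    unit = [[1.0 for _ in actions] for actions in P]
    return worst_case_hitting_cost(P, unit)


def mehc(P, R, r_max):
    # Assumed per-step cost: the reward "missed" relative to r_max.
    cost = [[r_max - r for r in rewards] for rewards in R]
    return worst_case_hitting_cost(P, cost)


# Toy MDP (made up for illustration).
P = [
    [[0.5, 0.5], [0.1, 0.9]],  # state 0: slow but rewarding, or fast but poor
    [[0.5, 0.5]],              # state 1: single action
]
R = [
    [0.5, 0.2],
    [1.0],
]
r_max = 1.0

D = diameter(P)
kappa = mehc(P, R, r_max)
print(f"diameter = {D:.4f}, MEHC = {kappa:.4f}, r_max * D = {r_max * D:.4f}")
```

In this toy case the slowest transition (state 1 to state 0) earns the maximum reward at every step, so it contributes nothing to MEHC even though it determines the diameter.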
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Formal Methods in Verification
