On Reward Structures of Markov Decision Processes

Falcon Z. Dai

arXiv:2308.14919 · cs.LG · September 4, 2023

TL;DR

This paper studies the structure of Markov decision processes through their reward functions, introduces a new policy-evaluation estimator and theoretical insights for reinforcement learning, and proposes methods for safe and multi-objective policy optimization.

Contribution

It presents a novel estimator with an instance-specific error bound for a single state value, refines the transition-based diameter into a reward-based constant (the maximum expected hitting cost), and develops algorithms for safe and Pareto-optimal policy planning.

Findings

01

New estimator with $\tilde{O}(\frac{\tau_s}{n})$ error bound

02

Theoretical link between reward shaping and learning speed

03

Modified algorithms for safe and multi-objective reinforcement learning

Abstract

A Markov decision process can be parameterized by a transition kernel and a reward function. Both play essential roles in the study of reinforcement learning as evidenced by their presence in the Bellman equations. In our inquiry of various kinds of "costs" associated with reinforcement learning inspired by the demands in robotic applications, rewards are central to understanding the structure of a Markov decision process and reward-centric notions can elucidate important concepts in reinforcement learning. Specifically, we study the sample complexity of policy evaluation and develop a novel estimator with an instance-specific error bound of $\tilde{O}(\frac{\tau_s}{n})$ for estimating a single state value. Under the online regret minimization setting, we refine the transition-based MDP constant, diameter, into a reward-based constant, maximum expected hitting cost, and with…
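To make concrete how the transition kernel and the reward function both enter the Bellman equations, here is a minimal sketch of policy evaluation on a toy two-state chain. It is not the paper's estimator: the discounted setting, the numbers in `P` and `r`, and the helper name `evaluate_policy` are all assumptions chosen for illustration, and the sketch presumes the kernel is known rather than sampled.

```python
def evaluate_policy(P, r, gamma, tol=1e-12, max_iters=10_000):
    """Iterate the Bellman equation v = r + gamma * P v for a fixed policy.

    P[s][t] is the probability of moving from state s to state t under the
    policy, and r[s] is the expected reward collected in state s.
    (Hypothetical helper, assuming a known kernel and 0 <= gamma < 1.)
    """
    n = len(r)
    v = [0.0] * n
    for _ in range(max_iters):
        # One Bellman backup: reward now plus discounted expected next value.
        new_v = [
            r[s] + gamma * sum(P[s][t] * v[t] for t in range(n))
            for s in range(n)
        ]
        # Stop once successive iterates agree to within the tolerance.
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# Toy two-state chain (made-up numbers): state 1 pays reward 1, state 0 pays 0.
P = [[0.9, 0.1], [0.5, 0.5]]
r = [0.0, 1.0]
values = evaluate_policy(P, r, gamma=0.5)
print(values)  # approximately [0.125, 1.375]
```

Solving the two linear equations by hand gives the same values, 0.125 and 1.375. The setting studied in the abstract differs in that a single state value must be estimated from n samples rather than computed from a known kernel.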

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research