# Reward Advancement: Transforming Policy under Maximum Causal Entropy Principle

**Authors:** Guojun Wu, Yanhua Li, Zhenming Liu, Jie Bao, Yu Zheng, Jieping Ye, Jun Luo

arXiv: 1907.05390 · 2019-07-12

## TL;DR

This paper studies how to add to the reward function of a Markov Decision Process so that an agent's policy changes from its original policy to a predefined target policy under the maximum causal entropy principle, and proposes an algorithm to find the minimum-cost additional reward.

## Contribution

It introduces the reward advancement problem, characterizes the space of reward functions capable of policy transformation, and proposes an algorithm to find minimal-cost reward modifications.

## Key findings

- Given an MDP and a target policy, infinitely many additional reward functions achieve the same policy transformation.
- An algorithm is proposed to find minimal-cost reward adjustments.
- The method enables controlled policy shaping under bounded rationality.
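
The first finding can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes an infinite-horizon discounted MDP in which the MCE policy is the softmax of the soft Q-values, and it uses the standard observation that the reward `log π*(a|s) + φ(s) − γ·E[φ(s')]` induces the policy `π*` for any state function `φ`. The names `mce_policy`, `additional_reward`, and `phi` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def mce_policy(P, r, gamma=0.9, iters=500):
    """Soft value iteration. P: (S, A, S) transitions, r: (S, A) rewards.
    Returns the softmax (MCE) policy with shape (S, A)."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        Q = r + gamma * (P @ V)                      # soft Q-values, (S, A)
        m = Q.max(axis=1)
        V = m + np.log(np.exp(Q - m[:, None]).sum(axis=1))  # stable log-sum-exp
    Q = r + gamma * (P @ V)
    Z = np.exp(Q - Q.max(axis=1, keepdims=True))
    return Z / Z.sum(axis=1, keepdims=True)

def additional_reward(P, r0, target, phi, gamma=0.9):
    """One additional reward that moves the MCE policy to `target`.
    `phi` is an arbitrary per-state offset (an assumption of this sketch);
    each choice of `phi` gives a different valid additional reward."""
    r_target = np.log(target) + phi[:, None] - gamma * (P @ phi)
    return r_target - r0

# Small random MDP (illustrative data only).
rng = np.random.default_rng(0)
S, A = 4, 3
P = rng.dirichlet(np.ones(S), size=(S, A))    # (S, A, S), rows sum to 1
r0 = rng.normal(size=(S, A))                  # original reward
target = rng.dirichlet(np.ones(A), size=S)    # target policy, (S, A)

delta_a = additional_reward(P, r0, target, np.zeros(S))
delta_b = additional_reward(P, r0, target, rng.normal(size=S))
policy_a = mce_policy(P, r0 + delta_a)
policy_b = mce_policy(P, r0 + delta_b)
```

Both `policy_a` and `policy_b` match `target` up to numerical tolerance even though `delta_a` and `delta_b` differ, which is why a cost criterion is needed to pick one additional reward among the many candidates.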

## Abstract

Many real-world human behaviors can be characterized as sequential decision-making processes, such as urban travelers' choices of transport modes and routes (Wu et al. 2017). Unlike choices controlled by machines, which in general follow perfect rationality and adopt the policy with the highest reward, human agents have been shown to make sub-optimal decisions under bounded rationality (Tao, Rohde, and Corcoran 2014). Such behaviors can be modeled using the maximum causal entropy (MCE) principle (Ziebart 2010). In this paper, we define and investigate a general reward transformation problem (namely, reward advancement): recovering the range of additional reward functions that transform the agent's policy from its original policy to a predefined target policy under the MCE principle. We show that, given an MDP and a target policy, there are infinitely many additional reward functions that can achieve the desired policy transformation. Moreover, we propose an algorithm to further extract the additional rewards with minimum "cost" to implement the policy transformation.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.05390/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1907.05390/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/1907.05390/full.md

---
Source: https://tomesphere.com/paper/1907.05390