Reward Tweaking: Maximizing the Total Reward While Planning for Short Horizons

Chen Tessler; Shie Mannor

arXiv:2002.03327 · cs.LG · June 24, 2020 · 5 cites

Open Access

TL;DR

This paper introduces reward tweaking, a method that learns a surrogate reward for the discounted setting so that the agent optimizes the long-term reward of the original task without altering the original MDP. It is evaluated on high-dimensional continuous control tasks.

Contribution

The paper proposes reward tweaking, a novel approach to optimize long-term rewards by learning surrogate rewards that preserve the original task's objectives.

Findings

01 Reward tweaking improves long-horizon returns in continuous control tasks.

02 It guides agents to better performance without changing the original MDP.

03 Theoretically, surrogate rewards can induce optimal behavior in the original task.

Abstract

In reinforcement learning, the discount factor $γ$ controls the agent's effective planning horizon. Traditionally, this parameter was considered part of the MDP; however, as deep reinforcement learning algorithms tend to become unstable when the effective planning horizon is long, recent works refer to $γ$ as a hyper-parameter, thus changing the underlying MDP and potentially leading the agent towards sub-optimal behavior on the original task. In this work, we introduce reward tweaking. Reward tweaking learns a surrogate reward function $\tilde{r}$ for the discounted setting that induces optimal behavior on the original finite-horizon total reward task. Theoretically, we show that there exists a surrogate reward that leads to optimality in the original task and discuss the robustness of our approach. Additionally, we perform experiments in high-dimensional continuous…
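To make the abstract's motivation concrete, the following minimal Python sketch compares the finite-horizon total reward with the discounted return on two toy reward sequences. It is an illustration only, not the paper's method: the sequences `greedy` and `patient`, the helper names, and the `1 / (1 - gamma)` rule of thumb for the effective horizon are assumptions introduced here and do not come from the source.

```python
def total_return(rewards):
    """Finite-horizon total reward: the undiscounted sum of all rewards."""
    return sum(rewards)


def discounted_return(rewards, gamma):
    """Discounted return: sum over t of gamma**t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def effective_horizon(gamma):
    """Common rule of thumb (an assumption here): horizon ~ 1 / (1 - gamma)."""
    return 1.0 / (1.0 - gamma)


# Hypothetical reward sequences for two behaviors over 11 steps.
greedy = [1.0] + [0.0] * 10   # collects reward 1 immediately
patient = [0.0] * 10 + [5.0]  # collects reward 5 at the final step


def preferred(gamma=None):
    """Name the sequence with the larger return.

    gamma=None ranks by total reward; otherwise by discounted return.
    """
    if gamma is None:
        score = total_return
    else:
        def score(rewards):
            return discounted_return(rewards, gamma)
    return "patient" if score(patient) > score(greedy) else "greedy"


if __name__ == "__main__":
    print("total reward prefers:", preferred())
    for gamma in (0.8, 0.99):
        print(f"gamma={gamma} (horizon ~ {effective_horizon(gamma):.0f}) "
              f"prefers: {preferred(gamma)}")
```

In this toy case a short effective horizon ranks the two behaviors differently from the total reward, which is the mismatch the abstract attributes to treating $γ$ as a hyper-parameter. Reward tweaking, as described above, instead seeks a surrogate reward $\tilde{r}$ whose discounted optimum agrees with the total-reward optimum.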

Taxonomy

Topics: Reinforcement Learning in Robotics · Simulation Techniques and Applications · AI-based Problem Solving and Planning