Reward Tweaking: Maximizing the Total Reward While Planning for Short Horizons
Chen Tessler, Shie Mannor

TL;DR
This paper introduces reward tweaking, a method that learns a surrogate reward so that an agent planning over a short (discounted) horizon still maximizes the original long-horizon total reward, without redefining the task's objective. It is demonstrated on high-dimensional continuous control tasks.
Contribution
The paper proposes reward tweaking, a novel approach that optimizes the long-term total reward by learning a surrogate reward for a discounted agent while keeping the original task's objective unchanged.
Findings
Reward tweaking improves long-horizon returns in continuous control tasks.
It guides agents to better performance without changing the original MDP.
Theoretically, there exists a surrogate reward that induces optimal behavior on the original task.
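The existence claim can be illustrated on a toy problem. The sketch below is a hypothetical example, not the paper's learning procedure: two deterministic reward sequences over a five-step horizon are chosen at the first step, a discounted agent with gamma = 0.5 prefers the immediate reward, and a hand-built surrogate that pays each path's total reward at t = 0 restores the total-reward choice. The names `paths`, `front_loaded`, and all numbers are assumptions made for illustration.

```python
# Hypothetical toy example, NOT the paper's algorithm: the surrogate here is
# hand-built, whereas reward tweaking learns it.

def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over the reward sequence."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

def total_return(rewards):
    """Undiscounted finite-horizon total reward (the original objective)."""
    return sum(rewards)

def best_path(paths, score):
    """Return the name of the path whose reward sequence scores highest."""
    return max(paths, key=lambda name: score(paths[name]))

def front_loaded(rewards):
    """Hypothetical surrogate: pay the path's whole total at t = 0, zeros after."""
    return [sum(rewards)] + [0.0] * (len(rewards) - 1)

# Two deterministic paths, picked once at the first step (horizon H = 5).
paths = {
    "immediate": [1.0, 0.0, 0.0, 0.0, 0.0],  # total reward 1
    "delayed":   [0.0, 0.0, 0.0, 0.0, 3.0],  # total reward 3
}
gamma = 0.5  # short effective planning horizon

true_choice = best_path(paths, total_return)
myopic_choice = best_path(paths, lambda r: discounted_return(r, gamma))

surrogate = {name: front_loaded(r) for name, r in paths.items()}
tweaked_choice = best_path(surrogate, lambda r: discounted_return(r, gamma))

print(true_choice, myopic_choice, tweaked_choice)  # delayed immediate delayed
```

Front-loading works here only because each path is fixed once chosen; in a general stochastic MDP no such closed-form surrogate is available, which is why reward tweaking learns the surrogate reward instead.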
Abstract
In reinforcement learning, the discount factor controls the agent's effective planning horizon. Traditionally, this parameter was considered part of the MDP; however, as deep reinforcement learning algorithms tend to become unstable when the effective planning horizon is long, recent works refer to it as a hyper-parameter, thus changing the underlying MDP and potentially leading the agent towards sub-optimal behavior on the original task. In this work, we introduce reward tweaking. Reward tweaking learns a surrogate reward function for the discounted setting that induces optimal behavior on the original finite-horizon total reward task. Theoretically, we show that there exists a surrogate reward that leads to optimality in the original task and discuss the robustness of our approach. Additionally, we perform experiments in high-dimensional continuous…
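To make the link between the discount factor and the effective planning horizon concrete, the sketch below uses the common rule of thumb that a discount gamma corresponds to roughly 1/(1 - gamma) steps. This heuristic is an assumption, not a definition taken from the paper. The sketch also prints how little weight the discounted objective places on the final step of a hypothetical 1000-step task; the horizon `H` and the gamma values are illustrative choices.

```python
# Rule-of-thumb illustration (assumption, not from the paper) of how the
# discount factor gamma shortens the horizon the agent effectively plans over.

def effective_horizon(gamma):
    """Common heuristic: an effective horizon of about 1 / (1 - gamma) steps."""
    return 1.0 / (1.0 - gamma)

def weight_at(t, gamma):
    """Weight gamma**t that the discounted objective puts on the reward at step t."""
    return gamma ** t

H = 1000  # hypothetical finite horizon of the original total-reward task

for gamma in (0.9, 0.99, 0.999):
    print(
        f"gamma={gamma}: effective horizon ~{effective_horizon(gamma):.0f} steps, "
        f"weight on step {H - 1}: {weight_at(H - 1, gamma):.2e}"
    )
```

With gamma = 0.9 the reward at the last step of the task is weighted by a number vanishingly close to zero, so a discounted agent is effectively blind to it; this is the mismatch that a surrogate reward for the discounted setting is meant to correct.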
Taxonomy
Topics: Reinforcement Learning in Robotics · Simulation Techniques and Applications · AI-based Problem Solving and Planning
