Computational Benefits of Intermediate Rewards for Goal-Reaching Policy Learning
Yuexiang Zhai, Christina Baek, Zhengyuan Zhou, Jiantao Jiao, Yi Ma

TL;DR
This paper provides a theoretical analysis of how intermediate rewards for subgoals improve the computational efficiency of goal-reaching reinforcement learning, highlighting trade-offs between efficiency and path optimality.
Contribution
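It introduces a formal framework for quantifying the computational benefits of intermediate rewards in goal-reaching tasks, measured in the number of synchronous value iterations, and analyzes two new settings: one-way single-path (OWSP) and one-way multi-path (OWMP).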
Findings
Adding intermediate rewards at subgoals reduces the number of synchronous value iterations needed, compared with rewarding the agent only at the terminal state.
Intermediate rewards can lead to non-shortest paths despite faster convergence.
Experimental results support the theoretical advantages of intermediate rewards.
Abstract
In many goal-reaching reinforcement learning (RL) tasks, it has been empirically verified that rewarding the agent on subgoals improves convergence speed and practical performance. We attempt to provide a theoretical framework to quantify the computational benefits of rewarding the completion of subgoals, in terms of the number of synchronous value iterations. In particular, we consider subgoals as one-way intermediate states, which can only be visited once per episode, and propose two settings that consider these one-way intermediate states: the one-way single-path (OWSP) and the one-way multi-path (OWMP) settings. In both OWSP and OWMP settings, we demonstrate that adding intermediate rewards to subgoals is more computationally efficient than only rewarding the agent once it completes the goal of reaching a terminal state. We also reveal a trade-off between computational complexity…
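The effect described in the abstract can be illustrated with a small toy problem. The sketch below is not the paper's OWSP construction. It assumes a hypothetical deterministic chain in which the agent may either stay or move one step forward, so every state is one-way by design, and it counts how many synchronous value-iteration sweeps are needed before the greedy policy strictly prefers moving forward in every state. The function names, the chain layout, the discount factor, and the reward sizes are all illustrative assumptions.

```python
def make_rewards(n_states, subgoals=(), bonus=1.0):
    """Reward for ENTERING each state; state n_states is the terminal goal.

    Hypothetical layout: goal reward 1.0, optional bonus at each subgoal.
    """
    rewards = [0.0] * (n_states + 1)
    rewards[n_states] = 1.0
    for s in subgoals:
        rewards[s] = bonus
    return rewards


def sweeps_until_forward_policy(n_states, rewards, gamma=0.9, max_sweeps=1000):
    """Count synchronous sweeps until the greedy policy is 'forward' everywhere.

    States 0..n_states-1 are non-terminal; the terminal state's value stays 0.
    Ties between 'forward' and 'stay' count as not yet learned.
    """
    values = [0.0] * (n_states + 1)
    for sweep in range(max_sweeps + 1):
        # One-step lookahead from the current value estimates.
        q_forward = [rewards[s + 1] + gamma * values[s + 1] for s in range(n_states)]
        q_stay = [gamma * values[s] for s in range(n_states)]
        if all(f > s for f, s in zip(q_forward, q_stay)):
            return sweep
        # Synchronous update: every state is refreshed from the old values.
        values = [max(f, s) for f, s in zip(q_forward, q_stay)] + [0.0]
    raise RuntimeError("greedy policy did not become all-forward")


sparse = make_rewards(12)                    # reward only at the goal
shaped = make_rewards(12, subgoals=(4, 8))   # extra reward at two subgoals
print(sweeps_until_forward_policy(12, sparse))
print(sweeps_until_forward_policy(12, shaped))
```

In this toy chain the sweep count is governed by the largest distance from any state to the next rewarded state, so placing subgoals closer together shortens it. Because the chain has a single path, it cannot exhibit the trade-off with path optimality that the findings mention for the multi-path case.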
Taxonomy
Topics: Reinforcement Learning in Robotics · Smart Grid Energy Management
Methods: Q-Learning
