Computational Benefits of Intermediate Rewards for Goal-Reaching Policy Learning

Yuexiang Zhai; Christina Baek; Zhengyuan Zhou; Jiantao Jiao; Yi Ma

arXiv:2107.03961·cs.AI·March 15, 2022

TL;DR

This paper provides a theoretical analysis of how intermediate rewards for subgoals improve the computational efficiency of goal-reaching reinforcement learning, highlighting trade-offs between efficiency and path optimality.

Contribution

The paper introduces a formal framework for quantifying the computational benefits of intermediate rewards in goal-reaching tasks, including two new settings: one-way single-path (OWSP) and one-way multi-path (OWMP).

Findings

01 Adding intermediate rewards at subgoals reduces the number of synchronous value iterations needed, compared with rewarding only the terminal goal.

02 Intermediate rewards can lead to non-shortest paths despite faster convergence.

03 Experimental results support the theoretical advantages of intermediate rewards.

Abstract

Empirical results on many goal-reaching reinforcement learning (RL) tasks show that rewarding the agent on subgoals improves convergence speed and practical performance. We attempt to provide a theoretical framework to quantify the computational benefits of rewarding the completion of subgoals, in terms of the number of synchronous value iterations. In particular, we consider subgoals as one-way intermediate states, which can only be visited once per episode, and propose two settings that consider these one-way intermediate states: the one-way single-path (OWSP) and the one-way multi-path (OWMP) settings. In both OWSP and OWMP settings, we demonstrate that adding intermediate rewards to subgoals is more computationally efficient than only rewarding the agent once it completes the goal of reaching a terminal state. We also reveal a trade-off between computational complexity…
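
The claim about synchronous value iterations can be illustrated on a toy problem. The sketch below is not the paper's OWSP or OWMP construction. It assumes a deterministic chain of states with two actions (stay, or move forward one state), a terminal goal at the end of the chain, and a reward collected only on entering a rewarded state, so each rewarded state pays at most once per episode. The function name `iterations_until_greedy_reaches_goal`, the discount 0.9, and the stopping rule (the greedy policy strictly prefers moving forward in every non-terminal state) are assumptions made for illustration.

```python
import numpy as np


def iterations_until_greedy_reaches_goal(n_states, rewarded_states,
                                         gamma=0.9, max_iters=1000):
    """Count synchronous value iterations on a toy chain MDP.

    States are 0..n_states-1 and the last state is terminal. Each
    non-terminal state has two actions: stay (no reward) or move forward,
    which collects reward 1 if the state being entered is in
    rewarded_states. Returns the first iteration at which the greedy
    policy strictly prefers moving forward in every non-terminal state,
    or None if that never happens within max_iters.
    """
    goal = n_states - 1
    reward = np.zeros(n_states)
    reward[list(rewarded_states)] = 1.0

    values = np.zeros(n_states)  # V_0 = 0; the terminal value stays 0
    for k in range(1, max_iters + 1):
        # Synchronous backup: both Q-vectors use only the previous values.
        q_stay = gamma * values[:goal]
        q_forward = reward[1:] + gamma * values[1:]
        values = np.append(np.maximum(q_stay, q_forward), 0.0)
        if np.all(q_forward > q_stay):
            return k
    return None


# Reward only at the terminal goal: the signal moves back one state per sweep.
sparse = iterations_until_greedy_reaches_goal(9, [8])
# Reward at every second state as well as the goal.
dense = iterations_until_greedy_reaches_goal(9, [2, 4, 6, 8])
print(sparse, dense)  # prints "8 2"
```

On this 9-state chain, rewarding only the goal takes 8 sweeps before the start state prefers moving forward, while rewarding every second state takes 2, because each state then only needs to see the nearest rewarded state ahead of it. A chain has a single path, so this toy cannot show the trade-off with path optimality.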

Code & Models

Repositories

kebaek/minigrid (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Smart Grid Energy Management

Methods: Q-Learning