Intentionally-underestimated Value Function at Terminal State for Temporal-difference Learning with Mis-designed Reward

Taisuke Kobayashi

arXiv:2308.12772 · cs.RO · February 25, 2026

TL;DR

This paper proposes a method to intentionally underestimate the value function at episode termination in TD learning, improving policy stability and robustness across different reward designs and termination conditions.

Contribution

It introduces a novel approach to adjust value estimation at termination, addressing issues caused by traditional zero-value assumptions in TD learning.

Findings

01

The method stabilizes policy learning in various tasks.

02

It prevents overestimation caused by termination handling.

03

Experimental results confirm improved policy optimality.

Abstract

Robot control using reinforcement learning has become popular, but learning episodes are generally terminated partway through for safety and time-saving reasons. This study addresses a problem with the most common exception handling that temporal-difference (TD) learning performs at such termination. By forcibly assuming zero value after termination, TD learning implicitly and unintentionally underestimates or overestimates the value, depending on the reward design in the normal states. When an episode is terminated due to task failure, this unintentional overestimation may cause the failure to be highly valued, and the wrong policy may be acquired. Although this problem can be avoided by careful reward design, reviewing the exception handling at termination is essential for practical use of TD learning. This paper therefore proposes a method to intentionally…
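
The effect described in the abstract can be illustrated with a minimal sketch of a one-step TD target. Everything here is an assumption made for illustration: the function names, the always-negative reward of -1 per step, the discount factor of 0.99, and in particular the pessimistic terminal value, which is a simple hypothetical lower bound and not the formula proposed in the paper (the abstract is truncated before the method is stated).

```python
def td_target(reward, next_value, done, gamma=0.99, terminal_value=0.0):
    """One-step TD target: r + gamma * V(s').

    At termination the bootstrap V(s') is replaced by `terminal_value`;
    the conventional handling uses 0.
    """
    bootstrap = terminal_value if done else next_value
    return reward + gamma * bootstrap


def constant_reward_value(reward_per_step, gamma=0.99):
    """Discounted value of receiving the same reward forever: r / (1 - gamma)."""
    return reward_per_step / (1.0 - gamma)


gamma = 0.99

# Hypothetical reward design: every normal step yields -1.
v_normal = constant_reward_value(-1.0, gamma)  # about -100

# Target when the episode continues in a normal state.
target_continue = td_target(-1.0, v_normal, done=False, gamma=gamma)

# Target when the episode is cut short by failure, with zero terminal value.
target_fail_zero = td_target(-1.0, 0.0, done=True, gamma=gamma)

# Illustrative pessimistic terminal value (an assumption, not the paper's rule):
# the value of receiving a reward of -2, worse than any normal step, forever.
v_pessimistic = constant_reward_value(-2.0, gamma)  # about -200
target_fail_pessimistic = td_target(
    -1.0, 0.0, done=True, gamma=gamma, terminal_value=v_pessimistic
)
```

Under these assumptions, the zero-value target for failure (about -1) is far higher than the target for continuing (about -100), so failing early looks attractive to the learner. Replacing the zero with a sufficiently low terminal value reverses the ordering. With an always-positive reward the same zero would instead act as an underestimate, which is why the outcome depends on the reward design.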

Taxonomy

Topics: Reinforcement Learning in Robotics · Muscle activation and electromyography studies · Robot Manipulation and Learning