The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep Reinforcement Learning

Jiashun Liu; Johan Obando-Ceron; Pablo Samuel Castro; Aaron Courville; Ling Pan

arXiv:2506.13672 · cs.LG · June 17, 2025

Open Access

TL;DR

This paper introduces LEAST, a method for early episode termination in deep RL to avoid unproductive experiences, thereby improving learning efficiency across multiple algorithms and benchmarks.

Contribution

The paper proposes LEAST, a lightweight mechanism that strategically terminates episodes early based on Q-value and gradient statistics, addressing the sunk cost fallacy in deep RL.
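
To make the idea of stopping on Q-value and gradient statistics concrete, here is a minimal sketch of what such a rule could look like. It is not the authors' criterion: the class name `EarlyStopper`, the median-based threshold, the direction of the gradient adjustment, and the parameters `window`, `warmup`, `patience`, and `grad_scale` are all assumptions made for illustration.

```python
from collections import deque
from statistics import median


class EarlyStopper:
    """Hypothetical early-termination rule; NOT the exact LEAST criterion."""

    def __init__(self, window=100, warmup=20, patience=3, grad_scale=0.1):
        self.q_history = deque(maxlen=window)     # recent Q-value estimates
        self.grad_history = deque(maxlen=window)  # recent gradient norms
        self.warmup = warmup          # samples needed before any stopping
        self.patience = patience      # consecutive low-Q steps required to stop
        self.grad_scale = grad_scale  # assumed weight of the gradient term
        self.low_steps = 0

    def reset_episode(self):
        """Call at the start of every episode."""
        self.low_steps = 0

    def step(self, q_value, grad_norm):
        """Return True if the current episode should be cut short."""
        stop = False
        if len(self.q_history) >= self.warmup:
            # Assumption: large recent gradients suggest the critic is still
            # changing, so the threshold is lowered to stop less eagerly.
            threshold = (median(self.q_history)
                         - self.grad_scale * median(self.grad_history))
            if q_value < threshold:
                self.low_steps += 1
            else:
                self.low_steps = 0
            stop = self.low_steps >= self.patience
        # Record the new statistics only after the decision is made.
        self.q_history.append(q_value)
        self.grad_history.append(grad_norm)
        return stop
```

In a training loop, one would call `reset_episode()` after each environment reset and `step(q, g)` once per transition, ending the episode and resetting the environment whenever it returns `True`. The `patience` counter is a design choice in this sketch so that a single noisy Q-estimate does not end an episode.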

Findings

01. LEAST improves sample efficiency in deep RL algorithms.

02. The method enhances performance on MuJoCo and DeepMind Control Suite benchmarks.

03. Early termination reduces wasteful environment interactions.

Abstract

Off-policy deep reinforcement learning (RL) typically leverages replay buffers for reusing past experiences during learning. This can help improve sample efficiency when the collected data is informative and aligned with the learning objectives; when that is not the case, it can have the effect of "polluting" the replay buffer with data which can exacerbate optimization challenges in addition to wasting environment interactions due to wasteful sampling. We argue that sampling these uninformative and wasteful transitions can be avoided by addressing the sunk cost fallacy, which, in the context of deep RL, is the tendency towards continuing an episode until termination. To address this, we propose learn to stop (LEAST), a lightweight mechanism that enables strategic early episode termination based on Q-value and gradient statistics, which helps agents recognize when to terminate…

Taxonomy

Topics: Reinforcement Learning in Robotics