The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep Reinforcement Learning
Jiashun Liu, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Ling Pan

TL;DR
This paper introduces LEAST, a method for early episode termination in deep RL to avoid unproductive experiences, thereby improving learning efficiency across multiple algorithms and benchmarks.
Contribution
The paper proposes LEAST, a lightweight mechanism that strategically terminates episodes early based on Q-value and gradient statistics, addressing the sunk cost fallacy in deep RL.
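The summary says only that LEAST bases its stopping decision on Q-value and gradient statistics. The exact criterion is not given here. The sketch below is a hypothetical rule, not the paper's. It keeps a sliding window of recent Q-value estimates and counts consecutive steps whose Q-value falls below a low percentile of that window. It also relaxes the threshold when the critic's gradient norm is large, on the assumption that large gradients mean the Q-estimates are still unreliable. The class name `HypotheticalStopRule` and all of its parameters are illustrative.

```python
import statistics
from collections import deque


class HypotheticalStopRule:
    """Illustrative early-stop rule; NOT the LEAST criterion from the paper.

    Flags an episode for termination when the current Q-value stays below a
    low percentile of recently observed Q-values for `patience` steps in a row.
    """

    def __init__(self, window=1000, percentile=10, patience=5,
                 grad_scale=1.0, warmup=20):
        self.q_history = deque(maxlen=window)  # recent Q(s, a) estimates
        self.percentile = percentile           # integer in 1..99
        self.patience = patience               # consecutive low-Q steps needed to stop
        self.grad_scale = grad_scale           # how much a large gradient lowers the bar
        self.warmup = warmup                   # never stop before this many samples (>= 2)
        self.low_streak = 0

    def reset(self):
        """Call at the start of every episode."""
        self.low_streak = 0

    def threshold(self, grad_norm):
        data = list(self.q_history)
        # quantiles(n=100) returns 99 cut points; index percentile-1 is the one requested.
        base = statistics.quantiles(data, n=100)[self.percentile - 1]
        # Assumption: a large critic gradient norm means the Q-estimates are
        # still moving, so lower the threshold to avoid stopping on noise.
        return base - self.grad_scale * grad_norm * statistics.pstdev(data)

    def should_stop(self, q_value, grad_norm):
        """Return True if the current episode should be cut short now."""
        stop = False
        if len(self.q_history) >= self.warmup:
            if q_value < self.threshold(grad_norm):
                self.low_streak += 1
            else:
                self.low_streak = 0
            stop = self.low_streak >= self.patience
        self.q_history.append(q_value)
        return stop
```

A rule like this would be queried once per environment step with the critic's estimate for the current state-action pair and a recent gradient norm. `reset()` clears the streak when a new episode begins. Requiring several consecutive low-Q steps, rather than a single one, is a design choice meant to keep one noisy estimate from ending an episode.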
Findings
LEAST improves sample efficiency in deep RL algorithms.
The method enhances performance on MuJoCo and DeepMind Control Suite benchmarks.
Early termination reduces wasteful environment interactions.
Abstract
Off-policy deep reinforcement learning (RL) typically leverages replay buffers for reusing past experiences during learning. This can help improve sample efficiency when the collected data is informative and aligned with the learning objectives; when that is not the case, it can have the effect of "polluting" the replay buffer with data which can exacerbate optimization challenges in addition to wasting environment interactions due to wasteful sampling. We argue that sampling these uninformative and wasteful transitions can be avoided by addressing the sunk cost fallacy, which, in the context of deep RL, is the tendency towards continuing an episode until termination. To address this, we propose learn to stop (LEAST), a lightweight mechanism that enables strategic early episode termination based on Q-value and gradient statistics, which helps agents recognize when to terminate…
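The abstract frames early termination as a way to stop spending environment interactions, and replay-buffer capacity, on transitions that are unlikely to help. Any reimplementation also has to decide how an agent-initiated stop is recorded. The sketch below assumes it is treated like a time-limit truncation rather than a true terminal state, so the TD target still bootstraps from the next state. The paper's own handling is not described in this summary. The toy environment, `rollout`, `stop_fn`, and `Transition` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    obs: float
    action: float
    reward: float
    next_obs: float
    terminated: bool  # the environment reached a true terminal state
    truncated: bool   # cut short by a time limit or an agent-initiated stop


def td_target(t, q_next, gamma=0.99):
    """One-step TD target. Only a true terminal state drops the bootstrap term;
    an early-stopped (truncated) transition still bootstraps from Q(s', a')."""
    return t.reward + (0.0 if t.terminated else gamma * q_next)


def rollout(policy, stop_fn, max_steps=200):
    """Collect one episode from a hypothetical 1-D toy environment: the state
    moves by the chosen action, reward is -|x|, and |x| >= 100 is terminal."""
    obs = 0.0
    transitions = []
    for step in range(max_steps):
        action = policy(obs)
        next_obs = obs + action
        reward = -abs(next_obs)
        terminated = abs(next_obs) >= 100.0
        # The stop decision is the agent's, so it never overrides a real terminal.
        agent_stop = stop_fn(next_obs, reward)
        truncated = not terminated and (agent_stop or step == max_steps - 1)
        transitions.append(
            Transition(obs, action, reward, next_obs, terminated, truncated))
        if terminated or truncated:
            break
        obs = next_obs
    return transitions


# Usage: the same policy with and without a (hypothetical) stop rule.
full = rollout(lambda o: 1.0, lambda o, r: False)
early = rollout(lambda o: 1.0, lambda o, r: r < -5.0)
print(len(full), len(early))  # prints "100 6"
```

In this toy setting the early-stopped episode uses 6 environment steps instead of 100. Recording the stop as a truncation keeps the critic from learning that voluntarily abandoned states are worth only their immediate reward.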
Taxonomy
Topics: Reinforcement Learning in Robotics
