Reinforcement Learning with Random Time Horizons

Enric Ribera Borrell; Lorenz Richter; Christof Schütte

arXiv:2506.00962 · cs.LG · August 15, 2025

TL;DR

This paper extends reinforcement learning to random, possibly trajectory-dependent stopping times, derives the corresponding policy gradient formulas, and reports improved optimization convergence in numerical experiments.

Contribution

The paper introduces a rigorous framework for RL with random time horizons and derives policy gradient formulas, for both stochastic and deterministic policies, that account for trajectory-dependent stopping times.

Findings

01 New policy gradient formulas for random horizons

02 Improved convergence in numerical experiments

03 Connections to optimal control theory

Abstract

We extend the standard reinforcement learning framework to random time horizons. While the classical setting typically assumes finite and deterministic or infinite runtimes of trajectories, we argue that multiple real-world applications naturally exhibit random (potentially trajectory-dependent) stopping times. Since those stopping times typically depend on the policy, their randomness has an effect on policy gradient formulas, which we (mostly for the first time) derive rigorously in this work both for stochastic and deterministic policies. We present two complementary perspectives, trajectory or state-space based, and establish connections to optimal control theory. Our numerical experiments demonstrate that using the proposed formulas can significantly improve optimization convergence compared to traditional approaches.
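
To make the setting concrete, here is a minimal sketch of a policy whose stopping time depends on its own actions. It is not the paper's method or one of its derived formulas: it uses the standard score-function (REINFORCE-style) estimator, with the score summed up to the random stopping time. The environment (a one-dimensional random walk that stops when it first leaves an interval), the one-parameter policy, and all names (`rollout`, `gradient_estimate`, `boundary`, `max_steps`) are assumptions made up for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rollout(theta, rng, boundary=3, max_steps=10_000):
    """Simulate one trajectory until it first leaves (-boundary, boundary).

    Hypothetical toy problem: the walker steps right with probability
    sigmoid(theta) and left otherwise. Returns the stopping time tau and
    the summed score  sum_{t < tau} d/dtheta log pi_theta(a_t).
    """
    p_right = sigmoid(theta)
    x, tau, score = 0, 0, 0.0
    # max_steps is only a safety cap; the exit time is finite almost surely.
    while abs(x) < boundary and tau < max_steps:
        go_right = rng.random() < p_right
        x += 1 if go_right else -1
        # d/dtheta log pi: (1 - p) for a right step, -p for a left step.
        score += (1.0 - p_right) if go_right else -p_right
        tau += 1
    return tau, score


def gradient_estimate(theta, n_traj=5000, seed=0):
    """Monte Carlo estimate of d/dtheta J(theta), where J(theta) = E[-tau].

    The reward is -1 per step, so the return of a trajectory is -tau and
    the horizon itself is the quantity being optimized.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_traj):
        tau, score = rollout(theta, rng)
        total += -tau * score
    return total / n_traj


if __name__ == "__main__":
    for theta in (-1.0, 0.0, 1.0):
        print(theta, gradient_estimate(theta))
```

In this toy problem the horizon is not fixed in advance: a policy that drifts more strongly toward one boundary exits sooner, so the estimated gradient points away from `theta = 0` on either side. No baseline or variance reduction is included, so the estimate is noisy for small `n_traj`.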

Videos

Reinforcement Learning with Random Time Horizons · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Neural Networks and Applications