Stochastic Decision Horizons for Constrained Reinforcement Learning

Nikola Milosevic; Leonard Franz; Daniel Haeufle; Georg Martius; Nico Scherf; Pavel Kolev

arXiv:2602.04599 · cs.LG · February 5, 2026

Open Access

TL;DR

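This paper recasts constrained reinforcement learning in terms of stochastic decision horizons: constraint violations attenuate reward contributions and shorten the effective planning horizon. The resulting survival-weighted objectives remain replay-compatible, so they can be used with off-policy actor-critic learning. Experiments report improved sample efficiency.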

Contribution

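The paper proposes a Control as Inference formulation based on stochastic decision horizons, together with two violation semantics: absorbing and virtual termination. Both share the same survival-weighted return but give rise to distinct optimization structures, leading to SAC/MPO-style policy improvement for off-policy constrained RL.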

Findings

01 Enhanced sample efficiency on standard benchmarks.

02 Effective scaling to high-dimensional musculoskeletal tasks.

03 Distinct optimization structures for different violation semantics.

Abstract

Constrained Markov decision processes (CMDPs) provide a principled model for handling constraints, such as safety and other auxiliary objectives, in reinforcement learning. The common approach of using additive-cost constraints and dual variables often hinders off-policy scalability. We propose a Control as Inference formulation based on stochastic decision horizons, where constraint violations attenuate reward contributions and shorten the effective planning horizon via state-action-dependent continuation. This yields survival-weighted objectives that remain replay-compatible for off-policy actor-critic learning. We propose two violation semantics, absorbing and virtual termination, that share the same survival-weighted return but result in distinct optimization structures that lead to SAC/MPO-style policy improvement. Experiments demonstrate improved sample efficiency and favorable…
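The survival-weighted return described in the abstract can be illustrated with a small sketch. It assumes one plausible reading, not necessarily the paper's exact definition: each step has a continuation probability c(s_t, a_t) in [0, 1], and the reward at step t is weighted by the discounted product of the continuation probabilities of all earlier steps. The function names, the `continuation` argument, and the choice to exclude the current step's continuation from its own weight are hypothetical.

```python
from typing import Sequence


def survival_weighted_return(
    rewards: Sequence[float],
    continuation: Sequence[float],
    gamma: float = 0.99,
) -> float:
    """Sum of rewards, each weighted by discounted survival up to that step.

    ASSUMPTION: the weight of reward r_t is prod_{k<t} gamma * c_k, where
    c_k in [0, 1] is a state-action-dependent continuation probability.
    This is an illustrative form, not taken from the paper.
    """
    if len(rewards) != len(continuation):
        raise ValueError("rewards and continuation must have equal length")
    total = 0.0
    weight = 1.0  # discounted probability of having "survived" so far
    for r, c in zip(rewards, continuation):
        if not 0.0 <= c <= 1.0:
            raise ValueError("continuation probabilities must lie in [0, 1]")
        total += weight * r
        weight *= gamma * c  # a low c shrinks every later reward's weight
    return total


def td_target(
    reward: float, continuation: float, next_value: float, gamma: float = 0.99
) -> float:
    """One-step backup target under the same assumed form.

    It uses only per-transition quantities (r, c, V(s')), which is why such
    a target can be computed from replayed transitions.
    """
    return reward + gamma * continuation * next_value


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    continuation = [1.0, 0.5, 1.0]  # a likely violation at the second step
    print(survival_weighted_return(rewards, continuation, gamma=1.0))
```

Under this reading, a continuation probability below 1 behaves like an extra state-action-dependent discount: it lowers the weight of every later reward, which is one way to see how violations shorten the effective planning horizon. Applying `td_target` backwards along a trajectory reproduces the forward sum.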

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Adaptive Dynamic Programming Control