Reinforcement Learning via Conservative Agent for Environments with Random Delays

Jongsoo Lee, Jangwon Kim, Jiseok Jeong, and Soohee Han

arXiv:2507.18992 · cs.LG · February 3, 2026

TL;DR

This paper introduces a conservative agent that reformulates an environment with random delays as its constant-delay equivalent, so that existing constant-delay algorithms can be applied directly despite unpredictable feedback delays.

Contribution

The paper proposes the conservative agent, a method that reformulates random-delay environments as constant-delay ones, so that existing constant-delay algorithms extend to random delays without changes to their algorithmic structure.

Findings

01 Significantly outperforms baseline algorithms in continuous control tasks.

02 Improves asymptotic performance and sample efficiency.

03 Effective in environments with stochastic feedback delays.

Abstract

Real-world reinforcement learning applications are often hindered by delayed feedback from environments, which violates the Markov assumption and introduces significant challenges. Although numerous delay-compensating methods have been proposed for environments with constant delays, environments with random delays remain largely unexplored due to their inherent variability and unpredictability. In this study, we propose a simple yet robust agent for decision-making under random delays, termed the conservative agent, which reformulates the random-delay environment into its constant-delay equivalent. This transformation enables any state-of-the-art constant-delay method to be directly extended to the random-delay environments without modifying the algorithmic structure or sacrificing performance. We evaluate the conservative agent-based algorithm on continuous control tasks, and empirical…
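
As an illustration of the reformulation described in the abstract, the following Python sketch shows one way a random-delay observation stream could be turned into a constant-delay one. It assumes, beyond what the text above states, that delays are bounded by a known `max_delay` and that the agent always acts on the observation generated exactly `max_delay` steps earlier. The function name `simulate_conservative_agent` and its parameters are hypothetical and are not taken from the paper.

```python
import random


def simulate_conservative_agent(num_steps, max_delay, seed=0):
    """Toy simulation (hypothetical, not the authors' implementation).

    The observation generated at step t reaches the agent at step
    t + d_t, where d_t is drawn uniformly from {0, ..., max_delay}.
    The agent ignores anything fresher and, at each step t, uses the
    observation generated at step t - max_delay, which is guaranteed
    to have arrived under the bounded-delay assumption.

    Returns a list of (decision_step, observation_step) pairs.
    """
    rng = random.Random(seed)
    arrival_time = {}  # generation step -> step at which it arrives
    used = []

    for t in range(num_steps):
        # Assumption: delays are bounded by max_delay.
        arrival_time[t] = t + rng.randint(0, max_delay)

        target = t - max_delay
        if target < 0:
            continue  # nothing old enough exists yet

        # The targeted observation must already be available.
        assert arrival_time[target] <= t
        used.append((t, target))

    return used


if __name__ == "__main__":
    for decision_step, obs_step in simulate_conservative_agent(8, 3):
        print(decision_step, obs_step, decision_step - obs_step)
```

In this sketch every decision sees an observation of the same age, `max_delay`, regardless of the random arrival times, which is the sense in which the stream looks like a constant-delay environment to a downstream constant-delay method.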

Taxonomy

Topics: Reinforcement Learning in Robotics · Age of Information Optimization · Adaptive Dynamic Programming Control