Reinforcement Learning via Conservative Agent for Environments with Random Delays
Jongsoo Lee, Jangwon Kim, Jiseok Jeong, and Soohee Han

TL;DR
This paper introduces a conservative agent approach that transforms environments with random delays into constant-delay equivalents, enabling existing algorithms to perform effectively despite unpredictable feedback delays.
Contribution
The paper presents the conservative agent, a method that reformulates random-delay environments into constant-delay equivalents so that existing constant-delay algorithms can be applied without modifying their algorithmic structure.
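The summary does not spell out how the reformulation works. The sketch below shows one plausible reading, assuming every observation delay is bounded by a known maximum: each observation is held back until exactly that maximum has elapsed, so the agent always sees a fixed delay. The names `ConservativeBuffer`, `simulate`, and `max_delay` are hypothetical and are not taken from the paper.

```python
import random


class ConservativeBuffer:
    """Releases each observation exactly `max_delay` steps after it was generated.

    Assumption (not from the source): every random delay is at most `max_delay`.
    """

    def __init__(self, max_delay):
        self.max_delay = max_delay
        self.pending = {}  # generation step -> observation

    def receive(self, gen_step, obs):
        # Store an observation that has just arrived, keyed by when it was generated.
        self.pending[gen_step] = obs

    def release(self, now):
        # Hand over the observation generated `max_delay` steps ago.
        # Returns None during the first `max_delay` warm-up steps.
        target = now - self.max_delay
        if target in self.pending:
            return target, self.pending.pop(target)
        return None


def simulate(num_steps, max_delay, seed=0):
    """Feed randomly delayed observations through the buffer.

    Returns the delay the agent actually experiences at each step.
    """
    rng = random.Random(seed)
    buf = ConservativeBuffer(max_delay)
    in_flight = []  # (arrival_step, gen_step, obs)
    effective_delays = []
    for t in range(num_steps):
        # The environment emits an observation with a random delay in [0, max_delay].
        delay = rng.randint(0, max_delay)
        in_flight.append((t + delay, t, f"obs{t}"))
        # Deliver everything that has arrived by step t.
        for _, gen_step, obs in [x for x in in_flight if x[0] <= t]:
            buf.receive(gen_step, obs)
        in_flight = [x for x in in_flight if x[0] > t]
        released = buf.release(t)
        if released is not None:
            gen_step, _ = released
            effective_delays.append(t - gen_step)
    return effective_delays


delays = simulate(num_steps=50, max_delay=4)
print(set(delays))
```

Under the bounded-delay assumption, every released observation has the same age, so a constant-delay method could in principle be placed downstream of such a buffer unchanged. The price is that observations arriving early are deliberately used late.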
Findings
The conservative agent-based algorithm significantly outperforms baseline algorithms on continuous control tasks.
It improves both asymptotic performance and sample efficiency.
It remains effective in environments with stochastic feedback delays.
Abstract
Real-world reinforcement learning applications are often hindered by delayed feedback from environments, which violates the Markov assumption and introduces significant challenges. Although numerous delay-compensating methods have been proposed for environments with constant delays, environments with random delays remain largely unexplored due to their inherent variability and unpredictability. In this study, we propose a simple yet robust agent for decision-making under random delays, termed the conservative agent, which reformulates the random-delay environment into its constant-delay equivalent. This transformation enables any state-of-the-art constant-delay method to be directly extended to the random-delay environments without modifying the algorithmic structure or sacrificing performance. We evaluate the conservative agent-based algorithm on continuous control tasks, and empirical…
Taxonomy
Topics: Reinforcement Learning in Robotics · Age of Information Optimization · Adaptive Dynamic Programming Control
