Delayed homomorphic reinforcement learning for environments with delayed feedback
Jongsoo Lee, Jangwon Kim, Soohee Han

TL;DR
This paper introduces DHRL, a reinforcement learning framework for environments with delayed feedback. It uses MDP homomorphisms to collapse control-redundant augmented states into a structured abstraction, which reduces the sample-complexity burden of state augmentation.
Contribution
The paper proposes DHRL, a framework that uses MDP homomorphisms to abstract the augmented state space in delayed-feedback environments, and introduces D$^2$HPG, a deep actor-critic method for continuous domains.
Findings
D$^2$HPG outperforms augmentation-based baselines on MuJoCo tasks.
Exact abstraction preserves optimality in finite domains.
Approximate abstraction provides a value-loss bound under stochastic dynamics.
Abstract
Reinforcement learning in real-world systems often involves delayed feedback, which breaks the Markov assumption and impedes both learning and control. Canonical augmentation-based approaches cause state-space explosion, which imposes a severe sample-complexity burden. Despite recent progress, state-of-the-art augmentation-based baselines either mainly alleviate the burden on the critic or rely on non-unified treatments for the actor and critic. In this study, we propose delayed homomorphic reinforcement learning (DHRL), a framework grounded in MDP homomorphisms that defines a belief-equivalence relation over the augmented state space to collapse control-redundant augmented states. In principle, this yields exact abstraction under deterministic dynamics and approximate abstraction under stochastic dynamics, enabling both the actor and critic to benefit from a structured abstraction…
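To make the state-space explosion and the collapsing idea concrete, the following is a minimal sketch on a toy problem, not the paper's construction. It assumes a constant delay, a small deterministic chain MDP, and an augmented state made of the last observed state plus the pending actions. Under those assumptions the belief over the current state is a point mass, so augmented states that roll forward to the same current state can be grouped together. All names (`N_STATES`, `ACTIONS`, `DELAY`, `step`, `rollout`, `collapse`) are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical toy setup: a 5-state chain with two actions and a constant delay of 3.
N_STATES = 5
ACTIONS = (-1, +1)
DELAY = 3


def step(state, action):
    """Deterministic chain dynamics: move left or right, clipped to the chain ends."""
    return min(max(state + action, 0), N_STATES - 1)


def rollout(augmented_state):
    """Roll the last observed state forward through the pending actions.

    With deterministic dynamics this gives the single current state that the
    augmented state is consistent with (a point-mass belief).
    """
    state, pending = augmented_state
    for action in pending:
        state = step(state, action)
    return state


def augmented_states():
    """Enumerate every (last observed state, pending action sequence) pair."""
    for state in range(N_STATES):
        for pending in product(ACTIONS, repeat=DELAY):
            yield (state, pending)


def collapse():
    """Group augmented states by the current state they imply."""
    classes = {}
    for x in augmented_states():
        classes.setdefault(rollout(x), []).append(x)
    return classes


classes = collapse()
n_augmented = N_STATES * len(ACTIONS) ** DELAY  # grows exponentially with DELAY

print("augmented states:", n_augmented)
print("equivalence classes:", len(classes))
```

In this toy case the number of augmented states grows as `N_STATES * len(ACTIONS) ** DELAY`, while the number of groups can never exceed `N_STATES`. With stochastic dynamics the implied belief is a distribution rather than a single state, so an exact grouping like this one is generally not available, which is consistent with the abstract's distinction between exact and approximate abstraction.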
