Delayed homomorphic reinforcement learning for environments with delayed feedback

Jongsoo Lee; Jangwon Kim; Soohee Han

arXiv:2604.03641 · cs.LG · May 5, 2026

TL;DR

This paper introduces DHRL, a reinforcement learning framework for environments with delayed feedback. It uses MDP homomorphisms to build a structured abstraction of the augmented state space, with the aim of reducing the sample-complexity burden of augmentation-based approaches.

Contribution

The paper proposes DHRL, a framework that uses MDP homomorphisms to abstract the augmented state space in delayed-feedback environments, and introduces D$^2$HPG, a deep actor-critic method for continuous domains.

Findings

01. D$^2$HPG outperforms augmentation-based baselines in MuJoCo tasks.

02. Exact abstraction preserves optimality in finite domains.

03. Approximate abstraction provides a value-loss bound under stochastic dynamics.

Abstract

Reinforcement learning in real-world systems often involves delayed feedback, which breaks the Markov assumption and impedes both learning and control. Canonical augmentation-based approaches cause state-space explosion, which imposes a severe sample-complexity burden. Despite recent progress, state-of-the-art augmentation-based baselines either mainly alleviate the burden on the critic or rely on non-unified treatments for the actor and critic. In this study, we propose delayed homomorphic reinforcement learning (DHRL), a framework grounded in MDP homomorphisms that defines a belief-equivalence relation over the augmented state space to collapse control-redundant augmented states. In principle, this yields exact abstraction under deterministic dynamics and approximate abstraction under stochastic dynamics, enabling both the actor and critic to benefit from a structured abstraction…
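
The state-space explosion and the collapse of control-redundant augmented states can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes a hypothetical deterministic ring environment with a fixed action delay, represents an augmented state as the last observed state plus the queue of pending actions, and groups augmented states by the current state they imply. All names (`step`, `belief`, `abstract_classes`) and the environment itself are assumptions made for illustration.

```python
from itertools import product


def step(state, action, n_states):
    """Hypothetical deterministic dynamics: move along a ring of n_states."""
    return (state + action) % n_states


def belief(aug_state, n_states):
    """Roll the last observed state forward through the pending actions.

    Under deterministic dynamics the belief over the current state is a
    single state, so it can be returned directly.
    """
    state, pending = aug_state
    for action in pending:
        state = step(state, action, n_states)
    return state


def abstract_classes(n_states, actions, delay):
    """Group augmented states (state, pending actions) by implied current state."""
    classes = {}
    for state in range(n_states):
        for pending in product(actions, repeat=delay):
            aug = (state, pending)
            classes.setdefault(belief(aug, n_states), []).append(aug)
    return classes


# Toy sizes chosen for illustration only.
N_STATES, ACTIONS, DELAY = 5, (-1, 0, 1), 3
classes = abstract_classes(N_STATES, ACTIONS, DELAY)
n_augmented = sum(len(members) for members in classes.values())

print(n_augmented, len(classes))  # prints "135 5"
```

In this toy setting the augmented space has 5 × 3³ = 135 states, while grouping by implied current state leaves 5 classes. With stochastic dynamics the belief would be a distribution rather than a single state, which is where the abstract's approximate abstraction applies; this sketch does not cover that case.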
