Bi-directional Recurrence Improves Transformer in Partially Observable Markov Decision Processes

Ashok Arora; Neetesh Kumar

arXiv:2505.11153·cs.LG·May 19, 2025

Open Access

TL;DR

This paper introduces a bi-recurrent transformer architecture that improves sample efficiency and reduces parameter count for reinforcement learning in partially observable environments, outperforming existing methods across multiple POMDP benchmarks.

Contribution

The paper proposes a novel bi-recurrent model architecture that improves sample efficiency and reduces parameter count in POMDP settings, addressing limitations of existing transformer-based RL models.

Findings

01. Outperforms existing methods by 87.39% to 482.04% on average across 23 POMDP environments.

02. Reduces model parameter count compared to traditional transformer models.

03. Enhances the ability to handle partial observability and sequential dependencies.

Abstract

In real-world reinforcement learning (RL) scenarios, agents often encounter partial observability, where incomplete or noisy information obscures the true state of the environment. Partially Observable Markov Decision Processes (POMDPs) are commonly used to model these environments, but effective performance requires memory mechanisms to utilise past observations. While recurrent networks have traditionally addressed this need, transformer-based models have recently shown improved sample efficiency in RL tasks. However, their application to POMDPs remains underdeveloped, and their real-world deployment is constrained by their high parameter count. This work introduces a novel bi-recurrent model architecture that improves sample efficiency and reduces model parameter count in POMDP scenarios. The architecture replaces the multiple feed-forward layers with a single layer of…
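
To make the idea of bi-directional recurrence over an observation history concrete, here is a minimal pure-Python sketch. It is not the paper's architecture: the abstract is cut off before it names the layer that replaces the feed-forward stack, so the Elman-style tanh cell, the helper names (`recurrent_pass`, `bidirectional_features`, `random_weights`), the sizes, and the random weights are all assumptions chosen for illustration. The sketch only shows the general technique: one recurrence reads the stored observation window from first to last, a second reads it from last to first, and the two hidden states are concatenated at each time step.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a (rows x cols) nested-list matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def recurrent_pass(observations, w_in, w_rec, reverse=False):
    """Elman-style recurrence: h = tanh(W_in @ o_t + W_rec @ h_prev).

    Returns one hidden vector per time step, aligned with the input order
    regardless of the direction in which the sequence was scanned.
    """
    h = [0.0] * len(w_rec)
    steps = range(len(observations))
    order = reversed(steps) if reverse else steps
    states = [None] * len(observations)
    for t in order:
        pre = [a + b for a, b in zip(matvec(w_in, observations[t]), matvec(w_rec, h))]
        h = [math.tanh(p) for p in pre]
        states[t] = h
    return states


def bidirectional_features(observations, fwd, bwd):
    """Concatenate forward and backward hidden states at each time step."""
    forward = recurrent_pass(observations, *fwd)
    backward = recurrent_pass(observations, *bwd, reverse=True)
    return [f + b for f, b in zip(forward, backward)]


def random_weights(rows, cols, rng):
    """Small random weights; purely illustrative, nothing is trained here."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
OBS_DIM, HIDDEN = 3, 4  # hypothetical sizes, not taken from the paper
fwd = (random_weights(HIDDEN, OBS_DIM, rng), random_weights(HIDDEN, HIDDEN, rng))
bwd = (random_weights(HIDDEN, OBS_DIM, rng), random_weights(HIDDEN, HIDDEN, rng))

# A stored window of five partial observations (random stand-ins).
history = [[rng.uniform(-1, 1) for _ in range(OBS_DIM)] for _ in range(5)]
features = bidirectional_features(history, fwd, bwd)
```

Each entry of `features` has `2 * HIDDEN` values: the first half summarises everything up to that step, the second half everything from that step onward. A backward pass needs the whole window to be available, so in this sketch it runs over an already-collected history; how the paper handles this at decision time is not stated in the excerpt above.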

Taxonomy

Topics: Reinforcement Learning in Robotics · Autonomous Vehicle Technology and Safety · Explainable Artificial Intelligence (XAI)