Dynamic Deep-Reinforcement-Learning Algorithm in Partially Observable Markov Decision Processes

Saki Omi; Hyo-Sang Shin; Namhoon Cho; Antonios Tsourdos

arXiv:2307.15931 · cs.LG · March 4, 2026

Open Access

TL;DR

This paper introduces three novel deep reinforcement learning algorithms using LSTM networks to better handle partially observable environments by incorporating action information, demonstrating improved performance and computational efficiency.

Contribution

The paper investigates how including action data and choosing the network architecture affect RNN-based RL, and proposes three new algorithms with improved efficiency and effectiveness.

Findings

01. Inclusion of action trajectories improves learning performance.

02. H-TD3 reduces computational time while maintaining performance.

03. LSTM architectures effectively summarize hidden state trajectories.

Abstract

Recent studies have greatly improved reinforcement learning, and interest in real-world implementation has grown. In many cases, implementation is challenged by time-varying disturbances, which introduce hidden states and make the problem best described as a Partially Observable Markov Decision Process (POMDP). An effective approach to this problem is to introduce a Recurrent Neural Network (RNN) in place of a state estimator. However, only a few studies have investigated which types of information should be supplied to the RNN and which network architectures handle them well. This study discusses the effectiveness of including actions along with observations, and the impact of the network architecture used to handle them, by providing interpretations of how trajectories are summarized in LSTM networks. Specifically, three novel approaches with different architectures are…
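The idea of feeding actions together with observations into a recurrent network can be illustrated with a small sketch. The code below is not any of the paper's three proposed algorithms (their architectures are not described on this page); it is a generic, plain-Python LSTM cell that consumes the concatenation of each observation with the previous action and returns the final hidden state as a summary of the trajectory. All names (`lstm_step`, `summarize_history`, `init_params`), the gate ordering, the random weights, and the choice to pair each observation with the previous action are assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, W, b):
    """One standard LSTM cell step in plain Python.

    x: input vector (here: observation concatenated with previous action)
    h, c: previous hidden and cell states, each of length n
    W: 4*n rows of length len(x) + n; b: 4*n biases
    Assumed gate order in W and b: input, forget, candidate, output.
    """
    n = len(h)
    z = x + h  # list concatenation of input and previous hidden state
    pre = [sum(w * v for w, v in zip(row, z)) + bias for row, bias in zip(W, b)]
    i = [sigmoid(p) for p in pre[0:n]]            # input gate
    f = [sigmoid(p) for p in pre[n:2 * n]]        # forget gate
    g = [math.tanh(p) for p in pre[2 * n:3 * n]]  # candidate cell update
    o = [sigmoid(p) for p in pre[3 * n:4 * n]]    # output gate
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def init_params(input_size, hidden_size, seed=0):
    """Hypothetical helper: small random weights and zero biases."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(input_size + hidden_size)]
         for _ in range(4 * hidden_size)]
    b = [0.0] * (4 * hidden_size)
    return W, b


def summarize_history(observations, prev_actions, W, b, hidden_size):
    """Run the cell over (observation, previous action) pairs.

    Returns the final hidden state, which stands in for the output of a
    state estimator in this sketch.
    """
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for obs, act in zip(observations, prev_actions):
        h, c = lstm_step(list(obs) + list(act), h, c, W, b)
    return h


# Usage: a 3-step trajectory with 3-dimensional observations and 1-dimensional actions.
obs_dim, act_dim, hidden = 3, 1, 4
W, b = init_params(obs_dim + act_dim, hidden)
observations = [[0.1, 0.0, -0.2], [0.2, 0.1, -0.1], [0.3, 0.1, 0.0]]
prev_actions = [[0.0], [0.5], [-0.5]]
summary = summarize_history(observations, prev_actions, W, b, hidden)
```

In a full agent, a vector like `summary` would typically be passed to the actor and critic networks; how the paper's architectures route observation and action information is not specified in the excerpt above.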

Taxonomy

Topics: Data Stream Mining Techniques · Reinforcement Learning in Robotics · Anomaly Detection Techniques and Applications

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory