Memory-based Deep Reinforcement Learning for POMDPs

Lingheng Meng; Rob Gorbet; Dana Kulić

arXiv:2102.12344 · cs.LG · September 14, 2021 · 5 cites

TL;DR

This paper introduces LSTM-TD3, a memory-augmented deep reinforcement learning algorithm designed to effectively handle partially observable environments, demonstrating improved performance in scenarios with noisy or incomplete data.
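
The memory has to be fed from a record of what the agent has already seen. The sketch below assumes the agent conditions on a window of the most recent l observation-action pairs. The class name HistoryBuffer, the window length, and the zero padding at the end of short sequences are illustrative assumptions. They are not the repository's API.

```python
from collections import deque
from typing import Deque, List, Tuple


class HistoryBuffer:
    """Hypothetical helper: keeps the last `length` (observation, action) pairs
    of the current episode and exposes them as a fixed-length sequence."""

    def __init__(self, length: int, obs_dim: int, act_dim: int) -> None:
        self.length = length
        self.obs_dim = obs_dim
        self.act_dim = act_dim
        # deque(maxlen=...) silently discards the oldest pair when full.
        self.pairs: Deque[Tuple[List[float], List[float]]] = deque(maxlen=length)

    def reset(self) -> None:
        # Call at the start of every episode so memory never spans episodes.
        self.pairs.clear()

    def push(self, obs: List[float], act: List[float]) -> None:
        self.pairs.append((list(obs), list(act)))

    def as_sequence(self) -> Tuple[List[List[float]], int]:
        """Return (sequence, true_length).

        Each step is obs + act concatenated, oldest first. Missing steps early
        in an episode are zero-padded at the END (an assumption), so a
        recurrent model can read its output at index true_length - 1.
        """
        seq = [o + a for o, a in self.pairs]
        pad = [0.0] * (self.obs_dim + self.act_dim)
        seq += [pad[:] for _ in range(self.length - len(self.pairs))]
        return seq, len(self.pairs)


# Usage: memory of length 3 over 2-D observations and 1-D actions.
buf = HistoryBuffer(length=3, obs_dim=2, act_dim=1)
buf.push([0.1, 0.2], [1.0])
seq, n = buf.as_sequence()
print(n, seq)  # prints: 1 [[0.1, 0.2, 1.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
```

The returned true length lets a consumer ignore the padding. Calling reset between episodes prevents the memory from mixing unrelated trajectories.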

Contribution

The paper proposes LSTM-TD3, an extension of TD3 that adds an LSTM-based memory component so the agent can draw on past information when the current observation alone is insufficient. The authors evaluate it against other DRL algorithms on both MDPs and POMDPs.
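
The memory component is an LSTM. The sketch below shows what that component computes without relying on a deep-learning framework. It implements one step of a standard LSTM cell, plus a loop that folds a history of observation-action vectors into a single hidden state h. The names lstm_step, init_params, and summarize_history, and the parameter layout, are hypothetical. The official repository is PyTorch-based, and torch.nn.LSTM implements the same gate equations with learned weights.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
# Per gate: (W, U, b) with W: hidden x input, U: hidden x hidden, b: hidden.
Params = Dict[str, Tuple[Matrix, Matrix, Vector]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gate_preactivation(params: Params, name: str, x: Vector, h: Vector) -> Vector:
    # W x + U h + b for one gate.
    W, U, b = params[name]
    return [
        sum(w * xj for w, xj in zip(W[k], x)) + sum(u * hj for u, hj in zip(U[k], h)) + b[k]
        for k in range(len(b))
    ]


def lstm_step(x: Vector, h: Vector, c: Vector, params: Params) -> Tuple[Vector, Vector]:
    """One step of a standard LSTM cell (no peephole connections)."""
    i = [sigmoid(z) for z in gate_preactivation(params, "i", x, h)]    # input gate
    f = [sigmoid(z) for z in gate_preactivation(params, "f", x, h)]    # forget gate
    o = [sigmoid(z) for z in gate_preactivation(params, "o", x, h)]    # output gate
    g = [math.tanh(z) for z in gate_preactivation(params, "g", x, h)]  # candidate
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]  # cell state
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]          # hidden state
    return h_new, c_new


def init_params(input_dim: int, hidden_dim: int, seed: int = 0) -> Params:
    # Small random weights and zero biases for the four gates i, f, o, g.
    rng = random.Random(seed)

    def mat(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        name: (mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim), [0.0] * hidden_dim)
        for name in "ifog"
    }


def summarize_history(seq: Sequence[Vector], params: Params, hidden_dim: int) -> Vector:
    """Fold a sequence of (observation + action) vectors into one hidden state."""
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    for x in seq:
        h, c = lstm_step(x, h, c, params)
    return h
```

The forget gate f controls how much of the previous cell state survives each step. This is what allows information from earlier observations to persist across many steps.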

Findings

01

Memory component improves handling of noisy observations.

02

LSTM-TD3 outperforms other DRL algorithms in POMDPs.

03

Enhanced robustness to sensor noise and missing data.

Abstract

A promising characteristic of Deep Reinforcement Learning (DRL) is its ability to learn an optimal policy end to end without relying on feature engineering. However, most approaches assume a fully observable state space, i.e., fully observable Markov Decision Processes (MDPs). In real-world robotics, this assumption is impractical because of sensor sensitivity limitations, sensor noise, and uncertainty about whether the observation design is complete. These scenarios lead to Partially Observable MDPs (POMDPs). In this paper, we propose Long Short-Term Memory-based Twin Delayed Deep Deterministic Policy Gradient (LSTM-TD3), which introduces a memory component to TD3, and compare its performance with other DRL algorithms in both MDPs and POMDPs. Our results demonstrate the significant advantages of the memory component in addressing POMDPs,…
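
The abstract lists several sources of partial observability: limited sensing, sensor noise, and incomplete observation design. These can be simulated by transforming a fully observed state before the agent sees it. The four transforms below are a minimal sketch under that assumption. The function names, the parameters, and the choice of Gaussian noise and zeroing are illustrative. They do not reproduce the paper's exact POMDP benchmarks.

```python
import random
from typing import List, Sequence

Obs = List[float]


def remove_dims(obs: Sequence[float], keep: Sequence[int]) -> Obs:
    # Incomplete observation design: expose only some state dimensions,
    # e.g. positions without their velocities.
    return [obs[i] for i in keep]


def add_gaussian_noise(obs: Sequence[float], std: float, rng: random.Random) -> Obs:
    # Sensor noise: zero-mean Gaussian noise on every dimension.
    return [x + rng.gauss(0.0, std) for x in obs]


def drop_sensors(obs: Sequence[float], p_missing: float, rng: random.Random) -> Obs:
    # Sensor failure: each dimension is independently replaced by 0.0.
    return [0.0 if rng.random() < p_missing else x for x in obs]


def blank_frame(obs: Sequence[float], p_blank: float, rng: random.Random) -> Obs:
    # Whole-observation dropout: occasionally the agent receives all zeros.
    if rng.random() < p_blank:
        return [0.0] * len(obs)
    return list(obs)


# Usage: a 4-D state where indices 2 and 3 are assumed to be velocities.
rng = random.Random(0)
state = [0.5, -0.2, 1.3, 0.7]
partial = remove_dims(state, keep=[0, 1])
noisy = add_gaussian_noise(partial, std=0.05, rng=rng)
print(len(partial), len(noisy))  # prints: 2 2
```

A memoryless policy that sees only `partial` cannot recover the velocities. A policy with access to consecutive observations can, in principle, estimate them by differencing, which is one intuition for why memory helps in POMDPs.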

Code & Models

Repositories

LinghengMeng/LSTM-TD3 (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Fault Detection and Control Systems · Adversarial Robustness in Machine Learning

Methods: Adam · Clipped Double Q-learning · Target Policy Smoothing · Experience Replay · Dense Connections · Twin Delayed Deep Deterministic Policy Gradient
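
Two of the listed methods, Clipped Double Q-learning and Target Policy Smoothing, come from TD3. Together they define the critic's regression target:

y = r + γ(1 − d) · min(Q′₁(s′, ã), Q′₂(s′, ã)), where ã = clip(μ′(s′) + clip(ε, −c, c), a_low, a_high) and ε ~ N(0, σ²).

A dependency-free sketch follows. The function names, the callables standing in for the target networks, and the defaults (γ = 0.99, c = 0.5, σ = 0.2, actions in [−1, 1]) are assumptions based on common TD3 practice. They are not values read from the LSTM-TD3 repository. In a memory-based variant, the target actor and critics would also receive the observation-action history; here that is folded into the opaque next_obs argument.

```python
import random
from typing import Callable, List, Sequence

Action = List[float]


def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def sample_smoothing_noise(act_dim: int, std: float = 0.2, rng=random) -> List[float]:
    # Gaussian noise for target policy smoothing (std 0.2 is a common TD3 default).
    return [rng.gauss(0.0, std) for _ in range(act_dim)]


def td3_target(
    reward: float,
    next_obs,
    done: bool,
    target_actor: Callable[[object], Action],
    target_q1: Callable[[object, Action], float],
    target_q2: Callable[[object, Action], float],
    noise: Sequence[float],
    gamma: float = 0.99,
    noise_clip: float = 0.5,
    act_low: float = -1.0,
    act_high: float = 1.0,
) -> float:
    """Critic regression target y for one transition (s, a, r, s', d)."""
    # Target policy smoothing: add clipped noise, then clip to the action bounds.
    a_next = [
        clip(a + clip(n, -noise_clip, noise_clip), act_low, act_high)
        for a, n in zip(target_actor(next_obs), noise)
    ]
    # Clipped double Q-learning: take the smaller of the two target critics.
    q_next = min(target_q1(next_obs, a_next), target_q2(next_obs, a_next))
    # No bootstrapping past a terminal transition.
    return reward + gamma * (1.0 - float(done)) * q_next
```

Taking the minimum of two critics counters the overestimation bias of a single learned Q-function. Smoothing the target action keeps the critic from exploiting narrow peaks in its own estimate.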