Regret Minimization for Partially Observable Deep Reinforcement Learning

Peter Jin; Kurt Keutzer; Sergey Levine

arXiv:1710.11424 · cs.LG · October 26, 2018 · 21 citations

TL;DR

This paper introduces a deep reinforcement learning algorithm based on counterfactual regret minimization that handles partial observability and outperforms strong baseline methods on 3D navigation and object interaction tasks.

Contribution

The paper presents a regret-minimization-based deep RL algorithm that is robust to partial observability and improves on baseline methods in partially observed environments such as first-person 3D navigation in Doom and Minecraft.

Findings

1. Outperforms baseline methods in 3D navigation tasks
2. Effective in partially observed object interaction scenarios
3. Demonstrates robustness to partial observability

Abstract

Deep reinforcement learning algorithms that estimate state and state-action value functions have been shown to be effective in a variety of challenging domains, including learning control strategies from raw image pixels. However, algorithms that estimate state and state-action value functions typically assume a fully observed state and must compensate for partial observations by using finite-length observation histories or recurrent networks. In this work, we propose a new deep reinforcement learning algorithm based on counterfactual regret minimization that iteratively updates an approximation to an advantage-like function and is robust to partially observed state. We demonstrate that this new algorithm can substantially outperform strong baseline methods on several partially observed reinforcement learning tasks: learning first-person 3D navigation in Doom and Minecraft, and acting…
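The abstract's core idea, iteratively updating an advantage-like quantity and acting in proportion to its positive part, comes from regret matching, the per-information-set update used inside counterfactual regret minimization. The sketch below is a minimal tabular illustration of regret matching+ (the CFR+ variant that clips cumulative regret at zero) for a single observation with fixed, known Q-values. It is an assumption-laden toy, not the paper's deep RL algorithm, which learns these quantities with neural networks from sampled experience; the function names `regret_matching_policy` and `update_cum_adv` and the Q-values are hypothetical.

```python
import numpy as np


def regret_matching_policy(cum_adv: np.ndarray) -> np.ndarray:
    """Return a policy proportional to the positive part of the cumulative advantage.

    Falls back to the uniform policy when no action has positive cumulative advantage.
    """
    pos = np.maximum(cum_adv, 0.0)
    total = pos.sum()
    if total > 0:
        return pos / total
    return np.full_like(pos, 1.0 / len(pos))


def update_cum_adv(cum_adv: np.ndarray, q_values: np.ndarray, policy: np.ndarray) -> np.ndarray:
    """Regret-matching+ style update (hypothetical helper, tabular toy setting).

    The instantaneous "regret" of each action is its advantage Q(o, a) - V(o),
    where V(o) = sum_a pi(a|o) Q(o, a). Advantages are accumulated and the
    running total is clipped at zero, as in CFR+.
    """
    v = float(np.dot(policy, q_values))
    return np.maximum(cum_adv + q_values - v, 0.0)


# Toy usage: one observation, three actions, fixed (hypothetical) Q-values.
q = np.array([1.0, 0.5, 0.0])
cum_adv = np.zeros(3)
for _ in range(10):
    pi = regret_matching_policy(cum_adv)
    cum_adv = update_cum_adv(cum_adv, q, pi)

pi = regret_matching_policy(cum_adv)
print(pi)  # → [1. 0. 0.]
```

With these toy values the uniform starting policy moves all of its probability onto action 0, the action with the highest Q-value, after a single update. Clipping the running total at zero, rather than letting negative cumulative advantages pile up as in plain regret matching, is the CFR+ design choice; it allows an action to regain probability quickly if its value later improves.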

Code & Models

Repositories

peterhj/arm-pytorch (PyTorch)

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Control Systems Optimization