Deep Successor Reinforcement Learning

Tejas D. Kulkarni; Ardavan Saeedi; Simanta Gautam; Samuel J. Gershman

arXiv:1606.02396 · stat.ML · June 9, 2016 · 97 cites

TL;DR

This paper introduces Deep Successor Reinforcement Learning (DSR), which combines successor representations with deep learning to learn value functions from raw observations. Factoring the value function into a reward predictor and a successor map increases sensitivity to distal reward changes and allows subgoals to be extracted.

Contribution

The paper presents DSR, an end-to-end deep RL framework that generalizes successor representations to raw pixel inputs, giving increased sensitivity to distal reward changes and the ability to extract bottleneck states (subgoals).

Findings

01 Effective in grid-world and Doom environments

02 Improves reward change sensitivity

03 Extracts subgoals from successor maps

Abstract

Learning robust value functions given raw observations and rewards is now possible with model-free and model-based deep reinforcement learning algorithms. There is a third alternative, called Successor Representations (SR), which decomposes the value function into two components -- a reward predictor and a successor map. The successor map represents the expected future state occupancy from any given state and the reward predictor maps states to scalar rewards. The value function of a state can be computed as the inner product between the successor map and the reward weights. In this paper, we present DSR, which generalizes SR within an end-to-end deep reinforcement learning framework. DSR has several appealing properties including: increased sensitivity to distal reward changes due to factorization of reward and world dynamics, and the ability to extract bottleneck states (subgoals)…
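The factorization described in the abstract can be illustrated with a small tabular sketch. This is not the deep architecture from the paper. It assumes a hypothetical three-state chain under a fixed policy, a made-up transition matrix P, a discount gamma of 0.5, and hand-picked reward weights w. The helper names successor_map and state_values are illustrative only. The successor map is computed once from the dynamics, and the value of each state is then the inner product of its successor-map row with the reward weights.

```python
def successor_map(P, gamma, iters=500):
    """Tabular successor map M = sum_t gamma^t P^t.

    Computed by repeating the fixed-point update M <- I + gamma * P @ M.
    P is a row-stochastic transition matrix under a fixed policy (an assumption).
    """
    n = len(P)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in identity]
    for _ in range(iters):
        M = [
            [identity[i][j] + gamma * sum(P[i][k] * M[k][j] for k in range(n))
             for j in range(n)]
            for i in range(n)
        ]
    return M


def state_values(M, w):
    """Value of each state: inner product of its successor-map row with the reward weights."""
    return [sum(m * r for m, r in zip(row, w)) for row in M]


# Hypothetical chain: state 0 -> state 1 -> state 2, and state 2 loops to itself.
P = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]
gamma = 0.5

M = successor_map(P, gamma)

# Reward located at state 2.
v_old = state_values(M, [0.0, 0.0, 1.0])

# The reward moves to state 1: only the reward weights change, M is reused as-is.
v_new = state_values(M, [0.0, 1.0, 0.0])

print(v_old)
print(v_new)
```

Because the successor map depends only on the dynamics, the second call to state_values reuses M unchanged. In this toy setting, that reuse is what sensitivity to reward changes amounts to: a new reward requires new weights and one inner product per state, not relearning the whole value function.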

Code & Models

Repositories

Ardavans/DSR (Official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Neural dynamics and brain function