Successor Features Combine Elements of Model-Free and Model-based Reinforcement Learning
Lucas Lehnert, Michael L. Littman

TL;DR
This paper explores how successor features serve as a bridge between model-free and model-based reinforcement learning, enabling better generalization of reward predictions across different states and transitions.
Contribution
It establishes a theoretical connection between successor features and both model-based and model-free reinforcement learning, extending their applicability to variable transitions and rewards.
Findings
Successor features unify model-free and model-based RL approaches.
Predictive representations of future rewards generalize across transition and reward variations.
Learning successor features is equivalent to learning a policy's utility through temporal difference learning.
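The last finding can be illustrated with a minimal tabular sketch. It is not the authors' implementation. The 3-state ring environment, the one-hot features `phi`, the reward weights `w`, and the constants `GAMMA` and `ALPHA` are all assumptions made for illustration, as is the premise that rewards are linear in the features. The sketch runs TD(0) on successor features and on the policy's value side by side, then checks that the two agree.

```python
# Tabular sketch: TD(0) learning of successor features for one fixed policy.
# Hypothetical toy setup: a 3-state ring where the policy moves s -> (s + 1) % 3.

GAMMA = 0.9   # discount factor (assumed)
ALPHA = 0.5   # learning rate (assumed)
N = 3         # number of states (assumed)


def phi(s):
    """One-hot feature vector for state s (an assumed feature choice)."""
    return [1.0 if i == s else 0.0 for i in range(N)]


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def learn(w, sweeps=500):
    """Run TD(0) on successor features psi and, in parallel, on the value V.

    Rewards are assumed to be linear in the features: r(s) = phi(s) . w
    """
    psi = [[0.0] * N for _ in range(N)]
    V = [0.0] * N
    for _ in range(sweeps):
        for s in range(N):
            s_next = (s + 1) % N
            f = phi(s)
            # TD target for successor features: phi(s) + gamma * psi(s')
            target = [f[i] + GAMMA * psi[s_next][i] for i in range(N)]
            psi[s] = [psi[s][i] + ALPHA * (target[i] - psi[s][i]) for i in range(N)]
            # TD target for the value: r(s) + gamma * V(s')
            r = dot(f, w)
            V[s] += ALPHA * (r + GAMMA * V[s_next] - V[s])
    return psi, V


w = [0.0, 0.5, 1.0]  # assumed reward weights
psi, V = learn(w)

# Value recovered from successor features: V(s) = psi(s) . w
values_from_sf = [dot(psi[s], w) for s in range(N)]
```

Because both updates are linear and share the same learning rate and update order, `values_from_sf` tracks `V` throughout training in this toy setting. Under the same linear-reward assumption, a different weight vector can be evaluated as `dot(psi[s], w_new)` without relearning `psi`.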
Abstract
A key question in reinforcement learning is how an intelligent agent can generalize knowledge across different inputs. By generalizing across different inputs, information learned for one input can be immediately reused for improving predictions for another input. Reusing information allows an agent to compute an optimal decision-making strategy using less data. State representation is a key element of the generalization process, compressing a high-dimensional input space into a low-dimensional latent state space. This article analyzes properties of different latent state spaces, leading to new connections between model-based and model-free reinforcement learning. Successor features, which predict frequencies of future observations, form a link between model-based and model-free learning: Learning to predict future expected reward outcomes, a key characteristic of model-based agents, is…
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms · Advanced Bandit Algorithms Research
