Deep Reinforcement Learning with Attention for Slate Markov Decision Processes with High-Dimensional States and Actions

Peter Sunehag; Richard Evans; Gabriel Dulac-Arnold; Yori Zwols; Daniel Visentin; Ben Coppin

arXiv:1512.01124 · cs.AI · December 17, 2015 · 32 cites

TL;DR

This paper introduces Slate Markov Decision Processes to handle high-dimensional action spaces in reinforcement learning, applying deep Q-learning and attention mechanisms to improve decision-making in complex environments like recommendation systems.

Contribution

It proposes a novel framework for high-dimensional control called Slate-MDPs, integrating deep learning and attention to optimize combinatorial and sequential aspects of actions.

Findings

1. Successfully addressed problems with up to 2000-dimensional action spaces.

2. Demonstrated superiority over baseline agents in recommendation system environments.

3. Showed that risk-seeking strategies enhance exploration and long-term rewards.

Abstract

Many real-world problems come with action spaces represented as feature vectors. Although high-dimensional control is a largely unsolved problem, there has recently been progress for modest dimensionalities. Here we report on a successful attempt at addressing problems of dimensionality as high as $2000$, of a particular form. Motivated by important applications such as recommendation systems that do not fit the standard reinforcement learning frameworks, we introduce Slate Markov Decision Processes (Slate-MDPs). A Slate-MDP is an MDP with a combinatorial action space consisting of slates (tuples) of primitive actions of which one is executed in an underlying MDP. The agent does not control the choice of this executed action and the action might not even be from the slate, e.g., for recommendation systems for which all recommendations can be ignored. We use deep Q-learning based on…
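The Slate-MDP definition in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class `SlateMDP`, the callbacks `step_fn` and `choice_fn`, and the toy dynamics are all hypothetical names and assumptions chosen for illustration. The sketch only shows the structural point of the definition: the agent submits a slate, something outside the agent's control picks the executed primitive action (possibly one that is not on the slate), and the underlying MDP advances on that action.

```python
from typing import Callable, Hashable, Sequence, Tuple

State = Hashable
Action = Hashable


class SlateMDP:
    """Hypothetical wrapper: the agent acts by proposing slates of primitive actions."""

    def __init__(
        self,
        step_fn: Callable[[State, Action], Tuple[State, float]],
        choice_fn: Callable[[State, Sequence[Action]], Action],
        slate_size: int,
    ) -> None:
        self.step_fn = step_fn      # underlying MDP: (state, action) -> (next_state, reward)
        self.choice_fn = choice_fn  # selects the executed action; not controlled by the agent
        self.slate_size = slate_size

    def step(self, state: State, slate: Sequence[Action]) -> Tuple[State, float, Action]:
        if len(slate) != self.slate_size:
            raise ValueError(f"expected a slate of {self.slate_size} actions")
        # The executed action may lie outside the slate, e.g. when a user
        # ignores every recommendation.
        executed = self.choice_fn(state, slate)
        next_state, reward = self.step_fn(state, executed)
        return next_state, reward, executed


# Toy underlying MDP (an assumption for illustration): integer states and actions.
def toy_step(state: int, action: int) -> Tuple[int, float]:
    return state + action, float(action)


# Toy choice model (an assumption): take the first even action on the slate,
# otherwise ignore the slate and fall back to action 0.
def toy_choice(state: int, slate: Sequence[int]) -> int:
    for action in slate:
        if action % 2 == 0:
            return action
    return 0


env = SlateMDP(toy_step, toy_choice, slate_size=3)
print(env.step(0, (1, 4, 6)))  # an action from the slate is executed
print(env.step(5, (1, 3, 7)))  # the slate is ignored; the fallback action is executed
```

Returning the executed action alongside the next state and reward is a design choice of this sketch, made so that a learner can see which primitive action actually drove the transition.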

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Advanced Bandit Algorithms Research

Methods: Q-Learning