Learning Action Representations for Reinforcement Learning

Yash Chandak; Georgios Theocharous; James Kostas; Scott Jordan; Philip S. Thomas

arXiv:1902.00183·cs.LG·May 16, 2019·22 cites

TL;DR

This paper introduces a method for learning low-dimensional action representations in reinforcement learning. The representations improve generalization over large, finite action sets by letting the agent infer the outcomes of actions similar to those it has already taken.

Contribution

The paper proposes an algorithm that both learns and uses action representations, improving policy generalization over large action sets.

Findings

01. Improved performance on large-scale real-world problems

02. Effective inference of outcomes for similar actions

03. Convergence conditions for the proposed algorithm

Abstract

Most model-free reinforcement learning methods leverage state representations (embeddings) for generalization, but either ignore structure in the space of actions or assume the structure is provided a priori. We show how a policy can be decomposed into a component that acts in a low-dimensional space of action representations and a component that transforms these representations into actual actions. These representations improve generalization over large, finite action sets by allowing the agent to infer the outcomes of actions similar to actions already taken. We provide an algorithm to both learn and use action representations and provide conditions for its convergence. The efficacy of the proposed method is demonstrated on large-scale real-world problems.
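
The decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the sizes, the linear Gaussian internal policy, the random embedding table, and the nearest-neighbor mapping from a representation to a discrete action are all assumptions chosen for illustration. In the paper the representations are learned, whereas here they are fixed random vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
N_ACTIONS = 1000   # size of the finite action set
STATE_DIM = 4      # dimension of the state features
EMBED_DIM = 2      # dimension of the action-representation space

# Assumption: one embedding vector per discrete action.
# In the paper these are learned; here they are random placeholders.
action_embeddings = rng.normal(size=(N_ACTIONS, EMBED_DIM))

# Assumption: a linear-Gaussian internal policy with fixed weights.
W = rng.normal(scale=0.1, size=(EMBED_DIM, STATE_DIM))


def internal_policy(state, noise_std=0.1):
    """Sample a point in the low-dimensional representation space."""
    mean = W @ state
    return mean + noise_std * rng.normal(size=EMBED_DIM)


def to_action(e):
    """Map a representation to an actual action.

    Assumption: pick the action whose embedding is closest to e.
    """
    dists = np.linalg.norm(action_embeddings - e, axis=1)
    return int(np.argmin(dists))


def act(state):
    """Overall policy: act in representation space, then transform."""
    return to_action(internal_policy(state))


state = np.ones(STATE_DIM)
action = act(state)
```

Because nearby points in the representation space map to actions with nearby embeddings, an update to the internal policy after trying one action also shifts the probability of selecting similar actions, which is the generalization effect the abstract refers to.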

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Adaptive Dynamic Programming Control