Learning to Plan via a Multi-Step Policy Regression Method

Stefan Wagner; Michael Janschek; Tobias Uelwer; Stefan Harmeling

arXiv:2106.10075 · cs.LG · June 21, 2021

TL;DR

This paper introduces Policy Horizon Regression (PHR), a method that predicts multiple actions in advance to improve inference speed in environments requiring sequential decision-making, demonstrated on MiniGrid and Pong.

Contribution

The paper presents a novel multi-step policy prediction method called PHR that enhances inference efficiency by predicting action sequences in advance.
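The efficiency argument can be illustrated with a toy rollout. The sketch below is not the paper's implementation: `corridor_policy` is a hypothetical hand-written stand-in for a trained multi-step policy, and the 1-D corridor is an assumed environment chosen only to make the counting visible. It shows that executing n predicted actions per observation reduces the number of policy calls from roughly one per environment step to one per n steps.

```python
from typing import List, Tuple


def corridor_policy(position: int, goal: int, n: int) -> List[int]:
    """Hypothetical stand-in for a trained multi-step policy.

    Returns n actions for a 1-D corridor: +1 (right), -1 (left), 0 (stay).
    A learned policy would produce these from a single observation.
    """
    actions = []
    pos = position
    for _ in range(n):
        if pos < goal:
            step = 1
        elif pos > goal:
            step = -1
        else:
            step = 0
        actions.append(step)
        pos += step
    return actions


def rollout(goal: int, n: int, max_steps: int = 1000) -> Tuple[int, int]:
    """Run one episode and return (environment steps, policy calls)."""
    position, env_steps, policy_calls = 0, 0, 0
    while position != goal and env_steps < max_steps:
        # One policy call yields a plan of n actions.
        plan = corridor_policy(position, goal, n)
        policy_calls += 1
        # Execute the plan without observing again until it is used up.
        for action in plan:
            position += action
            env_steps += 1
            if position == goal or env_steps >= max_steps:
                break
    return env_steps, policy_calls


if __name__ == "__main__":
    for horizon in (1, 4):
        steps, calls = rollout(goal=12, n=horizon)
        print(f"n={horizon}: {steps} environment steps, {calls} policy calls")
```

In this toy setting the number of environment steps is unchanged, while the number of policy calls shrinks with the horizon. The trade-off, which the toy example hides, is that actions later in a plan are chosen without seeing the intermediate observations.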

Findings

01. Significant speedup in inference time on the MiniGrid and Pong environments.

02. Successful prediction of multiple sequential actions from a single observation.

03. Effective policy distillation for multi-step action prediction.

Abstract

We propose a new approach to increase inference performance in environments that require a specific sequence of actions in order to be solved. This is the case, for example, in maze environments, where ideally an optimal path is determined. Instead of learning a policy for a single step, we want to learn a policy that can predict n actions in advance. Our proposed method, called policy horizon regression (PHR), uses knowledge of the environment sampled by A2C to learn an n-dimensional policy vector in a policy distillation setup, which yields n sequential actions per observation. We test our method on the MiniGrid and Pong environments and show a drastic speedup at inference time by successfully predicting sequences of actions from a single observation.
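One way to picture the distillation setup is a student with n output heads, one per future step, trained to match the teacher's action distributions at steps t through t+n-1. The sketch below is a minimal illustration under stated assumptions: the abstract does not specify the loss, so the per-head cross-entropy used here is an assumption (the method's name suggests the paper may use a regression objective instead), and the function names `multi_step_distillation_loss` and `greedy_plan` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one head's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multi_step_distillation_loss(
    student_logits: Sequence[Sequence[float]],
    teacher_probs: Sequence[Sequence[float]],
) -> float:
    """Mean cross-entropy between teacher and student over n heads.

    student_logits[k] holds the student's logits for the action k steps ahead;
    teacher_probs[k] holds the teacher's action distribution at that step.
    Assumption: cross-entropy is used here for illustration only.
    """
    if len(student_logits) != len(teacher_probs):
        raise ValueError("student and teacher must cover the same horizon")
    loss = 0.0
    for logits, target in zip(student_logits, teacher_probs):
        probs = softmax(logits)
        # Skip zero-probability targets so they contribute nothing.
        loss -= sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)
    return loss / len(student_logits)


def greedy_plan(student_logits: Sequence[Sequence[float]]) -> List[int]:
    """Pick the highest-scoring action from each head to form an n-step plan."""
    return [max(range(len(head)), key=head.__getitem__) for head in student_logits]


if __name__ == "__main__":
    # Teacher prefers action 0 now and action 1 at the next step (horizon n = 2).
    teacher = [[1.0, 0.0], [0.0, 1.0]]
    untrained = [[0.0, 0.0], [0.0, 0.0]]
    trained = [[5.0, 0.0], [0.0, 5.0]]
    print(multi_step_distillation_loss(untrained, teacher))
    print(multi_step_distillation_loss(trained, teacher))
    print(greedy_plan(trained))
```

The loss falls as the student's heads agree with the teacher's per-step choices, and `greedy_plan` then reads off one action per head, which corresponds to the n sequential actions per observation described above.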

Taxonomy

Methods: A2C