Context-aware Active Multi-Step Reinforcement Learning

Gang Chen; Dingcheng Li; Ran Xu

arXiv:1911.04107·cs.LG·November 28, 2019

TL;DR

This paper introduces a context-aware active multi-step reinforcement learning algorithm that adaptively switches multi-step TD backups on or off based on context changes, enabling off-policy learning without importance sampling.

Contribution

It combines active stepsize learning with an adaptive multi-step TD algorithm, in which a context-aware binary classifier decides whether to turn on future backups, for off-policy reinforcement learning.

Findings

01. Achieves competitive results on discrete and continuous tasks.

02. Effectively switches backups based on context changes.

03. Learns off-policy without importance sampling.

Abstract

Reinforcement learning has attracted great attention recently, especially policy gradient algorithms, which have been demonstrated on challenging decision-making and control tasks. In this paper, we propose an active multi-step TD algorithm with adaptive stepsizes to learn the actor and critic. Specifically, our model consists of two components: active stepsize learning and an adaptive multi-step TD algorithm. First, we divide the time horizon into chunks and actively select states and actions inside each chunk. Then, given the selected samples, we propose the adaptive multi-step TD, which generalizes TD($\lambda$) but adaptively switches on or off the backups from future returns of different steps. In particular, the adaptive multi-step TD introduces a context-aware mechanism, here a binary classifier, which decides whether or not to turn on its future backups based on the context changes. Thus,…
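
To make the idea of switching backups on and off concrete, here is a minimal sketch of a gated multi-step TD target. It is an illustration under stated assumptions, not the paper's formulation: the abstract does not give the update rule, so the function names (`n_step_returns`, `gated_multistep_target`), the normalized $\lambda^{n-1}$ weighting, and the fallback to the one-step target are all hypothetical. The binary `gates` list stands in for the output of the context-aware classifier, which is not modeled here.

```python
def n_step_returns(rewards, bootstrap_values, gamma):
    """Return [G^(1), ..., G^(N)] for one chunk of experience.

    rewards[k] is r_{t+k}; bootstrap_values[k] is the critic's estimate
    V(s_{t+k+1}). G^(n) sums n discounted rewards, then bootstraps.
    """
    returns = []
    discounted_sum = 0.0
    for k, (reward, next_value) in enumerate(zip(rewards, bootstrap_values)):
        discounted_sum += (gamma ** k) * reward
        returns.append(discounted_sum + (gamma ** (k + 1)) * next_value)
    return returns


def gated_multistep_target(rewards, bootstrap_values, gates, gamma=0.99, lam=0.9):
    """Blend n-step returns, using only those whose gate is switched on.

    gates[n-1] is 1 to keep the n-step backup and 0 to drop it
    (assumption: in the paper a binary classifier makes this decision).
    Active backups get weight lam**(n-1), normalized to sum to 1
    (assumption: a TD(lambda)-style weighting chosen for illustration).
    """
    returns = n_step_returns(rewards, bootstrap_values, gamma)
    weights = [gate * lam ** n for n, gate in enumerate(gates)]
    total = sum(weights)
    if total == 0:
        # Assumption: with every gate off, fall back to the one-step TD target.
        return returns[0]
    return sum(w * g for w, g in zip(weights, returns)) / total


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    values = [0.0, 0.0, 0.0]
    # Only the 1-step backup is on: the target is the plain one-step TD target.
    print(gated_multistep_target(rewards, values, [1, 0, 0], gamma=0.5, lam=0.5))
    # Only the 3-step backup is on: the target is the 3-step return.
    print(gated_multistep_target(rewards, values, [0, 0, 1], gamma=0.5, lam=0.5))
```

With a single gate on, the target reduces to the corresponding n-step return. With all gates on, it becomes a normalized $\lambda$-weighted average of the n-step returns, which is close to, but not identical to, the truncated $\lambda$-return of TD($\lambda$).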

Taxonomy

Topics: Reinforcement Learning in Robotics · Smart Grid Energy Management · Advanced Bandit Algorithms Research