Context-aware Active Multi-Step Reinforcement Learning
Gang Chen, Dingcheng Li, Ran Xu

TL;DR
This paper introduces a novel context-aware active multi-step reinforcement learning algorithm that adaptively switches backups based on context, improving off-policy learning without importance sampling.
Contribution
It proposes an innovative combination of active learning and adaptive multi-step TD with context-aware mechanisms for improved off-policy reinforcement learning.
Findings
Achieves competitive results on discrete and continuous tasks.
Effectively switches backups based on context changes.
Learns off-policy without importance sampling.
Abstract
Reinforcement learning has attracted great attention recently, especially policy gradient algorithms, which have been demonstrated on challenging decision-making and control tasks. In this paper, we propose an active multi-step TD algorithm with adaptive stepsizes to learn actor and critic. Specifically, our model consists of two components: active stepsize learning and an adaptive multi-step TD algorithm. First, we divide the time horizon into chunks and actively select state and action inside each chunk. Then, given the selected samples, we propose the adaptive multi-step TD, which generalizes TD(λ) but adaptively switches on/off the backups from future returns of different steps. In particular, the adaptive multi-step TD introduces a context-aware mechanism, here a binary classifier, which decides whether or not to turn on its future backups based on the context changes. Thus,…
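The idea of switching multi-step backups on and off can be illustrated with a minimal sketch. This is not the authors' formulation: the function names, the equal-weight averaging of the enabled k-step returns, and the fallback to the 1-step return when every gate is off are all assumptions made for illustration. In the paper the gates come from a learned binary classifier; here they are simply passed in as a list of 0/1 values.

```python
from typing import Sequence


def k_step_return(rewards: Sequence[float],
                  next_values: Sequence[float],
                  k: int,
                  gamma: float) -> float:
    """Standard k-step return: discounted rewards plus a bootstrapped value.

    rewards[i]     is r_{t+i}        for i = 0..n-1
    next_values[i] is V(s_{t+i+1})   for i = 0..n-1
    """
    discounted = sum((gamma ** i) * rewards[i] for i in range(k))
    return discounted + (gamma ** k) * next_values[k - 1]


def gated_multistep_target(rewards: Sequence[float],
                           next_values: Sequence[float],
                           gates: Sequence[int],
                           gamma: float = 0.99) -> float:
    """Hypothetical gated target: average the k-step returns whose gate is 1.

    gates[k-1] == 1 means the k-step backup is switched on. The averaging
    rule and the 1-step fallback are illustrative assumptions, not taken
    from the paper.
    """
    n = len(rewards)
    if len(next_values) != n or len(gates) != n:
        raise ValueError("rewards, next_values and gates must have equal length")

    active = [k for k in range(1, n + 1) if gates[k - 1]]
    if not active:
        # Assumed fallback: with every gate off, use the plain 1-step TD target.
        active = [1]

    returns = [k_step_return(rewards, next_values, k, gamma) for k in active]
    return sum(returns) / len(returns)


if __name__ == "__main__":
    rewards = [1.0, 1.0]
    next_values = [0.0, 0.0]
    for gates in ([1, 0], [0, 1], [1, 1]):
        print(gates, gated_multistep_target(rewards, next_values, gates, gamma=0.5))
```

With only the first gate on, the target reduces to the ordinary 1-step TD target; turning on later gates mixes in longer returns. A TD(λ)-style scheme would instead weight every k-step return by a fixed geometric factor, whereas here the weights depend on the binary decisions.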
Taxonomy
Topics: Reinforcement Learning in Robotics · Smart Grid Energy Management · Advanced Bandit Algorithms Research
