Active Reinforcement Learning with Monte-Carlo Tree Search

Sebastian Schulze; Owain Evans

arXiv:1803.04926 · cs.LG · March 28, 2018 · 6 cites

TL;DR

This paper introduces a Monte-Carlo Tree Search-based algorithm for Active Reinforcement Learning (ARL), where observing reward information incurs a cost. The algorithm is near-optimal on small Bandit problems and MDPs, and on larger MDPs it outperforms a Q-learner augmented with ARL-specific heuristics.

Contribution

It presents an ARL algorithm based on Monte-Carlo Tree Search that is asymptotically Bayes optimal, and relates ARL in tabular environments to Bayes-Adaptive MDPs.

Findings

01 Near-optimal performance on small Bandit problems and MDPs

02 Outperforms a Q-learner augmented with specialised ARL heuristics on larger MDPs

03 Identifies obstacles to scaling up simulation-based ARL algorithms

Abstract

Active Reinforcement Learning (ARL) is a twist on RL where the agent observes reward information only if it pays a cost. This subtle change makes exploration substantially more challenging. Powerful principles in RL like optimism, Thompson sampling, and random exploration do not help with ARL. We relate ARL in tabular environments to Bayes-Adaptive MDPs. We provide an ARL algorithm using Monte-Carlo Tree Search that is asymptotically Bayes optimal. Experimentally, this algorithm is near-optimal on small Bandit problems and MDPs. On larger MDPs it outperforms a Q-learner augmented with specialised heuristics for ARL. By analysing exploration behaviour in detail, we uncover obstacles to scaling up simulation-based algorithms for ARL.
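
To make the ARL setting concrete, here is a minimal Python sketch of a two-armed Bernoulli bandit in which the agent chooses both an arm and whether to pay to see the reward. It is an illustration under stated assumptions, not the paper's implementation: the class names `ActiveBernoulliBandit` and `BetaBelief`, the parameter `query_cost`, and the convention that the reward still accrues when it is not observed are all hypothetical choices made for this example. The Beta posterior only hints at the Bayes-Adaptive view mentioned above: the belief changes only on steps where the agent pays to query.

```python
import random


class ActiveBernoulliBandit:
    """Bernoulli bandit where observing the reward costs `query_cost` (assumed name)."""

    def __init__(self, arm_probs, query_cost, seed=0):
        self.arm_probs = list(arm_probs)
        self.query_cost = query_cost
        self.rng = random.Random(seed)

    def step(self, arm, query):
        """Pull `arm`; return (observed_reward_or_None, net_return)."""
        reward = 1.0 if self.rng.random() < self.arm_probs[arm] else 0.0
        # Assumption: the reward always accrues, but is revealed only if paid for.
        observed = reward if query else None
        net_return = reward - (self.query_cost if query else 0.0)
        return observed, net_return


class BetaBelief:
    """Beta(alpha, beta) posterior over one arm's success probability."""

    def __init__(self, alpha=1.0, beta=1.0):
        self.alpha = alpha
        self.beta = beta

    def update(self, observed):
        if observed is None:
            return  # no query means no new information
        self.alpha += observed
        self.beta += 1.0 - observed

    def mean(self):
        return self.alpha / (self.alpha + self.beta)


if __name__ == "__main__":
    env = ActiveBernoulliBandit(arm_probs=[0.8, 0.3], query_cost=0.25)
    beliefs = [BetaBelief(), BetaBelief()]
    total = 0.0
    for t in range(20):
        arm = t % 2
        query = t < 10  # naive schedule: pay for feedback early, then stop
        observed, net = env.step(arm, query)
        beliefs[arm].update(observed)
        total += net
    print([round(b.mean(), 2) for b in beliefs], round(total, 2))
```

The fixed "query for the first ten steps" schedule is deliberately naive. Deciding when a query is worth its cost is exactly the exploration problem the abstract describes, and it is why heuristics that ignore observation cost do not transfer directly to ARL.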

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms

Methods: Monte-Carlo Tree Search