Active Reinforcement Learning with Monte-Carlo Tree Search
Sebastian Schulze, Owain Evans

TL;DR
This paper introduces a Monte-Carlo Tree Search-based algorithm for Active Reinforcement Learning, where observing reward information incurs a cost. The algorithm is near-optimal on small Bandit problems and MDPs, and on larger MDPs it outperforms a Q-learner augmented with ARL-specific heuristics.
Contribution
It presents the first asymptotically Bayes optimal ARL algorithm using Monte-Carlo Tree Search, advancing exploration strategies under reward observation costs.
Findings
Near-optimal performance on small Bandit problems and MDPs
Outperforms Q-learning with heuristics on larger MDPs
Identifies obstacles to scaling simulation-based ARL algorithms
Abstract
Active Reinforcement Learning (ARL) is a twist on RL where the agent observes reward information only if it pays a cost. This subtle change makes exploration substantially more challenging. Powerful principles in RL like optimism, Thompson sampling, and random exploration do not help with ARL. We relate ARL in tabular environments to Bayes-Adaptive MDPs. We provide an ARL algorithm using Monte-Carlo Tree Search that is asymptotically Bayes optimal. Experimentally, this algorithm is near-optimal on small Bandit problems and MDPs. On larger MDPs it outperforms a Q-learner augmented with specialised heuristics for ARL. By analysing exploration behaviour in detail, we uncover obstacles to scaling up simulation-based algorithms for ARL.
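To make the ARL setting in the abstract concrete, here is a minimal Python sketch of a Bernoulli bandit in which the reward always accrues but is revealed only when the agent pays a query cost. Everything here is an illustrative assumption rather than the paper's code: the names `ActiveBernoulliBandit`, `BetaBelief`, and `run_query_then_commit`, the Beta(1, 1) prior, and the naive query-then-commit policy are hypothetical, and the policy is a simple baseline, not the paper's MCTS algorithm.

```python
import random


class ActiveBernoulliBandit:
    """Hypothetical ARL bandit: rewards always accrue, but are observed only when queried."""

    def __init__(self, arm_probs, query_cost, seed=0):
        self.arm_probs = list(arm_probs)
        self.query_cost = query_cost
        self.rng = random.Random(seed)
        self.total_return = 0.0  # true return, including query costs

    def step(self, arm, query):
        reward = 1.0 if self.rng.random() < self.arm_probs[arm] else 0.0
        # The reward counts toward the return whether or not it is observed.
        self.total_return += reward - (self.query_cost if query else 0.0)
        # The agent sees the reward only if it paid for the observation.
        return reward if query else None


class BetaBelief:
    """Independent Beta(1, 1) priors over each arm's success probability (an assumption)."""

    def __init__(self, n_arms):
        self.alpha = [1.0] * n_arms
        self.beta = [1.0] * n_arms

    def update(self, arm, observed):
        if observed is None:
            return  # no query was made, so the belief cannot change
        self.alpha[arm] += observed
        self.beta[arm] += 1.0 - observed

    def mean(self, arm):
        return self.alpha[arm] / (self.alpha[arm] + self.beta[arm])


def run_query_then_commit(env, belief, n_steps, n_query_steps):
    """Naive baseline: query round-robin for a while, then exploit without querying."""
    n_arms = len(env.arm_probs)
    for t in range(n_steps):
        if t < n_query_steps:
            arm, query = t % n_arms, True
        else:
            arm, query = max(range(n_arms), key=belief.mean), False
        belief.update(arm, env.step(arm, query))
    return env.total_return
```

The sketch shows why exploration is harder here: an unqueried pull leaves the belief untouched, so acting optimistically or sampling from the posterior does not by itself produce new information. The agent has to decide explicitly when an observation is worth its cost.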
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms
Methods: Monte-Carlo Tree Search
