Data-Efficient Reinforcement Learning for Malaria Control
Lixin Zou, Long Xia, Linfang Hou, Xiangyu Zhao, and Dawei Yin

TL;DR
This paper presents VB-MCTS, a data-efficient, model-based reinforcement learning method using Gaussian Processes and variance-bonus rewards, enabling effective malaria control policies with minimal data and trials.
Contribution
Introduction of VB-MCTS, a novel, sample-efficient reinforcement learning approach combining Gaussian Process models and variance-based exploration for complex, cost-sensitive tasks like malaria control.
Findings
VB-MCTS outperforms state-of-the-art methods on malaria control tasks.
The method demonstrates high data efficiency with few trials.
Experimental results show superior performance in a competitive RL environment.
Abstract
Sequential decision-making in cost-sensitive tasks is daunting, especially for problems that significantly affect people's daily lives, such as malaria control and treatment recommendation. The main challenge faced by policymakers is to learn a policy from scratch by interacting with a complex environment in only a few trials. This work introduces a practical, data-efficient policy learning method, named Variance-Bonus Monte Carlo Tree Search (VB-MCTS), which can cope with very little data and facilitate learning from scratch in only a few trials. Specifically, the solution is a model-based reinforcement learning method. To avoid model bias, we apply Gaussian Process (GP) regression to estimate the transitions explicitly. With the GP world model, we propose a variance-bonus reward to measure the uncertainty about the world. Adding the reward to the planning with MCTS…
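The GP world model and variance-bonus reward mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the RBF kernel, the hyperparameters, the toy data, and the bonus form `mean + beta * sqrt(variance)` are all assumptions chosen for illustration, and the function names are hypothetical. The idea shown is only that a GP gives a predictive variance alongside its mean, and that adding a variance term to the reward favors inputs the model is uncertain about.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between the rows of a and b (assumed kernel)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return signal_var * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)

def gp_posterior(x_train, y_train, x_query, noise_var=1e-2):
    """Posterior mean and variance of a zero-mean GP at the query points."""
    k_tt = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    chol = np.linalg.cholesky(k_tt)
    # alpha = K^{-1} y, computed through the Cholesky factor
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_qt @ alpha
    v = np.linalg.solve(chol, k_qt.T)
    var = rbf_kernel(x_query, x_query).diagonal() - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)

def variance_bonus_reward(mean, var, beta=1.0):
    """Hypothetical bonus: predicted reward plus beta times predictive std."""
    return mean + beta * np.sqrt(var)

# Toy data (made up): three observed inputs and their observed rewards.
x_train = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]])
y_train = np.array([0.1, 0.8, 0.3])

# One query at an observed input, one far from all observations.
x_query = np.array([[0.5, 0.5], [3.0, 3.0]])
mean, var = gp_posterior(x_train, y_train, x_query)
bonus_reward = variance_bonus_reward(mean, var)
```

In this sketch the query far from the training data receives a larger predictive variance and therefore a larger bonus, which is the exploration pressure a planner such as MCTS could then use. How the paper weights and combines the bonus during planning is not stated in the truncated abstract.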
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics
