Uniformly Conservative Exploration in Reinforcement Learning

Wanqiao Xu; Jason Yecheng Ma; Kan Xu; Hamsa Bastani; Osbert Bastani

arXiv:2110.13060 · cs.LG · February 27, 2023

TL;DR

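This paper introduces a conservative exploration method for reinforcement learning that limits harmful exploration: the learner must uniformly outperform a conservative baseline policy (estimated from the data observed so far), up to a per-episode exploration budget, and the algorithm adaptively decides when to explore.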
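One way to write the constraint, reading "uniformly" as "in every episode" rather than "on average over episodes". The notation is ours and hypothetical, not necessarily the paper's: $\pi_k$ is the policy executed in episode $k$, $\pi_b$ the conservative baseline, $s_1$ the initial state, $\epsilon$ the per-episode exploration budget, $\delta$ the failure probability and $K$ the number of episodes.

```latex
\Pr\Big[\; V^{\pi_k}(s_1) \;\ge\; V^{\pi_b}(s_1) - \epsilon
  \quad \text{for all } k = 1, \dots, K \;\Big] \;\ge\; 1 - \delta
```

A cumulative variant would only require $\sum_{k=1}^{K} V^{\pi_k}(s_1) \ge K\,\big(V^{\pi_b}(s_1) - \epsilon\big)$, which still allows individual episodes to fall far below the baseline as long as other episodes compensate. Since the abstract says the conservative policy is re-estimated from all data observed so far, $\pi_b$ may in practice change with $k$.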

Contribution

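It presents an algorithm that explores with a UCB reinforcement learning policy but overrides it whenever needed to satisfy the exploration constraint with high probability. The analysis covers the tabular setting, and the approach is extended to continuous state spaces via deep RL.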
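The abstract describes exploring with a UCB policy and overriding it when the exploration constraint would otherwise be violated. The sketch below illustrates one way such an override could work, under loudly stated assumptions: the class `BudgetedOverride` and the oracle `loss_upper(s, a)` are hypothetical, the baseline is deterministic, states include the time step, and `loss_upper` returns valid upper bounds on the baseline's advantage gap $V^{\pi_b}(s) - Q^{\pi_b}(s, a)$. The budget accounting rests on the performance difference lemma, $V^{\pi}(s_1) - V^{\pi_b}(s_1) = \mathbb{E}_{\pi}\big[\sum_h \big(Q^{\pi_b}(s_h, a_h) - V^{\pi_b}(s_h)\big)\big]$: if the summed losses stay within $\epsilon$ on every trajectory, the executed policy's expected return is at least $V^{\pi_b}(s_1) - \epsilon$. This is not the authors' algorithm.

```python
from dataclasses import dataclass
from typing import Callable, Hashable

State = Hashable
Action = Hashable


@dataclass
class BudgetedOverride:
    """Hypothetical per-episode override rule; an illustration, not the paper's algorithm."""

    budget: float  # per-episode exploration budget epsilon (assumed additive)
    # Hypothetical oracle: an upper bound on V_b(s) - Q_b(s, a), i.e. how much value
    # (measured by the baseline's own value functions) action a can lose at state s.
    loss_upper: Callable[[State, Action], float]
    spent: float = 0.0  # budget already used in the current episode

    def reset(self) -> None:
        """Call at the start of every episode to restore the full budget."""
        self.spent = 0.0

    def act(self, s: State, ucb_action: Action, baseline_action: Action) -> Action:
        """Follow the UCB action only if its worst-case loss fits in the remaining budget."""
        # Negative gaps (actions better than the baseline) are clipped to 0,
        # which can only overstate the loss, so the accounting stays conservative.
        loss = max(0.0, self.loss_upper(s, ucb_action))
        if self.spent + loss <= self.budget:
            self.spent += loss
            return ucb_action
        # Override: the (assumed deterministic) baseline action has zero loss
        # against its own value function, so falling back spends no budget.
        return baseline_action


# Usage with made-up loss bounds and a budget of 0.6.
losses = {("s0", "explore"): 0.3, ("s1", "explore"): 0.5}
rule = BudgetedOverride(budget=0.6, loss_upper=lambda s, a: losses.get((s, a), 0.0))
first = rule.act("s0", "explore", "safe")   # 0.3 fits in the budget
second = rule.act("s1", "explore", "safe")  # 0.3 + 0.5 > 0.6, so override
print(first, second)
```

A greedy rule like this spends the budget as early in the episode as it can, so late-episode states are rarely explored; the abstract notes that the authors' algorithm instead adaptively determines when to explore, precisely to keep exploration unbiased across the state space.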

Findings

1. The algorithm achieves low regret while remaining conservative (satisfying its exploration constraint) in tabular RL.
2. Experiments on two healthcare tasks, sepsis treatment and HIV treatment, show that it learns effectively.
3. The approach extends to deep RL for continuous state spaces.
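For reference, the regret mentioned in the first finding is, under the standard episodic definition (our notation; the paper may normalize it differently), the cumulative shortfall of the executed policies $\pi_k$ relative to an optimal policy $\pi^*$ over $K$ episodes from the initial state $s_1$:

```latex
\mathrm{Regret}(K) \;=\; \sum_{k=1}^{K} \Big( V^{\pi^*}(s_1) - V^{\pi_k}(s_1) \Big)
```

"Low regret" typically means this sum grows sublinearly in $K$, so the average per-episode shortfall vanishes. The exploration constraint pulls in the opposite direction: each episode in which a suboptimal baseline overrides exploration can add to this sum.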

Abstract

A key challenge to deploying reinforcement learning in practice is avoiding excessive (harmful) exploration in individual episodes. We propose a natural constraint on exploration: uniformly outperforming a conservative policy (adaptively estimated from all data observed thus far), up to a per-episode exploration budget. We design a novel algorithm that uses a UCB reinforcement learning policy for exploration, but overrides it as needed to satisfy our exploration constraint with high probability. Importantly, to ensure unbiased exploration across the state space, our algorithm adaptively determines when to explore. We prove that our approach remains conservative while minimizing regret in the tabular setting. We experimentally validate our results on a sepsis treatment task and an HIV treatment task, demonstrating that our algorithm can learn while ensuring good performance…

Code & Models

Repositories

yifan123/arxiv_spider

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning