Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization

Chengchun Shi; Zhengling Qi; Jianing Wang; Fan Zhou

arXiv:2301.02220·stat.ML·January 6, 2023

TL;DR

This paper introduces a novel value enhancement method for offline reinforcement learning that improves policy performance and convergence rates, especially in data-limited high-stakes domains, using a generalizable approach applicable to neural network policies.

Contribution

The paper proposes a new value enhancement technique for offline RL that takes an initial policy from an existing algorithm and returns a policy whose value is no worse, while accelerating convergence to the optimal policy under mild conditions.

Findings

1. Method improves policy value compared to initial policies.
2. Accelerates convergence to optimal policy under mild conditions.
3. Demonstrates superior performance in extensive numerical studies.

Abstract

Reinforcement learning (RL) is a powerful machine learning technique that enables an intelligent agent to learn an optimal policy that maximizes the cumulative rewards in sequential decision making. Most methods in the existing literature are developed in online settings where the data are easy to collect or simulate. Motivated by high-stakes domains such as mobile health studies, where data are limited and pre-collected, we study offline reinforcement learning methods in this paper. To use these datasets efficiently for policy optimization, we propose a novel value enhancement method to improve the performance of a given initial policy computed by existing state-of-the-art RL algorithms. Specifically, when the initial policy is not consistent, our method will output a policy whose value is no worse and often better than that of the initial policy. When the initial policy…
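
The "no worse" property described in the abstract can be written in terms of policy values. The notation here is assumed for illustration and is not taken from the paper: $J(\pi)$ is the expected discounted return of a policy $\pi$, $\gamma \in [0,1)$ is a discount factor, $R_t$ is the reward at time $t$, $\pi_0$ is the initial policy, and $\hat{\pi}$ is the policy returned by the value enhancement step.

```latex
% Assumed notation (not from the paper): discounted return of a policy,
% and the claim that the enhanced policy is no worse than the initial one.
J(\pi) = \mathbb{E}^{\pi}\Big[\sum_{t \ge 0} \gamma^{t} R_t\Big],
\qquad
J(\hat{\pi}) \ge J(\pi_0).
```

The abstract does not state the precise sense in which this inequality holds (for example, asymptotically or with high probability), so it should be read as a sketch of the claim rather than a formal statement of the result.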

Taxonomy

Topics: Age of Information Optimization