Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap

Hang Wang; Sen Lin; Junshan Zhang

arXiv:2306.11271 · cs.LG · June 21, 2023 · 1 citation

TL;DR

This paper analyzes how approximation errors affect the performance and sub-optimality of Warm-Start Actor-Critic reinforcement learning, providing bounds and conditions for effective online learning acceleration.

Contribution

It offers a theoretical framework quantifying the impact of approximation errors on Warm-Start Actor-Critic algorithms and derives bounds for finite-time performance and sub-optimality gaps.

Findings

01. Approximation errors significantly influence learning performance.

02. Reducing bias is crucial for effective online learning.

03. Derived bounds guide the design of better warm-start RL algorithms.

Abstract

Warm-Start reinforcement learning (RL), aided by a prior policy obtained from offline training, is emerging as a promising RL approach for practical applications. Recent empirical studies have demonstrated that the performance of Warm-Start RL can improve quickly in some cases but become stagnant in others, especially when function approximation is used. To this end, the primary objective of this work is to build a fundamental understanding of "whether and when online learning can be significantly accelerated by a warm-start policy from offline RL?" Specifically, we consider the widely used Actor-Critic (A-C) method with a prior policy. We first quantify the approximation errors in the Actor update and the Critic update, respectively. Next, we cast the Warm-Start A-C algorithm as Newton's method with perturbation, and study the impact of the…
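The idea of a warm-started Actor-Critic loop with an inexact Critic can be illustrated on a toy problem. The sketch below is not the paper's algorithm or analysis: it assumes a hypothetical two-state, two-action tabular MDP, uses greedy policy improvement as a stand-in for the Actor update (exact policy iteration is classically interpreted as a Newton-type iteration on the Bellman equation), and models the Critic's approximation error as a fixed additive bias on selected Q-values. All names (`P`, `R`, `warm_start`, `critic_bias`, `gap`) are made up for illustration.

```python
GAMMA = 0.9

# Hypothetical deterministic MDP: P[s][a] is a list of (next_state, probability),
# R[s][a] is the immediate reward. Only (state 1, action 1) pays a reward.
P = {0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
     1: {0: [(0, 1.0)], 1: [(1, 1.0)]}}
R = {0: {0: 0.0, 1: 0.0},
     1: {0: 0.0, 1: 1.0}}


def backup(V, s, a):
    """One-step Bellman backup for taking action a in state s."""
    return R[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a])


def evaluate(policy, sweeps=500):
    """Critic: iterative policy evaluation (accurate up to GAMMA**sweeps)."""
    V = {s: 0.0 for s in P}
    for _ in range(sweeps):
        V = {s: backup(V, s, policy[s]) for s in P}
    return V


def optimal_values(sweeps=500):
    """Value iteration, used only to measure the sub-optimality gap."""
    V = {s: 0.0 for s in P}
    for _ in range(sweeps):
        V = {s: max(backup(V, s, a) for a in P[s]) for s in P}
    return V


def warm_start(prior_policy, steps, critic_bias):
    """Start from a prior policy; each step evaluates it, perturbs the
    Q-values by an assumed fixed bias, then improves the policy greedily."""
    policy = dict(prior_policy)
    for _ in range(steps):
        V = evaluate(policy)
        Q = {s: {a: backup(V, s, a) + critic_bias.get((s, a), 0.0)
                 for a in P[s]} for s in P}
        policy = {s: max(Q[s], key=Q[s].get) for s in P}
    return policy


def gap(policy):
    """Largest per-state difference between optimal and achieved value."""
    V_star, V = optimal_values(), evaluate(policy)
    return max(V_star[s] - V[s] for s in P)


prior = {0: 0, 1: 0}  # a poor prior policy, standing in for offline training
unbiased = warm_start(prior, steps=3, critic_bias={})
biased = warm_start(prior, steps=3, critic_bias={(1, 0): 2.0})
print(gap(unbiased), gap(biased))
```

In this toy setting the unbiased run reaches the optimal policy within three improvement steps, so its gap is essentially zero, while the biased Critic keeps preferring the unrewarded action in state 1 and the policy never moves away from the prior. The example only shows qualitatively how a persistent Critic bias can turn quick improvement into stagnation; it says nothing about the specific bounds derived in the paper.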

Videos

Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms