Warm-Start Actor-Critic: From Approximation Error to Sub-optimality Gap
Hang Wang, Sen Lin, Junshan Zhang

TL;DR
This paper analyzes how approximation errors in the Actor and Critic updates affect the finite-time performance and sub-optimality gap of Warm-Start Actor-Critic reinforcement learning, and provides bounds and conditions under which a warm-start policy accelerates online learning.
Contribution
It offers a theoretical framework quantifying the impact of approximation errors on Warm-Start Actor-Critic algorithms and derives bounds for finite-time performance and sub-optimality gaps.
Findings
Approximation errors in the Actor and Critic updates significantly influence learning performance.
Reducing bias is crucial for effective online learning.
Derived bounds guide the design of better warm-start RL algorithms.
Abstract
Warm-Start reinforcement learning (RL), aided by a prior policy obtained from offline training, is emerging as a promising RL approach for practical applications. Recent empirical studies have demonstrated that the performance of Warm-Start RL can improve quickly in some cases but become stagnant in others, especially when function approximation is used. To this end, the primary objective of this work is to build a fundamental understanding of "whether and when online learning can be significantly accelerated by a warm-start policy from offline RL?" Specifically, we consider the widely used Actor-Critic (A-C) method with a prior policy. We first quantify the approximation errors in the Actor update and the Critic update, respectively. Next, we cast the Warm-Start A-C algorithm as Newton's method with perturbation, and study the impact of the…
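The "Newton's method with perturbation" view can be illustrated with a toy sketch. Exact policy iteration is known to coincide with Newton's method applied to the Bellman optimality equation, so an inexact Critic acts as a perturbation of each Newton step. The example below is not the paper's algorithm or analysis: the two-state MDP, the function names (`warm_start_ac`, `evaluate`, `optimal_values`), and the fixed additive `critic_bias` are all assumptions made purely for illustration.

```python
GAMMA = 0.9
# Hypothetical toy MDP: NEXT[s][a] is the (deterministic) next state,
# REWARD[s][a] the immediate reward. Only "stay in state 1" pays off.
NEXT = [[0, 1], [1, 0]]
REWARD = [[0.0, 0.0], [1.0, 0.0]]
STATES, ACTIONS = range(2), range(2)


def q_value(v, s, a):
    """One-step lookahead value of taking action a in state s."""
    return REWARD[s][a] + GAMMA * v[NEXT[s][a]]


def evaluate(policy, sweeps=500):
    """Exact Critic (up to numerical tolerance): iterate V = R_pi + gamma * P_pi V."""
    v = [0.0, 0.0]
    for _ in range(sweeps):
        v = [q_value(v, s, policy[s]) for s in STATES]
    return v


def optimal_values(sweeps=500):
    """Value iteration, used only to measure the sub-optimality gap."""
    v = [0.0, 0.0]
    for _ in range(sweeps):
        v = [max(q_value(v, s, a) for a in ACTIONS) for s in STATES]
    return v


def warm_start_ac(prior_policy, critic_bias, steps=5):
    """Policy iteration from a prior policy with a biased Critic.

    critic_bias[s] is added to the true value of state s before the
    greedy Actor step; it stands in for the Critic approximation error
    (an assumption of this sketch, not the paper's error model).
    """
    policy = list(prior_policy)
    for _ in range(steps):
        v_hat = [v + b for v, b in zip(evaluate(policy), critic_bias)]
        policy = [max(ACTIONS, key=lambda a: q_value(v_hat, s, a)) for s in STATES]
    v_star, v_pi = optimal_values(), evaluate(policy)
    gap = max(vs - vp for vs, vp in zip(v_star, v_pi))
    return policy, gap


if __name__ == "__main__":
    # Poor prior, unbiased Critic: the iteration recovers the optimal policy.
    print(warm_start_ac([0, 1], [0.0, 0.0]))
    # Optimal prior, biased Critic: the bias pushes the policy away from optimal.
    print(warm_start_ac([1, 0], [20.0, 0.0]))
```

In this toy setting, an unbiased Critic reaches the optimal policy `[1, 0]` even from a poor prior, while a Critic that overestimates state 0 by 20 drives an optimal prior to `[0, 1]` and leaves a sub-optimality gap of about 10. This is the kind of stagnation that a good warm-start policy alone does not prevent.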
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms
