Warm-up Free Policy Optimization: Improved Regret in Linear Markov Decision Processes

Asaf Cassel; Aviv Rosenberg

arXiv:2407.03065·cs.LG·July 4, 2024

Open Access

TL;DR

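This paper introduces a warm-up free policy optimization algorithm for linear MDPs that achieves rate-optimal regret. Compared with prior work, it is simpler to implement and has better dependence on the horizon and the function approximation dimension, both under adversarial losses with full-information feedback and under stochastic losses with bandit feedback.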

Contribution

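The algorithm replaces the costly pure exploration warm-up phase of Sherman et al. [2023a] with a simple and efficient contraction mechanism, achieving rate-optimal regret with improved dependence on the horizon and the function approximation dimension.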

Findings

01. Achieves rate-optimal regret in linear MDPs.

02. Eliminates the pure exploration warm-up phase, which is costly and hard to implement in practice.

03. Improves the regret's dependence on the horizon and the function approximation dimension.

Abstract

Policy Optimization (PO) methods are among the most popular Reinforcement Learning (RL) algorithms in practice. Recently, Sherman et al. [2023a] proposed a PO-based algorithm with rate-optimal regret guarantees under the linear Markov Decision Process (MDP) model. However, their algorithm relies on a costly pure exploration warm-up phase that is hard to implement in practice. This paper eliminates this undesired warm-up phase, replacing it with a simple and efficient contraction mechanism. Our PO algorithm achieves rate-optimal regret with improved dependence on the other parameters of the problem (horizon and function approximation dimension) in two fundamental settings: adversarial losses with full-information feedback and stochastic losses with bandit feedback.
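
For readers new to the setting, the two technical terms in the abstract (linear MDP and regret) can be spelled out using the standard definitions from the linear MDP literature. This is only a sketch: the notation below (\phi, \psi_h, \theta_h^k, K, H, d) is the conventional one and is an assumption here, not taken from this page, and the paper itself may use different symbols or normalizations.

```latex
% Linear MDP (standard definition; notation assumed, not from this page).
% Transitions and losses are linear in a known feature map \phi(s,a) \in \mathbb{R}^d.
P_h(s' \mid s, a) = \phi(s, a)^\top \psi_h(s'),
\qquad
\ell_h^k(s, a) = \phi(s, a)^\top \theta_h^k ,
\qquad h = 1, \dots, H .

% Regret over K episodes of horizon H, measured in losses (lower is better).
% V_1^{k,\pi}(s_1) is the expected cumulative loss of policy \pi in episode k,
% \pi^k is the learner's policy, and \pi^\star is the best fixed policy in hindsight.
\mathrm{Regret}(K)
  = \sum_{k=1}^{K} \left( V_1^{k,\pi^k}(s_1) - V_1^{k,\pi^\star}(s_1) \right) .

% "Rate-optimal" refers to the dependence on the number of episodes K:
\mathrm{Regret}(K) = \widetilde{O}\!\left( \mathrm{poly}(d, H) \, \sqrt{K} \right) .
```

In this reading, the improvement claimed in the abstract concerns the poly(d, H) factor, while the \sqrt{K} rate matches that of the earlier algorithm of Sherman et al. [2023a].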
