A Finite-Sample Analysis of an Actor-Critic Algorithm for Mean-Variance Optimization in a Discounted MDP

Tejaram Sangadi; L. A. Prashanth; Krishna Jagannathan

arXiv:2406.07892 · cs.LG · March 13, 2025

TL;DR

This paper derives finite-sample guarantees for a risk-sensitive actor-critic algorithm that performs mean-variance optimization in a discounted MDP: an $O(1/t)$ bound for the TD critic with linear function approximation, and an $O(n^{-1/4})$ bound for the actor-critic scheme with an SPSA-based actor.

Contribution

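The paper derives finite-sample bounds for a TD learning algorithm with linear function approximation used for policy evaluation. The bounds hold in the mean-squared sense and with high probability under tail iterate averaging, both with and without regularization. It then pairs this critic with an SPSA-based actor update and establishes a convergence guarantee for the resulting actor-critic algorithm.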
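To make the SPSA idea concrete, here is a minimal sketch of a two-sided simultaneous-perturbation gradient estimate driving a plain ascent loop. It is an illustration under assumptions, not the paper's actor update: the two-sided form, the Rademacher perturbations, the constant step size, and the names `spsa_gradient`, `spsa_ascent`, and `toy_objective` are all hypothetical choices, and the toy objective merely stands in for a mean-variance objective that would in practice be estimated by the critic.

```python
import random


def spsa_gradient(f, theta, delta, rng):
    """Two-sided SPSA estimate of the gradient of f at theta (illustrative)."""
    # Rademacher perturbation: each coordinate is +1 or -1 with equal probability.
    perturb = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + delta * d for t, d in zip(theta, perturb)]
    minus = [t - delta * d for t, d in zip(theta, perturb)]
    diff = f(plus) - f(minus)
    # Only two evaluations of f are needed, whatever the dimension of theta.
    return [diff / (2.0 * delta * d) for d in perturb]


def spsa_ascent(f, theta0, steps=2000, step_size=0.05, delta=0.05, seed=0):
    """Gradient ascent on f using SPSA estimates in place of true gradients."""
    rng = random.Random(seed)
    theta = list(theta0)
    for _ in range(steps):
        grad = spsa_gradient(f, theta, delta, rng)
        theta = [t + step_size * g for t, g in zip(theta, grad)]
    return theta


def toy_objective(theta):
    # Hypothetical concave stand-in for the objective; maximized at (1, -2).
    return -((theta[0] - 1.0) ** 2) - (theta[1] + 2.0) ** 2


theta_hat = spsa_ascent(toy_objective, [0.0, 0.0])
print(theta_hat)
```

The appeal of the simultaneous-perturbation form is that each estimate costs two function evaluations regardless of the parameter dimension, which matters when every evaluation requires running a critic.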

Findings

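01 Finite-sample bounds for TD with linear function approximation whose dependence on the initial error decays exponentially.

02 A convergence rate of $O(1/t)$ after $t$ iterations of the TD algorithm.

03 An $O(n^{-1/4})$ convergence guarantee for the actor-critic method with an SPSA-based actor.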

Abstract

Motivated by applications in risk-sensitive reinforcement learning, we study mean-variance optimization in a discounted reward Markov Decision Process (MDP). Specifically, we analyze a Temporal Difference (TD) learning algorithm with linear function approximation (LFA) for policy evaluation. We derive finite-sample bounds that hold (i) in the mean-squared sense and (ii) with high probability under tail iterate averaging, both with and without regularization. Our bounds exhibit an exponentially decaying dependence on the initial error and a convergence rate of $O(1/t)$ after $t$ iterations. Moreover, for the regularized TD variant, our bound holds for a universal step size. Next, we integrate a Simultaneous Perturbation Stochastic Approximation (SPSA)-based actor update with an LFA critic and establish an $O(n^{-1/4})$ convergence guarantee, where $n$ denotes the iterations of the…
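The policy-evaluation step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' algorithm: it uses one-hot features (so the linear approximation is exact), a hypothetical two-state Markov reward process, a constant step size, and the standard Bellman-type recursion for the second moment of the return, $U(s) = r(s)^2 + 2\gamma r(s)\,\mathbb{E}[V(s')] + \gamma^2\,\mathbb{E}[U(s')]$, from which the variance is $U(s) - V(s)^2$. The function name and the `reg` parameter are illustrative.

```python
import random


def td_mean_and_second_moment(P, r, gamma, steps=20000, step_size=0.05,
                              reg=0.0, seed=0):
    """TD(0) for the value V and second moment U, with tail iterate averaging."""
    rng = random.Random(seed)
    n = len(r)
    v = [0.0] * n          # weights for V; one-hot features, so v[s] = V(s)
    u = [0.0] * n          # weights for U, the second moment of the return
    v_sum = [0.0] * n
    u_sum = [0.0] * n
    tail_start = steps // 2  # assumption: average the last half of the iterates
    s = 0
    for t in range(steps):
        s_next = rng.choices(range(n), weights=P[s])[0]
        # TD errors for the mean and for the second moment of the return.
        delta_v = r[s] + gamma * v[s_next] - v[s]
        delta_u = (r[s] ** 2 + 2.0 * gamma * r[s] * v[s_next]
                   + gamma ** 2 * u[s_next] - u[s])
        # Optional regularization shrinks every weight toward zero.
        if reg > 0.0:
            v = [(1.0 - step_size * reg) * x for x in v]
            u = [(1.0 - step_size * reg) * x for x in u]
        # With one-hot features only the visited state's weight moves.
        v[s] += step_size * delta_v
        u[s] += step_size * delta_u
        if t >= tail_start:
            v_sum = [a + b for a, b in zip(v_sum, v)]
            u_sum = [a + b for a, b in zip(u_sum, u)]
        s = s_next
    k = steps - tail_start
    return [a / k for a in v_sum], [a / k for a in u_sum]


# Hypothetical two-state example: reward 1 in state 0, reward 0 in state 1.
P = [[0.9, 0.1], [0.2, 0.8]]
r = [1.0, 0.0]
v_bar, u_bar = td_mean_and_second_moment(P, r, gamma=0.5)
variance = [ub - vb ** 2 for vb, ub in zip(v_bar, u_bar)]
print(v_bar, variance)
```

For this example the value function can be solved by hand from $V = r + \gamma P V$, giving $V(0) = 24/13$ and $V(1) = 4/13$, which offers a quick sanity check on the averaged iterates when `reg` is zero. Setting `reg` above zero biases the solution toward zero in exchange for a better-conditioned update.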

