Fast Two-Time-Scale Stochastic Gradient Method with Applications in Reinforcement Learning

Sihan Zeng; Thinh T. Doan

arXiv:2405.09660·math.OC·January 21, 2026·2 cites

Open Access

TL;DR

This paper introduces a novel two-time-scale stochastic gradient method with averaging steps that accelerates convergence in reinforcement learning applications, outperforming existing algorithms both theoretically and empirically.

Contribution

The paper proposes a new two-time-scale stochastic gradient algorithm with averaging, achieving faster convergence rates and improved performance in reinforcement learning tasks.

Findings

1. Faster convergence rates under various objective conditions.
2. Outperforms existing two-time-scale algorithms in RL simulations.
3. Specializes to new online RL methods with superior results.

Abstract

Two-time-scale optimization is a framework introduced in Zeng et al. (2024) that abstracts a range of policy evaluation and policy optimization problems in reinforcement learning (RL). Akin to bi-level optimization under a particular type of stochastic oracle, the two-time-scale optimization framework has an upper-level objective whose gradient evaluation depends on the solution of a lower-level problem, which is to find the root of a strongly monotone operator. In this work, we propose a new method for solving two-time-scale optimization that achieves significantly faster convergence than prior art. The key idea of our approach is to use an averaging step to improve the estimates of the operators at both the lower and upper levels before using them to update the decision variables. These additional averaging steps eliminate the direct coupling between the main variables, enabling…
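The averaging idea described in the abstract can be illustrated with a minimal sketch on a toy scalar problem. Everything specific here is an assumption made for illustration and is not taken from the paper: the operators `F(x, y) = x + y - b` (upper level) and `G(x, y) = y - a*x` (lower level, strongly monotone in `y`), the Gaussian noise model, the constant step sizes `alpha`, `beta`, `lam`, and the function name `averaged_two_time_scale`. The sketch keeps running averages `f_hat` and `g_hat` of the noisy operator samples and moves the decision variables only along those averages, with the upper-level variable `x` on the slower time scale.

```python
import random


def averaged_two_time_scale(a=0.5, b=3.0, steps=5000, noise=0.1, seed=0):
    """Toy two-time-scale iteration with averaged operator estimates.

    Hypothetical problem (not from the paper):
      lower level: find y with G(x, y) = y - a*x = 0, so y*(x) = a*x
      upper level: F(x, y) = x + y - b, so F(x, y*(x)) = (1 + a)*x - b
    The solution is x* = b / (1 + a), y* = a * x*.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0            # decision variables (upper, lower)
    f_hat, g_hat = 0.0, 0.0    # averaged estimates of F and G

    # Assumed constant step sizes: alpha < beta < lam, so x moves slowest
    # and the averages track the operators fastest.
    alpha, beta, lam = 0.005, 0.02, 0.1

    for _ in range(steps):
        # Noisy samples of both operators at the current iterates.
        f_sample = (x + y - b) + rng.gauss(0.0, noise)
        g_sample = (y - a * x) + rng.gauss(0.0, noise)

        # Update the decision variables using the averaged estimates only,
        # never the raw noisy samples.
        x_next = x - alpha * f_hat
        y_next = y - beta * g_hat

        # Averaging step: blend each new sample into its running estimate.
        f_hat = (1.0 - lam) * f_hat + lam * f_sample
        g_hat = (1.0 - lam) * g_hat + lam * g_sample

        x, y = x_next, y_next

    return x, y


if __name__ == "__main__":
    x_final, y_final = averaged_two_time_scale()
    print(f"x = {x_final:.3f} (target 2.0), y = {y_final:.3f} (target 1.0)")
```

With the default values the targets are `x* = b / (1 + a) = 2.0` and `y* = a * x* = 1.0`. Constant step sizes are used only to keep the sketch short; they leave a small noise floor around the solution, whereas a convergence analysis would typically rely on decaying step sizes.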

Taxonomy

Topics: Neural Networks and Applications · Stochastic Gradient Optimization Techniques · Machine Learning and ELM