The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise

Shuze Daniel Liu; Shuhang Chen; Shangtong Zhang

arXiv:2401.07844·cs.LG·November 6, 2025·1 cites

Open Access

TL;DR

This paper extends the Borkar-Meyn stability theorem to Markovian noise, enhancing the theoretical foundation for analyzing stochastic approximation algorithms in reinforcement learning, especially off-policy methods with function approximation.

Contribution

The paper generalizes the Borkar-Meyn stability analysis from martingale difference noise to Markovian noise, which broadens its applicability to reinforcement learning algorithms.

Findings

01 Extends the Borkar-Meyn stability theorem to the Markovian noise setting

02 Applies to off-policy reinforcement learning algorithms with linear function approximation and eligibility traces

03 Provides conditions for the almost sure boundedness of the stochastic iterates

Abstract

Stochastic approximation is a class of algorithms that update a vector iteratively, incrementally, and stochastically, including, e.g., stochastic gradient descent and temporal difference learning. One fundamental challenge in analyzing a stochastic approximation algorithm is to establish its stability, i.e., to show that the stochastic vector iterates are bounded almost surely. In this paper, we extend the celebrated Borkar-Meyn theorem for stability from the Martingale difference noise setting to the Markovian noise setting, which greatly improves its applicability in reinforcement learning, especially in those off-policy reinforcement learning algorithms with linear function approximation and eligibility traces. Central to our analysis is the diminishing asymptotic rate of change of a few functions, which is implied by both a form of the strong law of large numbers and a form of the…
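
To make the notion of stability concrete, the following is a minimal sketch of a stochastic approximation iteration driven by Markovian noise: tabular TD(0) run along a single trajectory of a two-state Markov chain. It is not the paper's algorithm or proof technique, and it covers only a simple on-policy case rather than the off-policy setting the paper targets. The chain, rewards, discount factor, and step-size schedule are all hypothetical values chosen for illustration. The function tracks the largest magnitude the iterate ever reaches, which is the quantity a stability result shows to be finite almost surely.

```python
import random


def td0_markov_chain(num_steps=20000, gamma=0.5, seed=0):
    """Tabular TD(0) on a two-state Markov chain.

    Returns the final value estimates and the largest absolute value
    any component of the iterate took during the run.
    """
    rng = random.Random(seed)
    # Hypothetical chain: p_to_zero[s] is the probability of moving
    # from state s to state 0 (otherwise the chain moves to state 1).
    p_to_zero = [0.7, 0.4]
    # Hypothetical reward received when leaving state s.
    reward = [1.0, 0.0]

    v = [0.0, 0.0]   # the iterate updated by stochastic approximation
    s = 0            # current state; successive states form the Markovian noise
    max_norm = 0.0

    for n in range(num_steps):
        # Sample the next state from the chain. The noise is Markovian
        # because this sample depends on the current state s.
        s_next = 0 if rng.random() < p_to_zero[s] else 1

        # Diminishing step size (an assumed schedule, not from the paper).
        alpha = 1.0 / (n + 1) ** 0.75

        # Incremental TD(0) update of the visited state's value.
        td_error = reward[s] + gamma * v[s_next] - v[s]
        v[s] += alpha * td_error

        max_norm = max(max_norm, abs(v[0]), abs(v[1]))
        s = s_next

    return v, max_norm


if __name__ == "__main__":
    values, max_norm = td0_markov_chain()
    print("final estimates:", values)
    print("largest iterate magnitude:", max_norm)
```

For these assumed numbers, the fixed point of v = r + γPv is approximately (1.647, 0.471), and every update is a convex combination that keeps the estimates within [0, 2], so boundedness here is elementary. The paper's contribution concerns settings where no such simple argument is available.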

Taxonomy

Topics: Reinforcement Learning in Robotics · Age of Information Optimization · Electric Vehicles and Infrastructure