Q-learning as a monotone scheme

Lingyi Yang

arXiv:2405.20538·cs.LG·June 3, 2024

TL;DR

This paper investigates stability and convergence issues in reinforcement learning by analyzing a linear quadratic example, interpreting Q-learning convergence through monotone schemes, and discussing the impact of function approximation.

Contribution

It introduces a monotone scheme perspective to understand Q-learning convergence and explores how function approximation affects stability.

Findings

01. The convergence criterion of exact Q-learning can be interpreted in the sense of a monotone scheme.

02. Function approximation affects the monotonicity properties, and hence the stability, of Q-learning.

03. The linear quadratic example offers insight into stability issues in deep reinforcement learning.

Abstract

Stability issues with reinforcement learning methods persist. To better understand some of these stability and convergence issues involving deep reinforcement learning methods, we examine a simple linear quadratic example. We interpret the convergence criterion of exact Q-learning in the sense of a monotone scheme and discuss consequences of function approximation on monotonicity properties.
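As an illustration of what monotonicity means here, the following is a minimal sketch, not the paper's own example. It assumes a toy, discretised, linear-quadratic-style problem (integer states in [-5, 5], actions in {-1, 0, 1}, cost x^2 + a^2, discount 0.9) and hypothetical helper names `build_toy_problem`, `bellman_update`, and `dominates`. It checks that an exact tabular Q-update preserves pointwise ordering: if one Q-table lies above another everywhere, the same holds after one update.

```python
import random

GAMMA = 0.9  # discount factor (an assumption for this sketch)


def build_toy_problem(half_width=5, actions=(-1, 0, 1)):
    """Toy deterministic problem: x' = clip(x + a), cost = x^2 + a^2."""
    costs, next_state = [], []
    for x in range(-half_width, half_width + 1):
        cost_row, next_row = [], []
        for a in actions:
            x_next = max(-half_width, min(half_width, x + a))
            cost_row.append(float(x * x + a * a))
            next_row.append(x_next + half_width)  # table index of x_next
        costs.append(cost_row)
        next_state.append(next_row)
    return costs, next_state


def bellman_update(q, costs, next_state, gamma=GAMMA):
    """One exact sweep: (TQ)(s, a) = c(s, a) + gamma * min_a' Q(s', a')."""
    return [
        [costs[s][a] + gamma * min(q[next_state[s][a]])
         for a in range(len(q[s]))]
        for s in range(len(q))
    ]


def dominates(q_hi, q_lo, tol=1e-12):
    """True if q_hi >= q_lo at every (state, action) entry."""
    return all(
        hi >= lo - tol
        for row_hi, row_lo in zip(q_hi, q_lo)
        for hi, lo in zip(row_hi, row_lo)
    )


if __name__ == "__main__":
    costs, next_state = build_toy_problem()
    rng = random.Random(0)
    q_lo = [[rng.uniform(-10, 10) for _ in row] for row in costs]
    # q_hi is q_lo plus a non-negative perturbation, so q_hi >= q_lo.
    q_hi = [[v + rng.uniform(0, 5) for v in row] for row in q_lo]
    t_lo = bellman_update(q_lo, costs, next_state)
    t_hi = bellman_update(q_hi, costs, next_state)
    print("ordering preserved:", dominates(t_hi, t_lo))
```

The ordering survives because the exact update only combines Q-values through a minimum and a multiplication by a positive discount. Replacing the table with a function approximator that is fitted to the update targets introduces a projection step that need not preserve this ordering, which is the kind of effect the abstract refers to.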

Taxonomy

Topics: Face and Expression Recognition

Methods: Q-Learning