Finite Horizon Q-learning: Stability, Convergence, Simulations and an application on Smart Grids

Vivek VP; Dr. Shalabh Bhatnagar

arXiv:2110.15093·cs.LG·August 9, 2022·5 cites

Open Access

TL;DR

This paper develops a finite horizon Q-learning algorithm, proves its stability and convergence using the ordinary differential equation (O.D.E.) method, and demonstrates its performance on random MDPs and on a smart grid application.

Contribution

It introduces a Q-learning algorithm for finite horizon Markov decision processes together with a full stability and convergence analysis. Q-learning has so far been studied and analysed mainly in the infinite horizon setting.

Findings

01. Proves stability and convergence of finite horizon Q-learning.

02. Shows effective performance on random MDPs.

03. Demonstrates application to smart grid management.

Abstract

Q-learning is a popular reinforcement learning algorithm. However, it has been studied and analysed mainly in the infinite horizon setting, even though several important applications can be modelled as finite horizon Markov decision processes (MDPs). We develop a version of the Q-learning algorithm for finite horizon MDPs and provide a full proof of its stability and convergence. Our analysis of the stability and convergence of finite horizon Q-learning is based entirely on the ordinary differential equation (O.D.E.) method. We also demonstrate the performance of our algorithm in a random MDP setting as well as on a smart grid application.
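
To make the finite horizon setting concrete, the following is a minimal Python sketch of a stage-indexed tabular Q-learning update on a randomly generated MDP. It is not taken from the paper: the function names (random_mdp, backward_induction, finite_horizon_q_learning), the stationary transition and reward model, the zero terminal reward, the synchronous update of every (stage, state, action) triple, and the step size 1/n are all assumptions made for illustration. The key difference from the infinite horizon case is that a separate Q-table is kept for each stage h, and the target for stage h bootstraps from stage h + 1 without discounting.

```python
import numpy as np

def random_mdp(n_states, n_actions, rng):
    """Hypothetical random MDP: P[s, a] is a distribution over next states."""
    P = rng.random((n_states, n_actions, n_states))
    P /= P.sum(axis=2, keepdims=True)
    R = rng.random((n_states, n_actions))  # deterministic rewards in [0, 1)
    return P, R

def backward_induction(P, R, horizon):
    """Exact stage-wise optimal Q-values via dynamic programming (for reference)."""
    n_states, n_actions = R.shape
    Q = np.zeros((horizon + 1, n_states, n_actions))  # Q[horizon] = 0: no terminal reward
    for h in range(horizon - 1, -1, -1):
        V_next = Q[h + 1].max(axis=1)   # optimal value at stage h + 1
        Q[h] = R + P @ V_next           # expected one-step lookahead
    return Q[:horizon]

def finite_horizon_q_learning(P, R, horizon, n_iters, rng):
    """Sample-based, stage-indexed Q-learning with step size 1/n (assumed)."""
    n_states, n_actions = R.shape
    cdf = P.cumsum(axis=2)
    stages = np.arange(horizon)[:, None, None]
    Q = np.zeros((horizon + 1, n_states, n_actions))
    for n in range(1, n_iters + 1):
        # Sample one next state for every (stage, state, action) by inverse CDF.
        u = rng.random((horizon, n_states, n_actions))
        s_next = (u[..., None] > cdf[None]).sum(axis=3)
        s_next = np.minimum(s_next, n_states - 1)  # guard against round-off
        # Target for stage h bootstraps from the current stage h + 1 estimate.
        V = Q[1:].max(axis=2)                      # shape (horizon, n_states)
        target = R[None] + V[stages, s_next]
        Q[:horizon] += (1.0 / n) * (target - Q[:horizon])
    return Q[:horizon]
```

One way to use the sketch is to generate a small MDP with rng = np.random.default_rng(0), run finite_horizon_q_learning for a few thousand iterations, and compare the result with backward_induction, which needs the model and so serves only as a reference. The step size 1/n satisfies the usual stochastic approximation conditions (the step sizes sum to infinity while their squares have a finite sum), which is the regime in which O.D.E.-based convergence arguments are typically stated.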

Taxonomy

Topics: Smart Grid Energy Management · Smart Grid Security and Resilience · Distributed Sensor Networks and Detection Algorithms

Methods: Q-Learning