Finite Horizon Q-learning: Stability, Convergence, Simulations and an application on Smart Grids
Vivek VP, Dr. Shalabh Bhatnagar

TL;DR
This paper develops a finite horizon Q-learning algorithm, proves its stability and convergence using the ordinary differential equation (O.D.E.) method, and evaluates it on random MDPs and a smart grid application.
Contribution
It introduces a finite horizon Q-learning algorithm with a full stability and convergence analysis, addressing a setting that prior work on Q-learning, focused mainly on the infinite horizon case, has largely left unanalysed.
Findings
Proves stability and convergence of finite horizon Q-learning.
Shows effective performance on random MDPs.
Demonstrates application to smart grid management.
Abstract
Q-learning is a popular reinforcement learning algorithm. However, it has been studied and analysed mainly in the infinite horizon setting, even though several important applications can be modeled as finite horizon Markov decision processes (MDPs). We develop a version of the Q-learning algorithm for finite horizon MDPs and provide a full proof of its stability and convergence. Our analysis of the stability and convergence of finite horizon Q-learning is based entirely on the ordinary differential equations (O.D.E.) method. We also demonstrate the performance of our algorithm on randomly generated MDPs as well as on a smart grid application.
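To make the finite horizon setting concrete, here is a minimal tabular sketch in Python. It is not the authors' algorithm: it assumes a reward-maximising formulation, zero terminal reward, uniform random exploration, and a 1/n step size per (stage, state, action) triple, none of which are specified in the abstract. The function names (`random_mdp`, `backward_induction`, `finite_horizon_q_learning`) are hypothetical. The key difference from infinite horizon Q-learning is that the Q-table carries an extra stage index h, and the update at stage h bootstraps from stage h + 1 rather than from the same table.

```python
import numpy as np


def random_mdp(n_states, n_actions, rng):
    """Draw a random transition kernel P[s, a, s'] and reward table R[s, a]."""
    P = rng.random((n_states, n_actions, n_states))
    P /= P.sum(axis=2, keepdims=True)  # normalise each row into a distribution
    R = rng.random((n_states, n_actions))
    return P, R


def backward_induction(P, R, horizon):
    """Exact stage-wise Q-values via dynamic programming (needs the model)."""
    n_states, n_actions, _ = P.shape
    Q = np.zeros((horizon + 1, n_states, n_actions))  # Q[horizon] = 0 (assumed)
    for h in range(horizon - 1, -1, -1):
        V_next = Q[h + 1].max(axis=1)   # optimal value at the next stage
        Q[h] = R + P @ V_next           # expected one-step lookahead
    return Q[:horizon]


def finite_horizon_q_learning(P, R, horizon, episodes, rng):
    """Model-free estimate of the stage-wise Q-values from sampled episodes."""
    n_states, n_actions, _ = P.shape
    Q = np.zeros((horizon + 1, n_states, n_actions))  # Q[horizon] stays 0
    visits = np.zeros((horizon, n_states, n_actions))
    for _ in range(episodes):
        s = rng.integers(n_states)              # random initial state
        for h in range(horizon):
            a = rng.integers(n_actions)         # uniform exploration (assumed)
            s_next = rng.choice(n_states, p=P[s, a])
            visits[h, s, a] += 1
            alpha = 1.0 / visits[h, s, a]       # decaying step size (assumed)
            # Bootstrap from the NEXT stage's table, not the current one.
            target = R[s, a] + Q[h + 1, s_next].max()
            Q[h, s, a] += alpha * (target - Q[h, s, a])
            s = s_next
    return Q[:horizon]
```

On a small random MDP, the learned table can be compared against `backward_induction`, which uses the true model; the two should agree more closely as the number of episodes grows.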
Taxonomy
Topics: Smart Grid Energy Management · Smart Grid Security and Resilience · Distributed Sensor Networks and Detection Algorithms
Methods: Q-Learning
