Efficient, Low-Regret, Online Reinforcement Learning for Linear MDPs

Philips George John; Arnab Bhattacharyya; Silviu Maniu; Dimitrios Myrisiotis; Zhenan Wu

arXiv:2411.10906·cs.LG·November 19, 2024

Open Access

TL;DR

This paper introduces modified online reinforcement learning algorithms for linear MDPs that reduce space and time complexity while maintaining low regret, validated through experiments on synthetic and real data.

Contribution

It proposes two variants of LSVI-UCB that alternate between periods of learning and not learning, reducing space and time usage while keeping regret sublinear.

Findings

01. Achieve low space usage and running time in experiments

02. Maintain sublinear regret despite the modifications

03. Perform well on both synthetic data and real-world benchmarks

Abstract

Reinforcement learning algorithms are usually stated without theoretical guarantees regarding their performance. Recently, Jin, Yang, Wang, and Jordan (COLT 2020) showed a polynomial-time reinforcement learning algorithm (namely, LSVI-UCB) for the setting of linear Markov decision processes, and provided theoretical guarantees regarding its running time and regret. In real-world scenarios, however, the space usage of this algorithm can be prohibitive because of the linear regression step it relies on. We propose and analyze two modifications of LSVI-UCB, which alternate periods of learning and not-learning, to reduce space and time usage while maintaining sublinear regret. We show experimentally, on synthetic data and real-world benchmarks, that our algorithms achieve low space usage and running time, while not significantly sacrificing regret.
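To make the idea of alternating learning and not-learning periods concrete, here is a minimal sketch in Python. It is not the paper's algorithm: the schedule function `is_learning_episode`, its parameters `learn_len` and `rest_len`, the toy one-hot features, and the helper names are all assumptions made for illustration. It shows only the regularized Gram matrix and the exploration bonus of the kind used in LSVI-UCB-style regression, and how skipping updates during rest periods limits how many samples are stored.

```python
import math


def is_learning_episode(k, learn_len, rest_len):
    """Hypothetical schedule: learn for learn_len episodes, then rest for rest_len."""
    return k % (learn_len + rest_len) < learn_len


def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sherman_morrison(inv, phi):
    """Return (Lambda + phi phi^T)^{-1} given inv = Lambda^{-1}, Lambda symmetric."""
    u = mat_vec(inv, phi)
    denom = 1.0 + sum(p * x for p, x in zip(phi, u))
    d = len(phi)
    return [[inv[i][j] - u[i] * u[j] / denom for j in range(d)] for i in range(d)]


def ucb_bonus(inv, phi, beta):
    """Exploration bonus beta * sqrt(phi^T Lambda^{-1} phi)."""
    return beta * math.sqrt(sum(p * x for p, x in zip(phi, mat_vec(inv, phi))))


def run(num_episodes, learn_len, rest_len, d=2, lam=1.0):
    """Process episodes, storing data and updating Lambda^{-1} only while learning."""
    # Start from Lambda = lam * I, so Lambda^{-1} = (1 / lam) * I.
    inv = [[1.0 / lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    stored = []  # samples kept for regression; this is what drives space usage
    for k in range(num_episodes):
        # Toy one-hot feature vector (an assumption, not real environment data).
        phi = [1.0 if i == k % d else 0.0 for i in range(d)]
        if is_learning_episode(k, learn_len, rest_len):
            stored.append(phi)
            inv = sherman_morrison(inv, phi)
        # During rest episodes the current estimate is reused unchanged.
    return inv, stored


if __name__ == "__main__":
    inv_alt, stored_alt = run(12, learn_len=2, rest_len=4)
    inv_all, stored_all = run(12, learn_len=12, rest_len=0)
    print(len(stored_alt), len(stored_all))
    print(ucb_bonus(inv_alt, [1.0, 0.0], beta=1.0))
    print(ucb_bonus(inv_all, [1.0, 0.0], beta=1.0))
```

In this toy run the alternating schedule stores 4 samples where the always-learning schedule stores 12, at the cost of a larger exploration bonus (less certainty) for the same feature vector. How the paper's two variants actually choose their learning periods is not specified in the abstract, so the fixed-length schedule above should be read as a placeholder.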

Taxonomy

Topics: Reinforcement Learning in Robotics · Smart Grid Energy Management · Adaptive Dynamic Programming Control

Methods: Linear Regression