Efficient Reinforcement Learning Using Recursive Least-Squares Methods

H. He; D. Hu; X. Xu

arXiv:1106.0707·cs.LG·June 6, 2011

TL;DR

This paper introduces two new reinforcement learning algorithms, RLS-TD(lambda) and Fast-AHC, which leverage recursive least-squares methods to improve convergence speed and data efficiency in online learning and control tasks.

Contribution

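The paper proposes two RLS-based reinforcement learning algorithms that use linear value function approximators, proves that RLS-TD(lambda) converges with probability one for ergodic Markov chains, and reports improved efficiency over existing methods in prediction and control tasks.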

Findings

01

RLS-TD(lambda) converges with probability one for ergodic Markov chains.

02

Fast-AHC improves data efficiency in learning control tasks.

03

Experimental results show enhanced performance of RLS-based methods over traditional algorithms.

Abstract

The recursive least-squares (RLS) algorithm is one of the most well-known algorithms used in adaptive filtering, system identification and adaptive control. Its popularity is mainly due to its fast convergence speed, which is considered to be optimal in practice. In this paper, RLS methods are used to solve reinforcement learning problems, where two new reinforcement learning algorithms using linear value function approximators are proposed and analyzed. The two algorithms are called RLS-TD(lambda) and Fast-AHC (Fast Adaptive Heuristic Critic), respectively. RLS-TD(lambda) can be viewed as the extension of RLS-TD(0) from lambda=0 to general lambda within interval [0,1], so it is a multi-step temporal-difference (TD) learning algorithm using RLS methods. The convergence with probability one and the limit of convergence of RLS-TD(lambda) are proved for ergodic Markov chains. Compared to…
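
To make the idea of a multi-step TD algorithm with RLS updates more concrete, the following is a minimal Python sketch of a recursive least-squares TD(lambda) update with a linear value function. It follows the textbook recursive form of least-squares TD(lambda) obtained from the Sherman-Morrison identity; the paper's exact update equations, initial values, and any forgetting factor are not shown in this excerpt, so the function name `rls_td_lambda`, the parameter `delta` (scale of the initial matrix), and the two-state test chain are all assumptions made for illustration.

```python
import numpy as np

def rls_td_lambda(transitions, n_features, gamma=0.9, lam=0.5, delta=1e3):
    """Hypothetical helper: recursive least-squares TD(lambda), linear values.

    transitions: iterable of (phi, reward, phi_next), features as 1-D arrays.
    delta: assumed scale of the initial matrix P_0 = delta * I.
    """
    w = np.zeros(n_features)        # weights of the linear value function
    P = delta * np.eye(n_features)  # recursive estimate of the inverse matrix
    z = np.zeros(n_features)        # eligibility trace
    for phi, r, phi_next in transitions:
        z = gamma * lam * z + phi          # decay the trace, add current features
        d = phi - gamma * phi_next         # temporal-difference feature vector
        Pz = P @ z
        k = Pz / (1.0 + d @ Pz)            # gain vector
        w = w + k * (r - d @ w)            # correct weights by the gain-scaled error
        P = P - np.outer(k, d @ P)         # Sherman-Morrison rank-one update
    return w

# Assumed toy problem: a deterministic two-state chain with one-hot features.
# State 0 -> state 1 with reward 0; state 1 -> state 0 with reward 1.
features = np.eye(2)
transitions = []
state = 0
for _ in range(2000):
    next_state = 1 - state
    reward = 1.0 if state == 1 else 0.0
    transitions.append((features[state], reward, features[next_state]))
    state = next_state

w = rls_td_lambda(transitions, n_features=2, gamma=0.9, lam=0.5)

# Exact values from the Bellman equations V0 = 0.9 * V1 and V1 = 1 + 0.9 * V0.
true_v = np.array([0.9 / 0.19, 1.0 / 0.19])
```

With one-hot features the learned weights `w` can be compared directly against `true_v`. The matrix `P` is updated with a rank-one correction at each step, so no explicit matrix inversion is needed, which is what makes the recursive form suitable for online use.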
