Kalman Temporal Differences

Matthieu Geist; Olivier Pietquin

arXiv:1406.3270 · cs.LG · June 13, 2014


TL;DR

The paper introduces Kalman Temporal Differences (KTD), a reinforcement learning framework for online value-function approximation that is sample-efficient, supports non-linear approximation, handles non-stationarity, and manages uncertainty. The resulting algorithms are evaluated on classical benchmark problems.

Contribution

It presents the KTD framework and its extension, eXtended KTD (XKTD), for stochastic MDPs. Convergence is analyzed for special cases, and benchmark results compare favorably to the state of the art.

Findings

1. KTD is sample-efficient and supports non-linear value-function approximation.

2. XKTD handles stochastic MDPs, removing the bias that the basic KTD algorithm exhibits under stochastic transitions, and performs favorably on benchmarks.

3. The algorithms compare favorably to state-of-the-art methods while managing uncertainty and non-stationarity.

Abstract

Because reinforcement learning suffers from a lack of scalability, online value (and Q-) function approximation has received increasing interest over the last decade. This contribution introduces a novel approximation scheme, the Kalman Temporal Differences (KTD) framework, which exhibits the following features: sample efficiency, non-linear approximation, non-stationarity handling, and uncertainty management. A first KTD-based algorithm is provided for deterministic Markov Decision Processes (MDPs); it produces biased estimates when transitions are stochastic. Then the eXtended KTD framework (XKTD), which handles stochastic MDPs, is described. Convergence is analyzed for special cases, for both deterministic and stochastic transitions. The resulting algorithms are evaluated on classical benchmarks, where they compare favorably to the state of the art while exhibiting the announced features.
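To make the framing concrete, here is a minimal sketch of the KTD idea in its simplest setting, assuming a deterministic MDP and a linear value function V(s) = theta . phi(s). The value parameters theta are treated as the hidden state of a random-walk model, and each observed reward is modelled as r_t = V(s_t) - gamma * V(s_{t+1}) plus noise. With a linear parameterization this observation is linear in theta, so a standard Kalman filter applies exactly. The names `ktd_linear_update` and `run_two_state_cycle`, the parameters `process_var` and `obs_var`, the prior covariance, and the two-state example MDP are hypothetical illustrations, not the paper's notation or experimental settings. The non-linear case and the stochastic (XKTD) case are not covered.

```python
import numpy as np


def ktd_linear_update(theta, P, phi_s, phi_next, reward, gamma=0.5,
                      process_var=1e-3, obs_var=1.0):
    """One Kalman-filter step for a linear value function V(s) = theta @ phi(s).

    Assumed state model:   theta_t = theta_{t-1} + v_t,  v_t ~ N(0, process_var * I)
    Assumed observation:   r_t = theta_t @ (phi(s_t) - gamma * phi(s_{t+1})) + n_t,
                           n_t ~ N(0, obs_var)
    (Hypothetical parameter names; deterministic transitions assumed.)
    """
    # Prediction: the random-walk evolution inflates the parameter covariance.
    P_pred = P + process_var * np.eye(len(theta))
    # Temporal-difference observation vector: reward is linear in theta.
    h = phi_s - gamma * phi_next
    innovation = reward - theta @ h
    # Innovation variance and Kalman gain.
    s = h @ P_pred @ h + obs_var
    K = P_pred @ h / s
    # Correction of the mean and covariance.
    theta_new = theta + K * innovation
    P_new = P_pred - np.outer(K, K) * s
    return theta_new, P_new


def run_two_state_cycle(steps=2000, gamma=0.5):
    """Hypothetical deterministic MDP: 0 -> 1 with reward 1, then 1 -> 0 with reward 0."""
    features = np.eye(2)        # tabular (one-hot) features
    theta = np.zeros(2)         # prior mean of the value parameters
    P = 10.0 * np.eye(2)        # prior covariance (assumed)
    s = 0
    for _ in range(steps):
        s_next = 1 - s
        r = 1.0 if s == 0 else 0.0
        theta, P = ktd_linear_update(theta, P, features[s], features[s_next],
                                     r, gamma=gamma)
        s = s_next
    return theta, P


theta, P = run_two_state_cycle()
print("value estimates:", theta)
print("parameter variances:", np.diag(P))
```

With gamma = 0.5 the exact values of this cycle are V(0) = 4/3 and V(1) = 2/3, and the estimates should approach them. The covariance P is the per-parameter uncertainty that a Kalman-based scheme carries alongside the point estimate. Because the observation model assumes deterministic transitions, applying this sketch to a stochastic MDP would give biased estimates, which is the issue XKTD addresses.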

