Kalman Temporal Differences
Matthieu Geist, Olivier Pietquin

TL;DR
The paper introduces Kalman Temporal Differences (KTD), a framework for online value (and Q-) function approximation in reinforcement learning. KTD is sample-efficient, supports non-linear approximation, handles non-stationarity and manages uncertainty. The resulting algorithms are tested on classical benchmark problems.
Contribution
It presents the KTD framework and its extension, eXtended KTD (XKTD), for MDPs with stochastic transitions. It also provides a convergence analysis for special cases and experiments in which the resulting algorithms compare favorably to the state of the art.
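To make the framework's core idea concrete, here is a minimal sketch that assumes a linear value function V(s) = θᵀφ(s). It also assumes the state-space reading of KTD: the parameters θ are a hidden state following a random walk, and each reward is observed through V(s_t) - γV(s_{t+1}) plus noise. Under that linear assumption the model is linear-Gaussian, so a plain Kalman filter applies. The paper's general algorithm targets non-linear parameterizations, which this sketch does not cover. The class name `LinearKTDV`, the noise settings and the 3-state chain are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np


class LinearKTDV:
    """Hypothetical linear-features sketch of the KTD idea (not the paper's code).

    The value-function parameters theta are treated as a hidden state that
    follows a random walk, and each observed reward is modelled as
    r_t = (phi(s_t) - gamma * phi(s_{t+1})) @ theta + noise.
    With linear features this model is linear-Gaussian, so a standard Kalman
    filter applies; the paper's non-linear variant is not covered here.
    """

    def __init__(self, n_features, gamma=0.9, prior_var=10.0,
                 obs_noise=1e-2, process_noise=0.0):
        self.gamma = gamma
        self.theta = np.zeros(n_features)            # mean of the parameters
        self.P = prior_var * np.eye(n_features)      # covariance of the parameters
        self.R = obs_noise                           # observation-noise variance
        self.Q = process_noise * np.eye(n_features)  # random-walk noise (tracks drift)

    def update(self, phi_s, reward, phi_next, terminal=False):
        """One Kalman step from a transition (s, r, s'); returns the TD error."""
        P_pred = self.P + self.Q                      # prediction: theta_t = theta_{t-1} + v_t
        discount = 0.0 if terminal else self.gamma
        h = phi_s - discount * phi_next               # linear observation vector
        innovation = reward - h @ self.theta          # temporal-difference error
        S = h @ P_pred @ h + self.R                   # innovation variance (scalar)
        K = P_pred @ h / S                            # Kalman gain
        self.theta = self.theta + K * innovation
        self.P = P_pred - S * np.outer(K, K)
        self.P = 0.5 * (self.P + self.P.T)            # guard against round-off asymmetry
        return innovation

    def value(self, phi):
        return phi @ self.theta

    def value_variance(self, phi):
        """Variance of the value estimate, read off the parameter covariance."""
        return phi @ self.P @ phi


# Usage on a hypothetical deterministic 3-state cycle s0 -> s1 -> s2 -> s0,
# with reward 1 on leaving s0 and one-hot (tabular) features.
gamma = 0.9
features = np.eye(3)
rewards = [1.0, 0.0, 0.0]
agent = LinearKTDV(n_features=3, gamma=gamma)
var_before = agent.value_variance(features[0])

s = 0
for _ in range(600):
    s_next = (s + 1) % 3
    agent.update(features[s], rewards[s], features[s_next])
    s = s_next

v0 = 1.0 / (1.0 - gamma ** 3)                       # closed-form V(s0)
v_true = np.array([v0, gamma ** 2 * v0, gamma * v0])
print("KTD estimate:", np.round(agent.theta, 3))
print("true values: ", np.round(v_true, 3))
print("Var[V(s0)]:  ", var_before, "->", agent.value_variance(features[0]))
```

The `value_variance` method illustrates the "uncertainty management" feature. The filter keeps a covariance over the parameters, not only a point estimate, so it yields a variance for each value estimate. Setting `process_noise` above zero stops that covariance from collapsing to zero. This is how a random-walk model lets the estimate keep tracking a value function that changes over time. The value 0.0 used here is an arbitrary choice for a stationary example.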
Findings
KTD demonstrates high sample efficiency and effective non-linear approximation.
XKTD handles stochastic MDPs, where the first KTD algorithm produces biased estimates, and performs favorably on benchmarks.
The algorithms compare favorably to existing methods while managing uncertainty and non-stationarity.
Abstract
Because reinforcement learning suffers from a lack of scalability, online value (and Q-) function approximation has received increasing interest over the last decade. This contribution introduces a novel approximation scheme, namely the Kalman Temporal Differences (KTD) framework, that exhibits the following features: sample-efficiency, non-linear approximation, non-stationarity handling and uncertainty management. A first KTD-based algorithm is provided for deterministic Markov Decision Processes (MDPs); it produces biased estimates in the case of stochastic transitions. Then the eXtended KTD framework (XKTD), which handles stochastic MDPs, is described. Convergence is analyzed in special cases, for both deterministic and stochastic transitions. The related algorithms are evaluated on classical benchmarks. They compare favorably to the state of the art while exhibiting the announced features.
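The abstract states that the first KTD algorithm is biased when transitions are stochastic. Consider a simplified model in which the estimator minimizes the squared temporal-difference residual on sampled transitions; this is an illustrative assumption, not the paper's exact derivation. Under it, one standard way to see how such a bias arises is the decomposition E[X²] = (E[X])² + Var[X], applied conditionally on the current state s (under the evaluated policy):

```latex
\delta_\theta(s, s') = r(s, s') + \gamma \hat V_\theta(s') - \hat V_\theta(s)

\mathbb{E}_{s' \mid s}\!\left[\delta_\theta(s, s')^2\right]
  = \Big( \mathbb{E}_{s' \mid s}\!\left[ r(s, s') + \gamma \hat V_\theta(s') \right] - \hat V_\theta(s) \Big)^2
  + \operatorname{Var}_{s' \mid s}\!\left[ r(s, s') + \gamma \hat V_\theta(s') \right]
```

The first term is the squared Bellman residual that should be driven to zero. When s' is random, the second term also depends on θ through V̂_θ(s'), so minimizing the sum pulls the estimate away from the true value function. With deterministic transitions and rewards the variance term is zero and this source of bias disappears, which is the setting in which the first KTD algorithm is unbiased.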
