Robust and Adaptive Temporal-Difference Learning Using An Ensemble of Gaussian Processes
Qin Lu, Georgios B. Giannakis

TL;DR
This paper introduces scalable, robust, and adaptive temporal-difference learning methods based on Gaussian processes (GPs) for policy evaluation in reinforcement learning. The methods come with worst-case performance guarantees that hold even in adversarial settings, and they select kernels on the fly.
Contribution
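The paper develops two algorithms. OS-GPTD is an online scalable GP-based TD learner built on a random feature approximation of the GP prior. OS-EGPTD runs an ensemble of GPs with different kernels so that the kernel choice adapts online. Both target value function estimation that is scalable, robust, and adaptive.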
Findings
OS-GPTD scales to large or continuous state spaces through its random feature approximation.
OS-EGPTD adapts its kernel choice online, which improves estimation accuracy.
Both methods outperform fixed-kernel approaches in benchmarks.
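One common way to adapt the kernel with an ensemble is Bayesian model averaging. Each candidate kernel runs its own GPTD model, and that model's weight is multiplied by the predictive likelihood it assigned to the reward actually observed. The sketch below assumes this rule and does not claim to reproduce the exact OS-EGPTD update. The per-kernel predictive means and variances of the reward are taken as given (for example, from one random-feature GPTD model per candidate kernel). The function names `update_ensemble_weights` and `ensemble_prediction` are hypothetical.

```python
import numpy as np


def update_ensemble_weights(log_weights, pred_means, pred_vars, reward):
    """Hypothetical BMA-style reweighting of kernel experts (assumed rule).

    log_weights: log of the current ensemble weights, shape (M,)
    pred_means, pred_vars: each expert's Gaussian predictive moments for the
        reward, computed before that reward was observed, shape (M,)
    """
    # Gaussian log-likelihood of the observed reward under each expert
    log_lik = -0.5 * (np.log(2.0 * np.pi * pred_vars)
                      + (reward - pred_means) ** 2 / pred_vars)
    new_log_w = log_weights + log_lik
    # Normalize in the log domain so the weights sum to one without underflow
    return new_log_w - np.logaddexp.reduce(new_log_w)


def ensemble_prediction(log_weights, means, variances):
    """Mean and variance of the Gaussian mixture formed by the experts."""
    w = np.exp(log_weights)
    mean = w @ means
    var = w @ (variances + means ** 2) - mean ** 2
    return mean, var


# Usage: three experts (e.g. three RBF lengthscales) predict the next reward.
log_w = np.log(np.full(3, 1.0 / 3.0))
mu = np.array([0.9, 0.2, -1.0])
var = np.array([0.1, 0.1, 0.1])
log_w = update_ensemble_weights(log_w, mu, var, reward=1.0)
print(np.exp(log_w))  # the expert whose prediction was closest gains weight
```

Keeping the weights in the log domain matters because the products of likelihoods shrink geometrically over a long trajectory. Experts whose kernels predict rewards poorly are phased out smoothly rather than dropped by a hard selection step.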
Abstract
Value function approximation is a crucial module for policy evaluation in reinforcement learning when the state space is large or continuous. The present paper takes a generative perspective on policy evaluation via temporal-difference (TD) learning, where a Gaussian process (GP) prior is presumed on the sought value function, and instantaneous rewards are probabilistically generated based on value function evaluations at two consecutive states. Capitalizing on a random feature-based approximant of the GP prior, an online scalable (OS) approach, termed OS-GPTD, is developed to estimate the value function for a given policy by observing a sequence of state-reward pairs. To benchmark the performance of OS-GPTD even in an adversarial setting, where the modeling assumptions are violated, complementary worst-case analyses are performed by upper-bounding the cumulative Bellman error as well…
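The generative model described in the abstract can be made concrete with a minimal sketch. It assumes the standard GPTD reward model r_t = V(s_t) - γ V(s_{t+1}) + n_t with Gaussian noise n_t ~ N(0, σ_n²). It also assumes an RBF kernel and random Fourier features φ(s), so that V(s) ≈ φ(s)ᵀθ with θ ~ N(0, σ_θ² I). Under these assumptions each reward is a linear-Gaussian observation of θ through the TD regressor h_t = φ(s_t) - γ φ(s_{t+1}). The posterior over θ can therefore be updated online with a Kalman-style recursion at O(D²) cost per step, where D is the number of features. The class name `OSGPTDSketch`, its parameters, and the toy transition and reward in the usage lines are hypothetical. The paper's actual update, hyperparameters, and feature construction may differ.

```python
import numpy as np


class OSGPTDSketch:
    """Hypothetical sketch: random-feature GPTD with an RBF kernel, updated online."""

    def __init__(self, state_dim, num_features=100, lengthscale=1.0,
                 prior_var=1.0, noise_var=0.1, gamma=0.95, seed=0):
        rng = np.random.default_rng(seed)
        # Spectral samples of the RBF kernel: omega ~ N(0, I / lengthscale^2)
        self.W = rng.normal(0.0, 1.0 / lengthscale, size=(num_features, state_dim))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
        self.gamma = gamma
        self.noise_var = noise_var
        # Gaussian posterior N(mean, cov) over the feature weights theta
        self.mean = np.zeros(num_features)
        self.cov = prior_var * np.eye(num_features)

    def features(self, s):
        """Random Fourier features; phi(s) @ phi(s') approximates the RBF kernel."""
        s = np.atleast_1d(np.asarray(s, dtype=float))
        return np.sqrt(2.0 / self.W.shape[0]) * np.cos(self.W @ s + self.b)

    def update(self, s, r, s_next):
        """Condition on one transition; return the reward's prior predictive moments."""
        h = self.features(s) - self.gamma * self.features(s_next)  # TD regressor
        ph = self.cov @ h
        pred_mean = h @ self.mean
        pred_var = h @ ph + self.noise_var
        gain = ph / pred_var
        self.mean = self.mean + gain * (r - pred_mean)
        self.cov = self.cov - np.outer(gain, ph)  # rank-one covariance downdate
        return pred_mean, pred_var

    def value(self, s):
        """Posterior mean and variance of V(s)."""
        phi = self.features(s)
        return phi @ self.mean, phi @ self.cov @ phi


# Usage on a toy 1-D random walk with a hypothetical reward favoring states near 0.
rng = np.random.default_rng(1)
agent = OSGPTDSketch(state_dim=1, num_features=50, lengthscale=0.5, gamma=0.9)
s = rng.uniform(-1.0, 1.0)
for _ in range(500):
    s_next = np.clip(s + rng.normal(0.0, 0.1), -1.0, 1.0)
    agent.update(s, -abs(s), s_next)
    s = s_next
print(agent.value(0.0))
```

Because the recursion is exact Bayesian linear regression in feature space, its posterior matches the batch posterior over all transitions seen so far. Memory nonetheless stays fixed at a D × D covariance however long the trajectory gets, which is what makes a random-feature approach attractive for online use. The per-step predictive moments returned by `update` are also the quantities an ensemble would need to reweight kernels.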
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms
Methods: Gaussian Process
