Robust and Adaptive Temporal-Difference Learning Using An Ensemble of Gaussian Processes

Qin Lu; Georgios B. Giannakis

arXiv:2112.00882·stat.ML·December 3, 2021·1 cites



Open Access

TL;DR

This paper introduces scalable, robust, and adaptive Gaussian process-based temporal-difference learning methods for policy evaluation in reinforcement learning, capable of handling adversarial settings and selecting kernels dynamically.

Contribution

It develops the OS-GPTD and OS-EGPTD algorithms that improve value function estimation with online scalability, robustness, and kernel ensemble adaptation.

Findings

01 OS-GPTD performs well in large state spaces.

02 OS-EGPTD adaptively selects kernels for better accuracy.

03 Both methods outperform fixed kernel approaches in benchmarks.

Abstract

Value function approximation is a crucial module for policy evaluation in reinforcement learning when the state space is large or continuous. The present paper takes a generative perspective on policy evaluation via temporal-difference (TD) learning, where a Gaussian process (GP) prior is presumed on the sought value function, and instantaneous rewards are probabilistically generated based on value function evaluations at two consecutive states. Capitalizing on a random feature-based approximant of the GP prior, an online scalable (OS) approach, termed OS-GPTD, is developed to estimate the value function for a given policy by observing a sequence of state-reward pairs. To benchmark the performance of OS-GPTD even in an adversarial setting, where the modeling assumptions are violated, complementary worst-case analyses are performed by upper-bounding the cumulative Bellman error as well…
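The generative view in the abstract can be illustrated with a minimal sketch. It is not the authors' OS-GPTD implementation; it assumes an RBF kernel approximated by random Fourier features, Gaussian reward noise, and a reward model of the form r = V(s) - gamma * V(s') + noise, so that each state-reward pair triggers a recursive Bayesian linear update. All names (`make_features`, `RandomFeatureTD`, `td_residual`) and all parameter values are hypothetical.

```python
import math
import random


def make_features(dim, num_features, lengthscale, rng):
    """Random Fourier features approximating a unit-variance RBF kernel (assumed kernel)."""
    freqs = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def phi(state):
        return [scale * math.cos(sum(w * x for w, x in zip(f, state)) + b)
                for f, b in zip(freqs, phases)]

    return phi


class RandomFeatureTD:
    """Hypothetical online TD learner: V(s) ~ phi(s)^T theta with a Gaussian prior on theta."""

    def __init__(self, phi, num_features, gamma=0.9, noise_var=0.01, prior_var=1.0):
        self.phi = phi
        self.gamma = gamma
        self.noise_var = noise_var
        self.mean = [0.0] * num_features
        self.cov = [[prior_var if i == j else 0.0 for j in range(num_features)]
                    for i in range(num_features)]

    def _h(self, state, next_state):
        # Regressor of the assumed reward model: phi(s) - gamma * phi(s').
        p, q = self.phi(state), self.phi(next_state)
        return [a - self.gamma * b for a, b in zip(p, q)]

    def td_residual(self, state, reward, next_state):
        h = self._h(state, next_state)
        return reward - sum(x * m for x, m in zip(h, self.mean))

    def update(self, state, reward, next_state):
        # Rank-one (Kalman-style) posterior update; cost is O(D^2) per sample.
        h = self._h(state, next_state)
        cov_h = [sum(c * x for c, x in zip(row, h)) for row in self.cov]
        denom = sum(x * y for x, y in zip(h, cov_h)) + self.noise_var
        err = reward - sum(x * m for x, m in zip(h, self.mean))
        self.mean = [m + c * err / denom for m, c in zip(self.mean, cov_h)]
        self.cov = [[v - ci * cj / denom for v, cj in zip(row, cov_h)]
                    for row, ci in zip(self.cov, cov_h)]

    def value(self, state):
        return sum(x * m for x, m in zip(self.phi(state), self.mean))


# Toy usage (assumed dynamics): s' = 0.5 * s with reward r = s on a 1-D state.
rng = random.Random(0)
D = 30
model = RandomFeatureTD(make_features(1, D, 1.0, rng), D)
trace_before = sum(model.cov[i][i] for i in range(D))
for _ in range(200):
    s = rng.uniform(-1.0, 1.0)
    model.update([s], s, [0.5 * s])
trace_after = sum(model.cov[i][i] for i in range(D))
```

The per-sample cost depends only on the number of random features D, not on the number of transitions seen, which is the kind of online scalability the abstract refers to. In this toy chain the true value function is V(s) = s / (1 - 0.5 * gamma), so `model.value` can be compared against it by hand.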


Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms

Methods: Gaussian Process