Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning

Souradip Chakraborty; Amrit Singh Bedi; Alec Koppel; Brian M. Sadler; Furong Huang; Pratap Tokekar; Dinesh Manocha

arXiv:2206.01162·cs.LG·May 5, 2023

TL;DR

This paper introduces a novel model-based reinforcement learning method that uses kernelized Stein discrepancy for efficient posterior estimation, enabling scalable training with theoretical guarantees and significant computational savings.

Contribution

It relaxes assumptions on transition models, incorporates a Bayesian coreset for compression, and achieves sublinear Bayesian regret in large-scale RL.

Findings

01 Achieves up to 50% reduction in wall-clock time.

02 Performs competitively with state-of-the-art RL methods.

03 Handles generic mixture models for transition dynamics.

Abstract

Model-based approaches to reinforcement learning (MBRL) exhibit favorable performance in practice, but their theoretical guarantees in large spaces are mostly restricted to settings in which the transition model is Gaussian or Lipschitz, and they demand a posterior estimate whose representational complexity grows unbounded with time. In this work, we develop a novel MBRL method that (i) relaxes the assumptions on the target transition model so that it may belong to a generic family of mixture models; (ii) is applicable to large-scale training by incorporating a compression step such that the posterior estimate consists of a Bayesian coreset of only statistically significant past state-action pairs; and (iii) exhibits sublinear Bayesian regret. To achieve these results, we adopt an approach based upon Stein's method, which, under a smoothness condition on the constructed posterior and target, allows…
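For readers unfamiliar with the quantity named in the title, the following is a minimal sketch of a kernelized Stein discrepancy (KSD) estimate. It is not the paper's algorithm. It assumes an RBF kernel, a V-statistic estimator, and a target whose score function (the gradient of its log density) is available in closed form; the names ksd_squared, score, and bandwidth are hypothetical and chosen for illustration only.

```python
import numpy as np

def ksd_squared(samples, score, bandwidth=1.0):
    """V-statistic estimate of the squared KSD between samples and a target.

    Assumptions (not from the source): RBF kernel
    k(x, y) = exp(-||x - y||^2 / (2 h^2)) and a callable `score` that
    returns grad log p(x) row-wise for an (n, d) array.
    """
    x = np.asarray(samples, dtype=float)
    n, d = x.shape
    s = score(x)                                  # (n, d) target scores
    diff = x[:, None, :] - x[None, :, :]          # (n, n, d), x_i - x_j
    sq = np.sum(diff ** 2, axis=-1)               # squared distances
    h2 = bandwidth ** 2
    k = np.exp(-sq / (2.0 * h2))                  # kernel matrix

    # Stein kernel u_p(x_i, x_j), assembled from its four terms.
    term1 = (s @ s.T) * k                                   # s_i . s_j k
    term2 = np.einsum("id,ijd->ij", s, diff) / h2 * k       # s_i . grad_y k
    term3 = -np.einsum("jd,ijd->ij", s, diff) / h2 * k      # s_j . grad_x k
    term4 = (d / h2 - sq / h2 ** 2) * k                     # tr(grad_x grad_y k)
    return float(np.mean(term1 + term2 + term3 + term4))

# Usage: the target is a standard normal, whose score is -x.
rng = np.random.default_rng(0)
good = rng.standard_normal((300, 2))   # drawn from the target
bad = good + 2.0                       # shifted away from the target
standard_normal_score = lambda x: -x
print(ksd_squared(good, standard_normal_score))
print(ksd_squared(bad, standard_normal_score))
```

Samples that match the target should give a value near zero, while the shifted samples give a clearly larger one. A compression step of the kind the abstract describes could, in principle, use such a discrepancy to decide which points to retain, but the actual selection rule is not given in the text above.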

Taxonomy

Topics: Model Reduction and Neural Networks · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research