Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning
Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Brian M. Sadler, Furong Huang, Pratap Tokekar, Dinesh Manocha

TL;DR
This paper introduces a novel model-based reinforcement learning method that uses kernelized Stein discrepancy for efficient posterior estimation, enabling scalable training with theoretical guarantees and significant computational savings.
Contribution
It relaxes assumptions on transition models, incorporates a Bayesian coreset for compression, and achieves sublinear Bayesian regret in large-scale RL.
Findings
Achieves up to a 50% reduction in wall-clock time.
Performs competitively with state-of-the-art RL methods.
Handles generic mixture models for transition dynamics.
Abstract
Model-based approaches to reinforcement learning (MBRL) exhibit favorable performance in practice, but their theoretical guarantees in large spaces are mostly restricted to the setting where the transition model is Gaussian or Lipschitz, and they demand a posterior estimate whose representational complexity grows unbounded with time. In this work, we develop a novel MBRL method that (i) relaxes the assumptions on the target transition model so that it need only belong to a generic family of mixture models; (ii) is applicable to large-scale training by incorporating a compression step such that the posterior estimate consists of a Bayesian coreset of only statistically significant past state-action pairs; and (iii) exhibits a sublinear Bayesian regret. To achieve these results, we adopt an approach based upon Stein's method, which, under a smoothness condition on the constructed posterior and target, allows…
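For readers unfamiliar with the quantity named in the title, the following is a minimal sketch of a kernelized Stein discrepancy (KSD) estimate. It is not the paper's algorithm and does not include its coreset compression step. It assumes an RBF kernel, a V-statistic estimator, and a target density whose score function (the gradient of its log-density) is available. The names `ksd_squared`, `score_fn`, and `bandwidth` are hypothetical and chosen only for illustration.

```python
import numpy as np


def ksd_squared(samples, score_fn, bandwidth=1.0):
    """V-statistic estimate of the squared KSD between the empirical
    distribution of `samples` and a target with score `score_fn`.

    Assumptions (illustrative, not taken from the paper):
      - RBF kernel k(x, y) = exp(-||x - y||^2 / (2 h^2)) with h = bandwidth
      - score_fn maps an (n, d) array to the (n, d) array of grad log p(x)
    """
    x = np.asarray(samples, dtype=float)
    n, d = x.shape
    s = score_fn(x)                                  # scores, shape (n, d)

    diff = x[:, None, :] - x[None, :, :]             # diff[i, j] = x_i - x_j
    sq_dist = np.sum(diff ** 2, axis=-1)             # ||x_i - x_j||^2
    h2 = bandwidth ** 2
    k = np.exp(-sq_dist / (2.0 * h2))                # kernel matrix, (n, n)

    score_dot = s @ s.T                              # s(x_i) . s(x_j)
    s_i_diff = np.einsum("id,ijd->ij", s, diff)      # s(x_i) . (x_i - x_j)
    s_j_diff = np.einsum("jd,ijd->ij", s, diff)      # s(x_j) . (x_i - x_j)

    # Stein kernel for the RBF kernel:
    #   k * [ s_i.s_j + (s_i - s_j).(x_i - x_j)/h^2 + d/h^2 - ||x_i - x_j||^2/h^4 ]
    stein_kernel = k * (
        score_dot
        + (s_i_diff - s_j_diff) / h2
        + d / h2
        - sq_dist / h2 ** 2
    )
    return stein_kernel.mean()


def standard_normal_score(x):
    """Score of a standard Gaussian target: grad log p(x) = -x."""
    return -np.asarray(x, dtype=float)
```

As a usage illustration, samples drawn from a standard Gaussian should yield a much smaller value than samples drawn from a shifted Gaussian when both are scored against the standard Gaussian target. The estimate needs only the samples and the target's score, not the target's normalizing constant, which is what makes a discrepancy of this kind usable for comparing a constructed posterior against a target.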
Taxonomy
Topics: Model Reduction and Neural Networks · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research
