STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning
Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Mengdi Wang, Furong Huang, Dinesh Manocha

TL;DR
This paper introduces STEERING, a novel exploration method for model-based reinforcement learning. Instead of estimating information gain, it uses the kernelized Stein discrepancy (KSD) to compute an exploration incentive in closed form, achieving better regret bounds and practical performance.
Contribution
The paper proposes a new exploration incentive based on the integral probability metric (IPM) and the kernelized Stein discrepancy (KSD). It also introduces a novel algorithm, STEERING, which improves computational efficiency and regret bounds in model-based RL.
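For reference, these are the standard textbook definitions of the two quantities named above. They are not necessarily the paper's notation, and the block does not say which distribution plays the role of the model estimate and which plays the unknown model. An IPM measures the largest gap in expectations over a function class. KSD is the IPM obtained by passing the unit ball of a vector-valued RKHS through the Stein operator of the target Q. Under mild regularity conditions, the Stein operator has zero mean under Q, so the Q-expectation drops out. This leaves a closed form that needs only samples from P and the score of Q.

```latex
% Integral probability metric over a (symmetric) function class \mathcal{F}
d_{\mathcal{F}}(P, Q) = \sup_{f \in \mathcal{F}} \left| \mathbb{E}_{X \sim P}[f(X)] - \mathbb{E}_{Y \sim Q}[f(Y)] \right|

% Stein operator for a target Q with score s_Q(x) = \nabla_x \log q(x)
(\mathcal{A}_Q f)(x) = s_Q(x)^\top f(x) + \nabla_x \cdot f(x),
\qquad \mathbb{E}_{Y \sim Q}\big[(\mathcal{A}_Q f)(Y)\big] = 0

% KSD: the IPM over the Stein-transformed RKHS unit ball, with a closed form
\mathrm{KSD}(P, Q) = \sup_{\|f\|_{\mathcal{H}^d} \le 1} \mathbb{E}_{X \sim P}\big[(\mathcal{A}_Q f)(X)\big],
\qquad
\mathrm{KSD}^2(P, Q) = \mathbb{E}_{X, X' \sim P}\big[u_Q(X, X')\big]

u_Q(x, x') = s_Q(x)^\top k(x, x')\, s_Q(x') + s_Q(x)^\top \nabla_{x'} k(x, x')
           + \nabla_x k(x, x')^\top s_Q(x') + \operatorname{tr}\!\big(\nabla_x \nabla_{x'} k(x, x')\big)
```

Because only the score s_Q enters, the normalizing constant of q is never needed. This is what makes the closed-form computation practical.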
Findings
Achieves sublinear Bayesian regret with STEERING.
Outperforms prior methods in experiments.
Computationally affordable exploration strategy.
Abstract
Directed exploration is a crucial challenge in reinforcement learning (RL), especially when rewards are sparse. Information-directed sampling (IDS) optimizes the information ratio and seeks to address this challenge by augmenting regret with information gain. However, estimating information gain is either computationally intractable or relies on restrictive assumptions, which prohibits its use in many practical instances. In this work, we posit an alternative exploration incentive in terms of the integral probability metric (IPM) between a current estimate of the transition model and the unknown optimal one. Under suitable conditions, this incentive can be computed in closed form with the kernelized Stein discrepancy (KSD). Based on KSD, we develop a novel algorithm, STEERING: STEin information dirEcted exploration for model-based Reinforcement LearnING. To enable its derivation, we…
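The closed-form KSD computation mentioned above can be illustrated with a generic U-statistic estimator of the squared KSD. This is a minimal sketch, not the paper's implementation, and it does not show how STEERING uses KSD inside planning. The function name `ksd_u_statistic`, the `score_fn` argument, and the choice of a Gaussian (RBF) kernel with a fixed bandwidth are all assumptions made for illustration.

```python
import numpy as np


def ksd_u_statistic(samples, score_fn, bandwidth=1.0):
    """Hypothetical helper: U-statistic estimate of KSD^2(P, Q).

    samples:  draws from P, shape (n,) or (n, d)
    score_fn: maps an (n, d) array to the score grad log q(x) of the target Q, shape (n, d)
    Assumes an RBF kernel k(x, y) = exp(-||x - y||^2 / (2 h^2)).
    """
    x = np.asarray(samples, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    s = score_fn(x)                          # scores s_Q(x_i), shape (n, d)
    diff = x[:, None, :] - x[None, :, :]     # x_i - x_j, shape (n, n, d)
    sq_dist = np.sum(diff ** 2, axis=-1)     # ||x_i - x_j||^2
    h2 = bandwidth ** 2
    k = np.exp(-sq_dist / (2.0 * h2))

    # Stein kernel u_Q(x_i, x_j) for the RBF kernel, factored as k * (...):
    term1 = s @ s.T                                     # s(x_i)^T s(x_j)
    term2 = np.einsum("id,ijd->ij", s, diff) / h2       # s(x_i)^T grad_{x_j} k / k
    term3 = -np.einsum("jd,ijd->ij", s, diff) / h2      # s(x_j)^T grad_{x_i} k / k
    term4 = d / h2 - sq_dist / h2 ** 2                  # tr(grad_x grad_x' k) / k
    u = k * (term1 + term2 + term3 + term4)

    # U-statistic: drop the i == j terms and average over ordered pairs.
    np.fill_diagonal(u, 0.0)
    return u.sum() / (n * (n - 1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    std_normal_score = lambda z: -z  # score of N(0, I)
    print(ksd_u_statistic(rng.normal(size=(500, 1)), std_normal_score))
    print(ksd_u_statistic(rng.normal(loc=1.5, size=(500, 1)), std_normal_score))
```

The example scores two sets of samples against the score of N(0, 1). Samples drawn from N(0, 1) itself should give a value near zero. Samples from N(1.5, 1) should give a clearly positive value, since the population KSD² for this pair is 2.25/√3 ≈ 1.3 with bandwidth 1. The estimator only evaluates `score_fn`, so it works for unnormalized targets. The O(n²) pairwise cost is the usual price of the unbiased U-statistic form.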
Taxonomy
Topics: Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning · Gaussian Processes and Bayesian Inference
