STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning

Souradip Chakraborty; Amrit Singh Bedi; Alec Koppel; Mengdi Wang; Furong Huang; Dinesh Manocha

arXiv:2301.12038·cs.LG·September 20, 2023·1 cites

Open Access

TL;DR

This paper introduces STEERING, an exploration method for model-based reinforcement learning that uses the kernelized Stein discrepancy (KSD) as a computationally tractable alternative to estimating information gain, achieving sublinear Bayesian regret and improved practical performance.

Contribution

The paper proposes a new exploration incentive based on the integral probability metric (IPM) and the kernelized Stein discrepancy (KSD), along with a novel algorithm, STEERING, improving computational efficiency and regret bounds in model-based RL.

Findings

01

Achieves sublinear Bayesian regret with STEERING.

02

Outperforms prior methods in experiments.

03

Computationally affordable exploration strategy.

Abstract

Directed exploration is a crucial challenge in reinforcement learning (RL), especially when rewards are sparse. Information-directed sampling (IDS), which optimizes the information ratio, seeks to do so by augmenting regret with information gain. However, estimating information gain is computationally intractable or relies on restrictive assumptions, which prohibits its use in many practical instances. In this work, we posit an alternative exploration incentive in terms of the integral probability metric (IPM) between a current estimate of the transition model and the unknown optimal, which, under suitable conditions, can be computed in closed form with the kernelized Stein discrepancy (KSD). Based on KSD, we develop a novel algorithm STEERING: STEin information dirEcted exploration for model-based Reinforcement LearnING. To enable its derivation, we…
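
To make the closed-form claim concrete, the following is a minimal sketch of a generic KSD estimate, not the authors' implementation. It assumes an RBF kernel with a fixed bandwidth and a target density whose score function (the gradient of its log-density) is available; the function name `ksd_squared_v_statistic`, the `bandwidth` parameter, and the standard Gaussian target in the usage lines are all illustrative choices that do not come from the paper.

```python
import numpy as np


def ksd_squared_v_statistic(samples, score_fn, bandwidth=1.0):
    """V-statistic estimate of the squared KSD between samples and a target.

    samples:  array of shape (n, d), drawn from the distribution under test.
    score_fn: maps an (n, d) array to the (n, d) array of target scores,
              i.e. gradients of the target log-density (assumed available).
    bandwidth: RBF kernel bandwidth h (an illustrative default).
    """
    x = np.asarray(samples, dtype=float)
    n, d = x.shape
    s = score_fn(x)                                # target scores, (n, d)

    diff = x[:, None, :] - x[None, :, :]           # x_i - x_j, (n, n, d)
    sq_dist = np.sum(diff ** 2, axis=-1)           # ||x_i - x_j||^2, (n, n)
    h2 = bandwidth ** 2
    k = np.exp(-sq_dist / (2.0 * h2))              # RBF kernel matrix

    # Terms of the Stein kernel u_p(x_i, x_j), each divided by k(x_i, x_j):
    term_ss = s @ s.T                                        # s(x_i) . s(x_j)
    term_sx = np.einsum("id,ijd->ij", s, diff) / h2          # s(x_i) . grad_y k
    term_sy = -np.einsum("jd,ijd->ij", s, diff) / h2         # s(x_j) . grad_x k
    term_tr = d / h2 - sq_dist / h2 ** 2                     # trace(grad_x grad_y k)

    stein_kernel = k * (term_ss + term_sx + term_sy + term_tr)
    return stein_kernel.mean()                     # average over all pairs


# Usage: standard Gaussian target, whose score is simply -x.
rng = np.random.default_rng(0)
gaussian_score = lambda z: -z
matched = rng.standard_normal((200, 2))            # samples from the target
shifted = matched + 2.0                            # samples from a shifted law
ksd_matched = ksd_squared_v_statistic(matched, gaussian_score)
ksd_shifted = ksd_squared_v_statistic(shifted, gaussian_score)
```

The estimate only needs the target's score and samples from the candidate distribution, so no normalizing constant is required. In this toy setting the shifted samples yield a larger discrepancy than the matched ones, which is the kind of signal an exploration incentive built on KSD would rely on.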

Taxonomy

Topics: Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning · Gaussian Processes and Bayesian Inference