More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling

Haque Ishfaq; Yixin Tan; Yu Yang; Qingfeng Lan; Jianfeng Lu; A. Rupam Mahmood; Doina Precup; Pan Xu

arXiv:2406.12241 · cs.LG · June 19, 2024 · 1 citation

TL;DR

This paper introduces a flexible framework combining approximate sampling methods with Feel-Good Thompson Sampling (FGTS) to improve exploration in reinforcement learning, achieving better regret bounds and better empirical performance in deep RL tasks.

Contribution

It develops a novel algorithmic framework that integrates various approximate sampling techniques with FGTS, enhancing exploration efficiency and theoretical guarantees in RL.
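To make the idea of pairing an approximate sampler with FGTS concrete, here is a minimal sketch under loudly stated assumptions: a linear contextual bandit instead of the paper's linear-MDP setting, a squared-loss likelihood, a truncated feel-good bonus of the form lam * min(cap, max_a φ(x, a)ᵀθ) in the spirit of Zhang (2022), and plain Langevin Monte Carlo (LMC) as the sampler. The names `feel_good_grad`, `lmc_feel_good_sample`, `choose_action` and the parameters `lam`, `cap`, `inv_temp` are hypothetical; this is not the authors' implementation from panxulab/lsvi-ase.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical sketch: FGTS-style loss + unadjusted LMC for a linear contextual bandit.
# This is an illustration, not the paper's linear-MDP algorithm.
Vector = List[float]
# One observation: (feature vector of every action, index of the action taken, reward).
Observation = Tuple[Sequence[Vector], int, float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def feel_good_grad(theta: Vector, data: Sequence[Observation], eta: float, lam: float,
                   cap: float, prior_prec: float) -> Vector:
    """(Sub)gradient of an assumed feel-good loss:

        prior_prec / 2 * ||theta||^2
        + eta * sum_s (phi(x_s, a_s)^T theta - r_s)^2
        - lam * sum_s min(cap, max_a phi(x_s, a)^T theta)
    """
    grad = [prior_prec * t for t in theta]
    for feats, chosen, reward in data:
        phi = feats[chosen]
        resid = dot(phi, theta) - reward
        for i, p in enumerate(phi):
            grad[i] += 2.0 * eta * resid * p
        # Feel-good term: the subgradient of max_a is the feature of the argmax action,
        # and it vanishes once the cap is reached.
        values = [dot(f, theta) for f in feats]
        best = max(range(len(feats)), key=values.__getitem__)
        if values[best] < cap:
            for i, p in enumerate(feats[best]):
                grad[i] -= lam * p
    return grad


def lmc_feel_good_sample(data: Sequence[Observation], dim: int, *, eta: float = 1.0,
                         lam: float = 0.1, cap: float = 1.0, prior_prec: float = 1.0,
                         step_size: float = 0.01, inv_temp: float = 1.0,
                         n_steps: int = 500, seed: int = 0) -> Vector:
    """Unadjusted Langevin Monte Carlo targeting a density proportional to exp(-inv_temp * loss)."""
    rng = random.Random(seed)
    theta = [0.0] * dim
    noise_scale = math.sqrt(2.0 * step_size / inv_temp)
    for _ in range(n_steps):
        g = feel_good_grad(theta, data, eta, lam, cap, prior_prec)
        # Gradient step on the loss plus injected Gaussian noise.
        theta = [t - step_size * gi + noise_scale * rng.gauss(0.0, 1.0)
                 for t, gi in zip(theta, g)]
    return theta


def choose_action(theta: Vector, feats: Sequence[Vector]) -> int:
    """Act greedily with respect to the sampled parameter, as in Thompson sampling."""
    return max(range(len(feats)), key=lambda a: dot(feats[a], theta))


# Toy usage: two actions with one-hot features and noiseless rewards from theta* = [1, -1].
feats = [[1.0, 0.0], [0.0, 1.0]]
theta_star = [1.0, -1.0]
data = [(feats, a % 2, dot(feats[a % 2], theta_star)) for a in range(50)]
plain = lmc_feel_good_sample(data, 2, lam=0.0, inv_temp=100.0)
optimistic = lmc_feel_good_sample(data, 2, lam=1.0, cap=10.0, inv_temp=100.0)
print(plain, optimistic, choose_action(optimistic, feats))
```

Setting `lam=0.0` reduces the sketch to an ordinary LMC approximation of a Gaussian posterior; a positive `lam` pulls the sample toward parameters that predict a higher best-action value, which is the kind of optimism the feel-good term is meant to add.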

Findings

1. Achieves the best known dependency of regret on dimensionality for linear MDPs.
2. Provides an explicit sampling complexity for each sampler used.
3. Demonstrates superior empirical performance on Atari games.

Abstract

Thompson sampling (TS) is one of the most popular exploration techniques in reinforcement learning (RL). However, most TS algorithms with theoretical guarantees are difficult to implement and not generalizable to Deep RL. While the emerging approximate sampling-based exploration schemes are promising, most existing algorithms are specific to linear Markov Decision Processes (MDP) with suboptimal regret bounds, or only use the most basic samplers such as Langevin Monte Carlo. In this work, we propose an algorithmic framework that incorporates different approximate sampling methods with the recently proposed Feel-Good Thompson Sampling (FGTS) approach (Zhang, 2022; Dann et al., 2021), which was previously known to be computationally intractable in general. When applied to linear MDPs, our regret analysis yields the best known dependency of regret on dimensionality, surpassing existing…
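The abstract contrasts earlier work that relies only on basic samplers such as Langevin Monte Carlo with a framework that admits other approximate samplers. As a generic illustration of a sampler beyond plain LMC, here is a minimal sketch of underdamped (kinetic) Langevin dynamics with a semi-implicit Euler step, run on a 1-D Gaussian target. This is not a claim about which samplers the paper analyzes; the function name `underdamped_langevin` and its defaults (`friction`, `inv_mass`, `step_size`) are hypothetical.

```python
import math
import random
from typing import Callable, List

# Hypothetical sketch of an underdamped Langevin sampler; not taken from the paper or its repo.
Vector = List[float]


def underdamped_langevin(grad_u: Callable[[Vector], Vector], x0: Vector, n_steps: int, *,
                         step_size: float = 0.01, friction: float = 2.0,
                         inv_mass: float = 1.0, seed: int = 0) -> List[Vector]:
    """Semi-implicit Euler discretization of underdamped Langevin dynamics

        dv = -friction * v dt - inv_mass * grad U(x) dt + sqrt(2 * friction * inv_mass) dB_t
        dx = v dt

    whose stationary x-marginal is proportional to exp(-U(x)).
    """
    rng = random.Random(seed)
    x = list(x0)
    v = [0.0] * len(x0)  # auxiliary velocity variable, started at rest
    noise_scale = math.sqrt(2.0 * friction * inv_mass * step_size)
    samples = []
    for _ in range(n_steps):
        g = grad_u(x)
        # Velocity: friction, force from the potential, and injected Gaussian noise.
        v = [vi - step_size * (friction * vi + inv_mass * gi) + noise_scale * rng.gauss(0.0, 1.0)
             for vi, gi in zip(v, g)]
        # Position moves with the freshly updated velocity.
        x = [xi + step_size * vi for xi, vi in zip(x, v)]
        samples.append(list(x))
    return samples


# Toy target: a 1-D Gaussian N(mu, sigma^2), i.e. U(x) = (x - mu)^2 / (2 sigma^2).
mu, sigma = 3.0, 0.5
chain = underdamped_langevin(lambda x: [(x[0] - mu) / sigma**2], [0.0], n_steps=22_000)
kept = [s[0] for s in chain[2_000:]]  # discard burn-in
mean = sum(kept) / len(kept)
var = sum((k - mean) ** 2 for k in kept) / len(kept)
print(f"mean={mean:.3f} var={var:.3f}")
```

The extra velocity variable lets the chain carry momentum between steps, which is the usual motivation for underdamped samplers over plain LMC; the step size, friction, and burn-in here are illustrative choices, not values from the paper.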

Code & Models

Repositories

panxulab/lsvi-ase (PyTorch · Official)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Energy Efficient Wireless Sensor Networks · Distributed Sensor Networks and Detection Algorithms

Methods: Spatio-temporal stability analysis