Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning

Tian Tan; Zhihan Xiong; Vikranth R. Dwaracherla

arXiv:1912.10577 · cs.LG · March 23, 2020

TL;DR

This paper introduces Parameterized Indexed Networks (PINs), a computationally efficient method for exploration in reinforcement learning using index sampling, with theoretical regret bounds and empirical validation.

Contribution

It proposes a novel dual-network architecture for indexed value functions, improving exploration efficiency and reducing computational costs in reinforcement learning.

Findings

01 PINs achieve competitive exploration performance.

02 Theoretical regret bounds are established for the method.

03 Empirical results demonstrate PINs' effectiveness in experiments.

Abstract

It is well known that quantifying uncertainty in the action-value estimates is crucial for efficient exploration in reinforcement learning. Ensemble sampling offers a relatively computationally tractable way of doing this using randomized value functions. However, it still requires a huge amount of computational resources for complex problems. In this paper, we present an alternative, computationally efficient way to induce exploration using index sampling. We use an indexed value function to represent uncertainty in our action-value estimates. We first present an algorithm to learn a parameterized indexed value function through a distributional version of temporal difference in a tabular setting and prove its regret bound. Then, from a computational point of view, we propose a dual-network architecture, Parameterized Indexed Networks (PINs), comprising one mean network and one uncertainty…
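The idea of an indexed value function with separate mean and uncertainty components can be illustrated with a small tabular sketch. This is not the authors' implementation: the additive form `mean + z * spread`, the standard normal index, and all names (`indexed_q`, `greedy_action`, `sample_index`, the toy tables) are assumptions made for illustration only, since the abstract does not spell out these details.

```python
import random

def indexed_q(mean, spread, state, action, z):
    """Indexed action value: mean estimate plus index-scaled uncertainty.

    The additive form is an assumption of this sketch.
    """
    return mean[state][action] + z * spread[state][action]

def greedy_action(mean, spread, state, z):
    """Pick the action that maximizes the indexed value for a fixed index z."""
    n_actions = len(mean[state])
    return max(range(n_actions),
               key=lambda a: indexed_q(mean, spread, state, a, z))

def sample_index(rng):
    """Draw one scalar index; a standard normal is assumed here."""
    return rng.gauss(0.0, 1.0)

# Hypothetical toy tables: 2 states, 2 actions.
mean = [[1.0, 0.5], [0.0, 0.2]]      # point estimates of action values
spread = [[0.1, 2.0], [0.5, 0.5]]    # per-entry uncertainty scales

rng = random.Random(0)
z = sample_index(rng)                # one index, held fixed while acting
actions = [greedy_action(mean, spread, s, z) for s in range(len(mean))]
```

In this sketch a single sampled index replaces the choice of an ensemble member: an action with a lower mean but a larger uncertainty scale is selected whenever the index is large enough, which is what drives exploration.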

Code & Models

Repositories

tiantan522/PINs (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Advanced Multi-Objective Optimization Algorithms