Knowledge is reward: Learning optimal exploration by predictive reward cashing

Luca Ambrogioni

arXiv:2109.08518 · stat.ML · September 20, 2021

Open Access

TL;DR

This paper uses predictive reward cashing to simplify the Bayes-adaptive exploration problem in reinforcement learning, so that information gathering tasks can be learned without shaping or heuristics.

Contribution

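The paper exploits the inherent mathematical structure of Bayes-adaptive problems and introduces the concept of cross-value (the value of being in one environment while acting optimally according to another). This makes the reward structure denser and decouples the learning of exploitation and exploration policies.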

Findings

01. Enables learning complex information gathering tasks without shaping or heuristics.

02. Dramatically simplifies the Bayes-adaptive exploration problem by making the reward structure denser.

03. Shows improved performance over standard RL algorithms in experiments.

Abstract

There is a strong link between the general concept of intelligence and the ability to collect and use information. The theory of Bayes-adaptive exploration offers an attractive optimality framework for training machines to perform complex information gathering tasks. However, the computational complexity of the resulting optimal control problem has limited the diffusion of the theory to mainstream deep AI research. In this paper we exploit the inherent mathematical structure of Bayes-adaptive problems in order to dramatically simplify the problem by making the reward structure denser while simultaneously decoupling the learning of exploitation and exploration policies. The key to this simplification comes from the novel concept of cross-value (i.e. the value of being in an environment while acting optimally according to another), which we use to quantify the value of currently available…
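The parenthetical definition of cross-value can be made concrete with a toy example. The sketch below is an illustration only, not the paper's formulation: it assumes each environment is a one-step multi-armed bandit fully described by its expected reward per arm, and the names `ENVIRONMENTS`, `optimal_arm`, and `cross_value` are hypothetical.

```python
# Hypothetical toy setup (an assumption, not taken from the paper):
# each environment is a one-step bandit given by its expected reward per arm.
ENVIRONMENTS = {
    "env_a": [1.0, 0.2, 0.5],
    "env_b": [0.1, 0.9, 0.5],
}


def optimal_arm(mean_rewards):
    """Return the index of the arm with the highest expected reward."""
    return max(range(len(mean_rewards)), key=lambda arm: mean_rewards[arm])


def cross_value(actual, assumed, environments=ENVIRONMENTS):
    """Expected reward earned in `actual` when acting optimally for `assumed`."""
    arm = optimal_arm(environments[assumed])
    return environments[actual][arm]


def cross_value_table(environments=ENVIRONMENTS):
    """Cross-value for every (actual, assumed) pair of environments."""
    return {
        (actual, assumed): cross_value(actual, assumed, environments)
        for actual in environments
        for assumed in environments
    }


if __name__ == "__main__":
    for (actual, assumed), value in cross_value_table().items():
        print(f"actual={actual} assumed={assumed} cross-value={value}")
```

In this toy setting the diagonal entries (acting optimally for the environment one is actually in) are the ordinary optimal values, and the gap between a diagonal entry and an off-diagonal one in the same row measures what is lost by acting on the wrong belief about the environment.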

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms

Methods: Diffusion