Reinforcement Learning with Lookahead Information

Nadav Merlis

arXiv:2406.02258 · cs.LG · October 22, 2024 · 1 citation

TL;DR

This paper introduces provably efficient reinforcement learning algorithms that use lookahead observations of rewards or transitions in unknown environments. By planning with the empirical distribution of these observations rather than with estimated expectations, the algorithms collect more reward than agents that ignore lookahead.

Contribution

It develops provably-efficient algorithms that incorporate lookahead information into RL, addressing a gap in handling such data in unknown environments.

Findings

01. Algorithms achieve tight regret bounds with lookahead data

02. Collected reward increases linearly relative to agents that cannot use lookahead information

03. Outperforms traditional RL methods without lookahead

Abstract

We study reinforcement learning (RL) problems in which agents observe the reward or transition realizations at their current state before deciding which action to take. Such observations are available in many applications, including transactions, navigation and more. When the environment is known, previous work shows that this lookahead information can drastically increase the collected reward. However, outside of specific applications, existing approaches for interacting with unknown environments are not well-adapted to these observations. In this work, we close this gap and design provably-efficient learning algorithms able to incorporate lookahead information. To achieve this, we perform planning using the empirical distribution of the reward and transition observations, in contrast to vanilla approaches that only rely on estimated expectations. We prove that our algorithms achieve…
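
The contrast the abstract draws between planning with the empirical distribution and planning with estimated expectations can be illustrated with a minimal one-step sketch. This is not the paper's algorithm: the function names, the `continuation` vector (an assumed expected future value for each action), and the toy samples are all hypothetical, chosen only to show why averaging the per-sample maximum can exceed the maximum of the averages.

```python
from typing import Sequence


def value_from_expectations(
    reward_samples: Sequence[Sequence[float]],
    continuation: Sequence[float],
) -> float:
    """Plan with estimated mean rewards only: max_a (mean_r[a] + continuation[a])."""
    n = len(reward_samples)
    num_actions = len(continuation)
    # Empirical mean reward of each action across all observed reward vectors.
    means = [sum(s[a] for s in reward_samples) / n for a in range(num_actions)]
    return max(means[a] + continuation[a] for a in range(num_actions))


def value_with_lookahead(
    reward_samples: Sequence[Sequence[float]],
    continuation: Sequence[float],
) -> float:
    """Plan with the empirical distribution: mean over samples of max_a (r[a] + continuation[a])."""
    n = len(reward_samples)
    # For each observed reward vector, the agent picks the best action for that
    # realization; the value is the average of these per-sample best outcomes.
    return sum(
        max(r + c for r, c in zip(sample, continuation)) for sample in reward_samples
    ) / n


# Hypothetical toy data: two actions, each observed reward vector favors one of them.
samples = [(1.0, 0.0), (0.0, 1.0)]
continuation = (0.0, 0.0)  # assumed expected future value per action

print(value_from_expectations(samples, continuation))
print(value_with_lookahead(samples, continuation))
```

In this toy case each action has the same mean reward, so an agent that plans with expectations is indifferent between them, whereas an agent that sees the realized rewards before acting always picks the better one. The sketch covers only a single step with reward observations; handling transition observations and multi-step planning is outside its scope.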

Videos

Reinforcement Learning with Lookahead Information · SlidesLive

Taxonomy

Topics: Auction Theory and Applications · Game Theory and Applications · Advanced Bandit Algorithms Research