Reinforcement Learning with Lookahead Information
Nadav Merlis

TL;DR
This paper introduces provably efficient reinforcement learning algorithms for unknown environments in which the agent observes the reward or transition realizations at its current state before acting. The algorithms plan with the empirical distribution of these observations rather than with estimated expectations alone.
Contribution
It develops provably-efficient algorithms that incorporate lookahead information into RL, addressing a gap in handling such data in unknown environments.
Findings
Algorithms achieve tight regret bounds with lookahead data
Reward collection increases linearly with lookahead information
Outperforms traditional RL methods without lookahead
Abstract
We study reinforcement learning (RL) problems in which agents observe the reward or transition realizations at their current state before deciding which action to take. Such observations are available in many applications, including transactions, navigation and more. When the environment is known, previous work shows that this lookahead information can drastically increase the collected reward. However, outside of specific applications, existing approaches for interacting with unknown environments are not well-adapted to these observations. In this work, we close this gap and design provably-efficient learning algorithms able to incorporate lookahead information. To achieve this, we perform planning using the empirical distribution of the reward and transition observations, in contrast to vanilla approaches that only rely on estimated expectations. We prove that our algorithms achieve…
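The contrast the abstract draws between planning with the empirical distribution and planning with estimated expectations can be illustrated with a minimal one-step sketch. This is not the paper's algorithm: the function names, the toy reward samples, and the fixed `next_values` vector are all assumptions made for illustration. Without lookahead, the agent commits to the action with the best average reward; with reward lookahead, it sees the realized rewards first, so its value is the average of the per-sample maximum, which the mean rewards alone cannot recover.

```python
import numpy as np

def value_without_lookahead(reward_samples, next_values):
    """Best action chosen from estimated expectations only: max_a (mean reward + next value)."""
    mean_rewards = reward_samples.mean(axis=0)  # one mean per action
    return float(np.max(mean_rewards + next_values))

def value_with_reward_lookahead(reward_samples, next_values):
    """Action chosen after seeing the realized rewards, averaged over the
    empirical distribution: mean over samples of max_a (reward + next value)."""
    per_sample_best = np.max(reward_samples + next_values, axis=1)
    return float(np.mean(per_sample_best))

# Hypothetical data: 4 observed reward vectors for 2 actions.
# Exactly one action pays 1 in each sample, so each action averages 0.5.
reward_samples = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 0.0],
    [0.0, 1.0],
])
next_values = np.zeros(2)  # assumed value of what follows each action

no_lookahead = value_without_lookahead(reward_samples, next_values)
lookahead = value_with_reward_lookahead(reward_samples, next_values)
print(no_lookahead, lookahead)
```

In this toy case the two actions have identical mean rewards, so an expectation-based planner cannot tell them apart, while the lookahead agent always picks the action that pays off. The lookahead value is never smaller than the expectation-based one, since the mean of a maximum is at least the maximum of the means.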
Taxonomy
Topics: Auction Theory and Applications · Game Theory and Applications · Advanced Bandit Algorithms Research
