Disentangling Exploration from Exploitation

Alessandro Lizzeri; Eran Shmaya; Leeat Yariv

arXiv:2404.19116·econ.TH·May 1, 2024

Open Access

TL;DR

This paper characterizes the optimal experimentation policy in Poisson bandits when exploration and exploitation are disentangled. The optimal policy achieves complete learning asymptotically and is highly persistent, but it cannot be identified by a Gittins-style index.

Contribution

It characterizes the optimal experimentation policy for Poisson bandits with general news structures when the agent's exploration and exploitation are disentangled, in contrast to the bandit literature since Robbins (1952), which has treated the two as intertwined.

Findings

01 The optimal policy achieves complete learning asymptotically.

02 The optimal policy exhibits substantial persistence but cannot be identified by an index à la Gittins.

03 Disentanglement is particularly valuable for intermediate parameter values.

Abstract

Starting from Robbins (1952), the literature on experimentation via multi-armed bandits has wed exploration and exploitation. Nonetheless, in many applications, agents' exploration and exploitation need not be intertwined: a policymaker may assess new policies different than the status quo; an investor may evaluate projects outside her portfolio. We characterize the optimal experimentation policy when exploration and exploitation are disentangled in the case of Poisson bandits, allowing for general news structures. The optimal policy features complete learning asymptotically, exhibits lots of persistence, but cannot be identified by an index a la Gittins. Disentanglement is particularly valuable for intermediate parameter values.
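
To make the idea of disentanglement concrete, here is a minimal Python sketch, not the paper's model or its optimal policy. It assumes a conclusive good-news Poisson setting (only a good arm generates news, at a common rate), assumes information arrives only from the explored arm, and uses a purely hypothetical exploration rule (observe the arm whose belief is closest to 0.5). The names `no_news_update` and `simulate` and all parameter values are illustrative assumptions. The only standard ingredient is the Bayesian update after a period without news.

```python
import math
import random


def no_news_update(p, rate, dt):
    """Posterior that an arm is good after observing no news for a span dt.

    Assumption: conclusive good news, so only a good arm produces
    arrivals, and it does so at Poisson rate `rate`.
    """
    survive = math.exp(-rate * dt)  # chance a good arm stays silent for dt
    return p * survive / (p * survive + (1.0 - p))


def simulate(priors, is_good, rate=1.0, dt=0.1, steps=200, seed=0):
    """Run a toy disentangled bandit and return (final beliefs, last exploited arm)."""
    rng = random.Random(seed)
    beliefs = list(priors)
    exploit = 0
    for _ in range(steps):
        # Exploitation: collect the payoff of the arm currently believed best.
        exploit = max(range(len(beliefs)), key=lambda i: beliefs[i])
        # Exploration (hypothetical rule, NOT the paper's policy):
        # observe the arm whose belief is closest to 0.5.
        explore = min(range(len(beliefs)), key=lambda i: abs(beliefs[i] - 0.5))
        news_prob = 1.0 - math.exp(-rate * dt)
        if is_good[explore] and rng.random() < news_prob:
            beliefs[explore] = 1.0  # conclusive news reveals a good arm
        else:
            beliefs[explore] = no_news_update(beliefs[explore], rate, dt)
    return beliefs, exploit
```

In this sketch the explored arm and the exploited arm are chosen by separate rules in every period, which is the sense in which the two activities are decoupled. In a standard bandit the two choices would be forced to coincide.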
