Experiment Planning with Function Approximation
Aldo Pacchiano, Jonathan N. Lee, Emma Brunskill

TL;DR
This paper investigates non-adaptive experiment planning strategies for contextual bandit problems with complex reward functions, proposing methods that work with function approximation and analyzing their theoretical guarantees.
Contribution
The paper introduces two experiment planning strategies compatible with function approximation, extending beyond linear rewards, and analyzes their theoretical optimality guarantees.
Findings
The guarantees for eluder planning depend on the eluder dimension of the reward function class.
Uniform sampling achieves competitive rates for small action sets.
Fundamental differences between planning and adaptive learning are characterized.
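To make the non-adaptive setting concrete, the following is a minimal sketch of a uniform-sampling plan, not the paper's algorithm. It assumes a finite function class and a toy two-action reward model; every name in it (`plan_uniform`, `fit_least_squares`, `greedy_policy`, `make_candidate`) is hypothetical. The point it illustrates is that the actions are fixed from the contexts alone, before any reward is observed, and the policy is learned only afterwards.

```python
import random


def plan_uniform(contexts, num_actions, seed=0):
    """Fix one uniformly random action per context before any reward is seen."""
    rng = random.Random(seed)
    return [(x, rng.randrange(num_actions)) for x in contexts]


def fit_least_squares(function_class, data):
    """Return the candidate with the smallest squared error on (x, a, r) triples."""
    def loss(f):
        return sum((f(x, a) - r) ** 2 for x, a, r in data)
    return min(function_class, key=loss)


def greedy_policy(f, num_actions):
    """Act greedily with respect to the fitted reward model."""
    return lambda x: max(range(num_actions), key=lambda a: f(x, a))


# Hypothetical toy problem: action 0 pays 0.5, action 1 pays theta * x.
def make_candidate(theta):
    return lambda x, a: theta * x if a == 1 else 0.5


function_class = [make_candidate(t) for t in (0.0, 0.5, 1.0)]
true_reward = function_class[2]  # assumed ground truth, theta = 1.0

rng = random.Random(1)
contexts = [rng.random() for _ in range(200)]  # contexts only, no rewards

# Planning phase: uses contexts alone.
plan = plan_uniform(contexts, num_actions=2)

# Deployment phase: the fixed plan is executed and noisy rewards are logged.
data = [(x, a, true_reward(x, a) + rng.gauss(0, 0.1)) for x, a in plan]

# Learning phase: regression over the function class, then a greedy policy.
f_hat = fit_least_squares(function_class, data)
policy = greedy_policy(f_hat, num_actions=2)
```

Because the plan never looks at rewards, it can be computed once and handed to distributed or human-operated data collectors, which is the overhead-sensitive situation the abstract describes.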
Abstract
We study the problem of experiment planning with function approximation in contextual bandit problems. In settings where there is a significant overhead to deploying adaptive algorithms -- for example, when the execution of the data collection policies is required to be distributed, or a human in the loop is needed to implement these policies -- producing in advance a set of policies for data collection is paramount. We study the setting where a large dataset of contexts but not rewards is available and may be used by the learner to design an effective data collection strategy. Although when rewards are linear this problem has been well studied, results are still missing for more complex reward models. In this work we propose two experiment planning strategies compatible with function approximation. The first is an eluder planning and sampling procedure that can recover optimality…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Machine Learning and Data Classification
Methods: Sparse Evolutionary Training
