Model Selection in Batch Policy Optimization
Jonathan N. Lee, George Tucker, Ofir Nachum, Bo Dai

TL;DR
This paper investigates the challenges of model selection in batch policy optimization within the contextual bandit setting, highlighting the inherent difficulties and proposing algorithms that balance approximation, complexity, and coverage errors.
Contribution
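The paper formalizes the problem of model selection in batch policy optimization, identifies three sources of error (approximation error, statistical complexity, and coverage, the last of which is unique to the batch setting), and develops algorithms that trade off these errors despite fundamental limitations.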
Findings
No batch policy optimization algorithm can achieve a guarantee that addresses all three error sources simultaneously.
Relaxing one error source enables near-oracle guarantees for the others.
Experimental results demonstrate the effectiveness of the proposed algorithms.
Abstract
We study the problem of model selection in batch policy optimization: given a fixed, partial-feedback dataset and model classes, learn a policy with performance that is competitive with the policy derived from the best model class. We formalize the problem in the contextual bandit setting with linear model classes by identifying three sources of error that any model selection algorithm should optimally trade off in order to be competitive: (1) approximation error, (2) statistical complexity, and (3) coverage. The first two sources are common in model selection for supervised learning, where optimally trading off these properties is well studied. In contrast, the third source is unique to batch policy optimization and is due to dataset shift inherent to the setting. We first show that no batch policy optimization algorithm can achieve a guarantee addressing all three simultaneously,…
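To make the setting concrete, here is a minimal sketch of a logged contextual bandit dataset with nested linear model classes. It is not the paper's algorithm: the selection rule is a naive hold-out squared-error baseline, and all function names, the uniform behavior policy, the noise level, and the ridge parameter are assumptions chosen for illustration. The hold-out error is measured on actions the behavior policy actually took, so it reflects approximation error and statistical complexity but says nothing about coverage.

```python
import numpy as np

def make_logged_data(n, dim, n_actions, rng):
    """Simulate partial-feedback data: one action and one reward per context."""
    X = rng.normal(size=(n, dim))
    theta = rng.normal(size=(n_actions, dim))   # assumed true per-action weights
    actions = rng.integers(n_actions, size=n)   # assumed uniform behavior policy
    noise = 0.1 * rng.normal(size=n)
    rewards = np.einsum("ij,ij->i", X, theta[actions]) + noise
    return X, actions, rewards

def fit_linear_class(X, actions, rewards, k, n_actions, reg=1e-3):
    """Per-action ridge regression restricted to the first k features."""
    W = np.zeros((n_actions, k))
    for a in range(n_actions):
        mask = actions == a
        Xa = X[mask, :k]
        W[a] = np.linalg.solve(Xa.T @ Xa + reg * np.eye(k), Xa.T @ rewards[mask])
    return W

def holdout_error(W, X, actions, rewards):
    """Mean squared reward-prediction error on the logged (context, action) pairs."""
    k = W.shape[1]
    pred = np.einsum("ij,ij->i", X[:, :k], W[actions])
    return float(np.mean((pred - rewards) ** 2))

def greedy_policy(W, X):
    """Pick the action with the highest predicted reward for each context."""
    k = W.shape[1]
    return np.argmax(X[:, :k] @ W.T, axis=1)

def select_by_holdout(X, actions, rewards, class_sizes, n_actions):
    """Naive baseline: fit each class on one half, score it on the other half."""
    half = len(rewards) // 2
    errors, models = {}, {}
    for k in class_sizes:
        W = fit_linear_class(X[:half], actions[:half], rewards[:half], k, n_actions)
        errors[k] = holdout_error(W, X[half:], actions[half:], rewards[half:])
        models[k] = W
    best_k = min(errors, key=errors.get)
    return best_k, models[best_k], errors

rng = np.random.default_rng(0)
X, actions, rewards = make_logged_data(2000, 6, 3, rng)
best_k, W, errors = select_by_holdout(X, actions, rewards, [1, 2, 4, 6], 3)
print("selected class size:", best_k)
print("hold-out errors:", errors)
```

Under a uniform behavior policy, coverage is benign, which is why a supervised-style rule is plausible here. With a skewed behavior policy, the same rule could prefer a class whose greedy policy picks rarely logged actions, which is the dataset-shift issue the abstract identifies as the third error source.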
Taxonomy
Topics: Machine Learning and Algorithms · Advanced Bandit Algorithms Research · Machine Learning and Data Classification
