Offline Contextual Bandits with Overparameterized Models

David Brandfonbrener; William F. Whitney; Rajesh Ranganath; Joan Bruna

arXiv:2006.15368 · cs.LG · June 17, 2021

TL;DR

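This paper investigates whether overparameterized models generalize well in offline contextual bandits. It finds that value-based methods do, because their objectives are action-stable, while policy-based methods do not. The authors support this with regret upper bounds for overparameterized value-based learning and regret lower bounds for policy-based algorithms.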

Contribution

The paper identifies action-stability as a key factor explaining the differing generalization behaviors of value-based and policy-based algorithms in overparameterized offline contextual bandits.
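As a hedged aid to the contribution above, the abstract's definition of action-stability can be written out for a single context. The notation is assumed for illustration and is not quoted from the paper: ℓ is the per-sample loss, there are K actions, β is the behavior (logging) policy, and Δ_K is the probability simplex over actions. The importance-weighted loss is also an assumption, used here as one representative policy-based objective. The paper's formal statement may differ.

```latex
% Per-sample loss \ell(y; x, a, r) of a prediction y at context x,
% after observing action a with reward r.
\text{action-stable at } x \iff \exists\, y^\star \;\; \forall a:\;
  y^\star \in \arg\min_{y} \; \mathbb{E}\big[\ell(y; x, a, r) \mid x, a\big]

% Value-based (y = q \in \mathbb{R}^K), squared loss:
\ell(q; x, a, r) = (q_a - r)^2
  \;\Rightarrow\; q^\star_a = \mathbb{E}[r \mid x, a] \text{ for every } a \quad (\text{stable})

% Policy-based (y = \pi \in \Delta_K), importance weighting with behavior policy \beta:
\ell(\pi; x, a, r) = -\frac{r\, \pi_a}{\beta(a \mid x)}
  \;\Rightarrow\; \pi^\star = e_a \text{ whenever } \mathbb{E}[r \mid x, a] > 0
  \quad (\text{unstable if two actions have } \mathbb{E}[r \mid x, a] > 0)
```

The value-based case works because the squared loss for action a only constrains entry q_a. The single vector of true expected rewards therefore minimizes every per-action loss at once. The policy-based loss instead pulls all probability mass toward whichever action was logged.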

Findings

1. Value-based algorithms benefit from overparameterization and generalize well.
2. Policy-based algorithms do not exhibit the same generalization benefits.
3. Experimental results show significant performance differences that align with the theoretical analysis.

Abstract

Recent results in supervised learning suggest that while overparameterized models have the capacity to overfit, they in fact generalize quite well. We ask whether the same phenomenon occurs for offline contextual bandits. Our results are mixed. Value-based algorithms benefit from the same generalization behavior as overparameterized supervised learning, but policy-based algorithms do not. We show that this discrepancy is due to the action-stability of their objectives. An objective is action-stable if there exists a prediction (action-value vector or action distribution) which is optimal no matter which action is observed. While value-based objectives are action-stable, policy-based objectives are unstable. We formally prove upper bounds on the regret of overparameterized value-based learning and lower bounds on the regret for policy-based algorithms. In our experiments with…
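The abstract's definition of action-stability can be checked numerically for one context. This is a minimal sketch that relies on several assumptions. Rewards are noiseless. The value-based learner uses a squared loss on the logged action's entry. The policy-based learner uses an importance-weighted loss. All function names are hypothetical and are not taken from the authors' repository. The code asks whether a single prediction minimizes the per-sample loss no matter which action was logged.

```python
import numpy as np

# Hypothetical sketch (not the authors' code): one context, noiseless reward
# vector r, squared loss (value-based) vs. importance-weighted loss (policy-based).


def value_loss(q, a, r_a):
    """Squared loss of a value-based learner: only the logged action's entry is fit."""
    return (q[a] - r_a) ** 2


def policy_loss(pi, a, r_a, beta_a):
    """Importance-weighted policy loss: the negated one-sample estimate of pi's value."""
    return -r_a * pi[a] / beta_a


def value_is_action_stable(r):
    """Does one action-value vector minimize the loss for every logged action?"""
    r = np.asarray(r, dtype=float)
    q = r.copy()  # candidate prediction: the true (noiseless) reward vector
    # Squared loss is >= 0, so zero loss on each action means q is optimal for each action.
    return all(np.isclose(value_loss(q, a, r[a]), 0.0) for a in range(len(r)))


def policy_is_action_stable(r, beta):
    """Does one action distribution minimize the loss for every logged action?"""
    r = np.asarray(r, dtype=float)
    beta = np.asarray(beta, dtype=float)
    k = len(r)
    vertices = np.eye(k)  # point masses e_0, ..., e_{k-1}
    # The loss is linear in pi, so its minimum over the simplex is attained at a vertex.
    best = [min(policy_loss(v, a, r[a], beta[a]) for v in vertices) for a in range(k)]
    # Each per-action optimal set is a face of the simplex; faces intersect only if
    # they share a vertex, so testing vertices as joint candidates is sufficient.
    return any(
        all(np.isclose(policy_loss(v, a, r[a], beta[a]), best[a]) for a in range(k))
        for v in vertices
    )


# Usage: two actions with positive reward make the policy objective unstable.
r = [1.0, 0.5, -0.2]
beta = [1 / 3, 1 / 3, 1 / 3]
print(value_is_action_stable(r))                      # True
print(policy_is_action_stable(r, beta))               # False
print(policy_is_action_stable([1.0, -0.5], [0.5, 0.5]))  # True: one positive action
```

The policy-based check only tests point masses. This is enough because, with a linear loss, each action's set of optimal distributions is a face of the simplex. The question then reduces to whether those faces share a common vertex. The value-based check is trivially satisfied by the true reward vector, and that is the sense in which its objective is action-stable.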

Code & Models

Repositories

davidbrandfonbrener/deep-offline-bandits
PyTorch · Official

Videos

Offline Contextual Bandits with Overparameterized Models · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics

Methods: Q-Learning