Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning

Aurélien Bibaut; Antoine Chambaz; Maria Dimakopoulou; Nathan Kallus; Mark van der Laan

arXiv:2106.01723 · stat.ML · June 4, 2021 · 1 citation

TL;DR

This paper develops theoretical guarantees for importance-sampling-weighted empirical risk minimization (ERM) on adaptively collected data, such as data logged by a contextual bandit algorithm, covering both supervised learning and policy learning.

Contribution

It studies a generic importance-sampling-weighted ERM algorithm and establishes generalization guarantees and fast convergence rates for adaptively collected data, based on a new maximal inequality.

Findings

01 Provides first-of-their-kind generalization guarantees for importance-sampling-weighted ERM on adaptively collected data

02 Achieves fast convergence rates for regression by leveraging the strong convexity of squared-error loss

03 Offers rate-optimal regret guarantees for policy learning

Abstract

Empirical risk minimization (ERM) is the workhorse of machine learning, whether for classification and regression or for off-policy policy learning, but its model-agnostic guarantees can fail when we use adaptively collected data, such as the result of running a contextual bandit algorithm. We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class and provide first-of-their-kind generalization guarantees and fast convergence rates. Our results are based on a new maximal inequality that carefully leverages the importance sampling structure to obtain rates with the right dependence on the exploration rate in the data. For regression, we provide fast rates that leverage the strong convexity of squared-error loss. For policy learning, we provide rate-optimal regret guarantees that…
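
To make the weighting idea in the abstract concrete, here is a minimal sketch of importance-sampling-weighted ERM on a toy problem. It is an illustration under stated assumptions, not the authors' implementation: the two-armed epsilon-greedy logger, the uniform reference policy `G_REF`, the linear outcome model, and all function names are hypothetical choices made for this example. Each observation is weighted by the ratio of the reference probability to the probability with which the logging policy actually chose the action, and the weighted squared-error loss is then minimized over a linear hypothesis class.

```python
import numpy as np

G_REF = 0.5                       # assumed uniform reference policy over 2 arms
THETA_TRUE = np.array([1.0, -0.5])  # hypothetical per-arm slopes of the toy model

def simulate_adaptive_data(T=2000, seed=0):
    """Log (context, action, outcome, propensity) from an epsilon-greedy bandit
    whose exploration rate decays over time (toy setup, assumption)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=T)
    A = np.empty(T, dtype=int)
    Y = np.empty(T)
    P = np.empty(T)               # probability of the action actually taken
    means = np.zeros(2)           # running mean outcome per arm
    counts = np.zeros(2)
    for t in range(T):
        eps = (t + 1) ** (-0.25)  # decaying exploration rate
        probs = np.full(2, eps / 2)
        probs[int(np.argmax(means))] += 1.0 - eps
        a = rng.choice(2, p=probs)
        y = THETA_TRUE[a] * X[t] + 0.1 * rng.standard_normal()
        A[t], Y[t], P[t] = a, y, probs[a]
        counts[a] += 1
        means[a] += (y - means[a]) / counts[a]
    return X, A, Y, P

def weighted_erm_least_squares(Phi, y, w):
    """Minimize sum_t w_t * (y_t - Phi_t @ theta)^2 over theta."""
    sw = np.sqrt(w)
    theta, *_ = np.linalg.lstsq(Phi * sw[:, None], y * sw, rcond=None)
    return theta

X, A, Y, P = simulate_adaptive_data()
weights = G_REF / P               # importance sampling weights
# One slope per arm: feature is x in the column of the chosen arm, 0 elsewhere.
Phi = np.zeros((len(X), 2))
Phi[np.arange(len(X)), A] = X
theta_hat = weighted_erm_least_squares(Phi, Y, weights)
print("estimated slopes:", theta_hat)
```

Because the exploration rate shrinks over time, the weights on rarely chosen actions grow, which is why the dependence of the guarantees on the exploration rate matters. In this toy example the linear model is well specified, so the weights mainly serve to show the mechanics rather than to correct a visible bias.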

Videos

Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Advanced Causal Inference Techniques