Leveraging Demonstrations to Improve Online Learning: Quality Matters

Botao Hao; Rahul Jain; Tor Lattimore; Benjamin Van Roy; Zheng Wen

arXiv:2302.03319·cs.LG·May 18, 2023

TL;DR

This paper studies how offline demonstration data of varying quality can improve online learning, using Thompson sampling in a multi-armed bandit as the prototypical setting, and supports the analysis with both regret bounds and empirical results.

Contribution

The paper introduces an informed Thompson sampling algorithm that incorporates demonstration data through Bayes' rule, and derives a prior-dependent Bayesian regret bound showing that the improvement grows with the expert's competence level.

Findings

01

Higher demonstration quality leads to greater online performance improvements.

02

The proposed Bayesian bootstrapping method effectively reduces empirical regret.

03

Pretraining with expert demonstrations significantly benefits online learning outcomes.

Abstract

We investigate the extent to which offline demonstration data can improve online learning. It is natural to expect some improvement, but the question is how, and by how much? We show that the degree of improvement must depend on the quality of the demonstration data. To generate portable insights, we focus on Thompson sampling (TS) applied to a multi-armed bandit as a prototypical online learning algorithm and model. The demonstration data is generated by an expert with a given competence level, a notion we introduce. We propose an informed TS algorithm that utilizes the demonstration data in a coherent way through Bayes' rule and derive a prior-dependent Bayesian regret bound. This offers insight into how pretraining can greatly improve online performance and how the degree of improvement increases with the expert's competence level. We also develop a practical, approximate informed TS…
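
The idea of folding demonstration data into the prior before running TS can be illustrated with a minimal sketch. This is not the paper's informed TS algorithm: it assumes Bernoulli rewards with independent Beta(1, 1) priors, assumes the demonstrations include observed rewards, and uses only those rewards, ignoring whatever the expert's choice of arm reveals about the best arm. The function names and the `competence` parameter (the probability that a simulated expert picks the best arm) are hypothetical stand-ins, not the paper's definition of competence.

```python
import random


def warm_start_posterior(num_arms, demonstrations):
    """Start from Beta(1, 1) per arm and apply Bayes' rule to offline
    (arm, reward) pairs. Assumption: rewards are 0/1 and are recorded."""
    alpha = [1.0] * num_arms
    beta = [1.0] * num_arms
    for arm, reward in demonstrations:
        alpha[arm] += reward
        beta[arm] += 1 - reward
    return alpha, beta


def simulate_expert(true_means, competence, n, rng):
    """Hypothetical expert: plays the best arm with probability `competence`,
    otherwise a uniformly random arm. Not the paper's competence model."""
    best_arm = max(range(len(true_means)), key=true_means.__getitem__)
    data = []
    for _ in range(n):
        if rng.random() < competence:
            arm = best_arm
        else:
            arm = rng.randrange(len(true_means))
        reward = 1 if rng.random() < true_means[arm] else 0
        data.append((arm, reward))
    return data


def thompson_sampling(true_means, alpha, beta, horizon, rng):
    """Bernoulli Thompson sampling from the given Beta posterior.
    Returns cumulative pseudo-regret; the input lists are not modified."""
    alpha, beta = list(alpha), list(beta)
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Sample one plausible mean per arm and act greedily on the samples.
        samples = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
        arm = max(range(len(samples)), key=samples.__getitem__)
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        regret += best - true_means[arm]
    return regret


if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.3, 0.5, 0.7]  # illustrative values only
    for competence in (0.0, 0.5, 1.0):
        demos = simulate_expert(means, competence, 50, rng)
        a, b = warm_start_posterior(len(means), demos)
        print(competence, thompson_sampling(means, a, b, 1000, rng))
```

Running the script prints the regret of one simulated run per competence value. A single run is noisy, so any comparison across competence levels would need averaging over many seeds.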

Taxonomy

Topics: Advanced Bandit Algorithms Research · Mobile Crowdsensing and Crowdsourcing · Machine Learning and Algorithms

Methods: Spatio-temporal stability analysis