Jump Starting Bandits with LLM-Generated Prior Knowledge
Parand A. Alamdari, Yanshuai Cao, Kevin H. Wilson

TL;DR
This paper demonstrates that integrating Large Language Models with contextual bandits can significantly reduce online learning regret by using LLMs to generate prior knowledge, validated through experiments with simulated and real-world data.
Contribution
The paper introduces an initialization algorithm that prompts LLMs to produce pre-training data for contextual bandits, reducing both the cost of data collection and the amount of online learning needed.
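The warm-start idea can be illustrated with a minimal sketch. This is not the authors' algorithm: the excerpt does not specify the bandit model or the prompts, so the sketch assumes a disjoint LinUCB bandit and replaces the LLM call with a hypothetical stub, `llm_preference_stub`, that returns a synthetic preference score. The names `LinUCB`, `warm_start`, and the Dirichlet-sampled contexts are illustrative assumptions as well.

```python
import numpy as np


class LinUCB:
    """Disjoint LinUCB: one ridge-regression model per arm (assumed bandit, not from the paper)."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        # A[a] starts as the identity (ridge regulariser); b[a] starts at zero.
        self.A = np.stack([np.eye(dim) for _ in range(n_arms)])
        self.b = np.zeros((n_arms, dim))

    def update(self, x, arm, reward):
        # Standard rank-one update of the chosen arm's sufficient statistics.
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

    def select(self, x):
        # Score each arm by estimated reward plus an exploration bonus.
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))


def llm_preference_stub(x, arm):
    """Hypothetical stand-in for prompting an LLM about a user's preference.

    Here the synthetic 'user' simply prefers the arm matching the largest
    context feature. A real system would query an LLM instead.
    """
    return 1.0 if int(np.argmax(x)) == arm else 0.0


def warm_start(bandit, contexts, n_arms):
    # Pre-train the bandit on synthetic (context, arm, reward) triples
    # before any real user interaction takes place.
    for x in contexts:
        for arm in range(n_arms):
            bandit.update(x, arm, llm_preference_stub(x, arm))


rng = np.random.default_rng(0)
n_arms = dim = 3
synthetic_contexts = rng.dirichlet(np.ones(dim), size=200)

bandit = LinUCB(n_arms, dim, alpha=0.1)
warm_start(bandit, synthetic_contexts, n_arms)

# After warm-starting, the bandit already favours the matching arm
# for a one-hot context, before seeing any online feedback.
choices = [bandit.select(np.eye(dim)[k]) for k in range(n_arms)]
```

After `warm_start`, online learning would continue with the usual `select` and `update` loop on real feedback. The synthetic data only shapes the starting estimates, so real rewards can still correct the prior where the simulated preferences are wrong.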
Findings
LLMs can effectively simulate human preferences for bandit initialization.
The proposed method reduces online learning regret in experiments.
Empirical validation includes both simulated and real-world data.
Abstract
We present substantial evidence demonstrating the benefits of integrating Large Language Models (LLMs) with a Contextual Multi-Armed Bandit framework. Contextual bandits have been widely used in recommendation systems to generate personalized suggestions based on user-specific contexts. We show that LLMs, pre-trained on extensive corpora rich in human knowledge and preferences, can simulate human behaviours well enough to jump-start contextual multi-armed bandits to reduce online learning regret. We propose an initialization algorithm for contextual bandits by prompting LLMs to produce a pre-training dataset of approximate human preferences for the bandit. This significantly reduces online learning regret and data-gathering costs for training such models. Our approach is validated empirically through two sets of experiments with different bandit setups: one which utilizes LLMs to serve…
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Data Stream Mining Techniques
