Contextual Bandit Optimization with Pre-Trained Neural Networks

Mikhail Terekhov

arXiv:2501.06258·cs.LG·January 14, 2025

TL;DR

This paper introduces Explore Twice then Commit (E2TC), an algorithm for stochastic contextual bandits whose rewards are modeled by a neural network. E2TC uses pre-trained weights to achieve sublinear regret, under certain conditions, in the regime of smaller models, and the paper supports it with theoretical and empirical analysis.

Contribution

The paper proposes E2TC, an algorithm that uses pre-trained neural network weights to learn efficiently in contextual bandits, and backs it with theoretical regret bounds and empirical evaluation.

Findings

01

E2TC achieves sublinear regret under certain conditions.

02

Pre-training improves learning efficiency in neural bandit models.

03

Experimental results validate the theoretical bounds and explore sample complexity.

Abstract

Bandit optimization is a difficult problem, especially if the reward model is high-dimensional. When rewards are modeled by neural networks, sublinear regret has only been shown under strong assumptions, usually when the network is extremely wide. In this thesis, we investigate how pre-training can help us in the regime of smaller models. We consider a stochastic contextual bandit with the rewards modeled by a multi-layer neural network. The last layer is a linear predictor, and the layers before it are a black box neural architecture, which we call a representation network. We model pre-training as an initial guess of the weights of the representation network provided to the learner. To leverage the pre-trained weights, we introduce a novel algorithm we call Explore Twice then Commit (E2TC). During its two stages of exploration, the algorithm first estimates the last layer's weights…
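The setup in the abstract (a representation network, a linear last layer, and pre-trained weights given as an initial guess) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's E2TC: the single tanh layer, the ridge estimator, uniform exploration, the noise levels, and every name (`representation`, `fit_last_layer`, `W_pre`, `theta_hat`) are hypothetical choices made for illustration. It covers only an explore-then-commit loop in which the last layer is estimated while the representation stays at the pre-trained guess; whatever follows in the truncated part of the abstract is not modeled.

```python
import numpy as np

def representation(x, W):
    """Hypothetical representation network: a single tanh layer."""
    return np.tanh(x @ W)

def fit_last_layer(features, rewards, reg=1e-3):
    """Ridge least squares for the linear last layer (an assumed estimator)."""
    gram = features.T @ features + reg * np.eye(features.shape[1])
    return np.linalg.solve(gram, features.T @ rewards)

rng = np.random.default_rng(0)
n_arms, ctx_dim, feat_dim = 5, 4, 3

# Ground-truth weights, unknown to the learner.
W_true = rng.normal(size=(ctx_dim, feat_dim))
theta_true = rng.normal(size=feat_dim)

# Pre-training modeled as a noisy initial guess of the representation weights.
W_pre = W_true + 0.05 * rng.normal(size=W_true.shape)

def draw_arms():
    """Sample one context vector per arm and the true mean reward of each arm."""
    arms = rng.normal(size=(n_arms, ctx_dim))
    means = representation(arms, W_true) @ theta_true
    return arms, means

# Exploration: pull arms uniformly at random and record noisy rewards.
feats, obs = [], []
for _ in range(500):
    arms, means = draw_arms()
    a = rng.integers(n_arms)
    feats.append(representation(arms[a], W_pre))   # features from the pre-trained guess
    obs.append(means[a] + 0.1 * rng.normal())      # observed noisy reward
theta_hat = fit_last_layer(np.array(feats), np.array(obs))

# Commit: act greedily with the estimated model and accumulate regret.
greedy_regret = 0.0
random_regret = 0.0
for _ in range(500):
    arms, means = draw_arms()
    greedy = int(np.argmax(representation(arms, W_pre) @ theta_hat))
    greedy_regret += means.max() - means[greedy]
    random_regret += means.max() - means.mean()    # expected regret of a uniform policy

print("greedy regret:", greedy_regret)
print("uniform-policy regret:", random_regret)
```

Comparing the two totals gives a rough sense of how much a reasonable pre-trained guess helps once the last layer has been fitted; making `W_pre` noisier should narrow the gap.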
