Reward-Biased Maximum Likelihood Estimation for Neural Contextual Bandits

Yu-Heng Hung; Ping-Chun Hsieh

arXiv:2203.04192·cs.LG·May 31, 2022

Open Access

TL;DR

This paper introduces NeuralRBMLE, a neural network-based approach for stochastic contextual bandits that incorporates reward-biased maximum likelihood estimation to enhance exploration and achieve competitive regret bounds.

Contribution

The paper adapts the classic RBMLE principle to contextual bandits using neural networks, proposing two algorithms with theoretical regret guarantees and superior empirical performance.

Findings

01

Both algorithms achieve $\tilde{O}(\sqrt{T})$ regret.

02

NeuralRBMLE methods outperform state-of-the-art methods on real datasets.

03

The approach encodes exploration directly in neural network parameters.

Abstract

Reward-biased maximum likelihood estimation (RBMLE) is a classic principle in the adaptive control literature for tackling explore-exploit trade-offs. This paper studies the stochastic contextual bandit problem with general bounded reward functions and proposes NeuralRBMLE, which adapts the RBMLE principle by adding a bias term to the log-likelihood to enforce exploration. NeuralRBMLE leverages the representation power of neural networks and directly encodes exploratory behavior in the parameter space, without constructing confidence intervals of the estimated rewards. We propose two variants of NeuralRBMLE algorithms: the first variant directly obtains the RBMLE estimator by gradient ascent, and the second variant simplifies RBMLE to a simple index policy through an approximation. We show that both algorithms achieve $\tilde{O}(\sqrt{T})$ regret. Through extensive…
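To make the idea of "adding a bias term to the log-likelihood" concrete, here is a minimal sketch in a much simpler setting than the paper's. It assumes a linear reward model with Gaussian noise and an L2 penalty, not a neural network, so it is a stand-in for illustration and not the NeuralRBMLE algorithm itself. The function names, the penalty weight `lam`, and the bias weight `alpha` are hypothetical choices made for this example. Under these assumptions, maximizing the biased objective for each arm has a closed form, and the arm ranking it induces matches a simple index: estimated reward plus an exploration bonus.

```python
import numpy as np

def rbmle_linear_index(X_hist, r_hist, arms, alpha, lam=1.0):
    """Index per arm under an ASSUMED linear-Gaussian model (illustrative only).

    index_a = x_a^T theta_hat + (alpha / 2) * x_a^T V^{-1} x_a
    """
    d = arms.shape[1]
    V = X_hist.T @ X_hist + lam * np.eye(d)    # regularized Gram matrix
    b = X_hist.T @ r_hist
    theta_hat = np.linalg.solve(V, b)          # penalized maximum likelihood estimate
    V_inv_arms = np.linalg.solve(V, arms.T).T  # row a holds V^{-1} x_a
    exploit = arms @ theta_hat                 # estimated reward of each arm
    explore = 0.5 * alpha * np.sum(arms * V_inv_arms, axis=1)  # bias-induced bonus
    return exploit + explore

def biased_objective_max(X_hist, r_hist, x_a, alpha, lam=1.0):
    """Maximum over theta of: penalized log-likelihood + alpha * x_a^T theta."""
    d = x_a.shape[0]
    V = X_hist.T @ X_hist + lam * np.eye(d)
    b = X_hist.T @ r_hist
    theta_a = np.linalg.solve(V, b + alpha * x_a)  # maximizer of the biased objective
    resid = r_hist - X_hist @ theta_a
    return -0.5 * resid @ resid - 0.5 * lam * theta_a @ theta_a + alpha * x_a @ theta_a

# Usage with synthetic data (all values are made up for the example).
rng = np.random.default_rng(0)
X_hist = rng.normal(size=(20, 3))   # contexts of previously pulled arms
r_hist = rng.normal(size=20)        # observed rewards
arms = rng.normal(size=(4, 3))      # candidate arm contexts this round
index = rbmle_linear_index(X_hist, r_hist, arms, alpha=2.0)
chosen_arm = int(np.argmax(index))
```

In this linear stand-in, the maximized biased objective for arm `a` equals a constant plus `alpha` times the index, so choosing the arm with the largest index is the same as choosing the arm that attains the joint maximum. The abstract describes the paper's second variant as reaching an index policy through an approximation; the exact form of that index is not given in this summary.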

Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Age of Information Optimization