Neural Variance-aware Dueling Bandits with Deep Representation and Shallow Exploration

Youngmin Oh; Jinje Park; Taejin Paik; Jaemin Park

arXiv:2506.01250·cs.LG·May 12, 2026

TL;DR

This paper introduces variance-aware neural algorithms for the contextual dueling bandit problem, balancing exploration and exploitation with theoretical guarantees and empirical validation.

Contribution

It proposes a novel variance-aware exploration strategy using neural networks, providing the first theoretical regret bounds for this setting.

Findings

01. Achieves sublinear regret in synthetic and real-world tasks.

02. Balances exploration and exploitation effectively under UCB and TS frameworks.

03. Outperforms existing methods in empirical evaluations.

Abstract

In this paper, we address the contextual dueling bandit problem by proposing variance-aware algorithms that leverage neural networks to approximate nonlinear utility functions. Our approach employs a variance-aware exploration strategy, which adaptively accounts for uncertainty in pairwise comparisons while relying only on the gradients with respect to the learnable parameters of the last layer. This design effectively balances the exploration-exploitation tradeoff under both the Upper Confidence Bound (UCB) and Thompson Sampling (TS) frameworks. As a result, under standard assumptions, we establish theoretical guarantees showing that our algorithms achieve sublinear cumulative average regret of order $\mathcal{O}\left(d \sqrt{\sum_{t=1}^{T} \sigma_{t}^{2}} + \sqrt{dT}\right)$ for sufficiently wide neural networks, where $d$ is the contextual dimension, $\sigma_{t}^{2}$ the variance of…
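To make the idea of "shallow exploration" with variance weighting concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a frozen one-hidden-layer ReLU representation, a linear last layer `theta` (so the last-layer gradient is just the feature vector), a Bernoulli comparison model whose variance is estimated as $\hat{p}(1-\hat{p})$, and a UCB-style rule for the second arm. All function names, the variance floor, the step size, and the value of `beta` are hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def last_layer_features(contexts, W_hidden):
    """Hypothetical frozen representation: contexts (K, d_in) -> features (K, m)."""
    return np.maximum(contexts @ W_hidden.T, 0.0)

def select_pair_ucb(features, theta, V_inv, beta=1.0):
    """Pick the first arm greedily, then the second arm optimistically.

    The bonus is the confidence width of the feature difference under the
    (variance-weighted) Gram matrix; beta is an assumed exploration scale.
    """
    utilities = features @ theta
    first = int(np.argmax(utilities))
    diffs = features - features[first]
    quad = np.einsum("ij,jk,ik->i", diffs, V_inv, diffs)
    widths = np.sqrt(np.maximum(quad, 0.0))
    scores = diffs @ theta + beta * widths
    scores[first] = -np.inf  # the second arm must differ from the first
    second = int(np.argmax(scores))
    return first, second

def variance_weighted_update(V, z, p_hat, floor=1e-2):
    """Add z z^T / sigma^2 to V, so low-variance comparisons count more.

    sigma^2 is estimated as p_hat * (1 - p_hat) and floored (assumption).
    """
    sigma2 = max(p_hat * (1.0 - p_hat), floor)
    return V + np.outer(z, z) / sigma2

def run_demo(rounds=50, n_arms=5, d_in=4, width=8, seed=0):
    """Simulate a few rounds on synthetic data; returns (theta, V)."""
    rng = np.random.default_rng(seed)
    W_hidden = rng.normal(size=(width, d_in)) / np.sqrt(d_in)
    theta_true = rng.normal(size=width)  # synthetic ground truth
    theta = np.zeros(width)
    V = np.eye(width)
    for _ in range(rounds):
        contexts = rng.normal(size=(n_arms, d_in))
        feats = last_layer_features(contexts, W_hidden)
        i, j = select_pair_ucb(feats, theta, np.linalg.inv(V))
        z = feats[i] - feats[j]
        # y = 1 if arm i wins the simulated comparison
        y = float(rng.random() < sigmoid(z @ theta_true))
        p_hat = sigmoid(z @ theta)
        V = variance_weighted_update(V, z, p_hat)
        # one gradient-ascent step on the logistic log-likelihood
        theta = theta + 0.1 * (y - p_hat) * z
    return theta, V
```

Calling `run_demo()` runs 50 simulated rounds and returns the last-layer estimate together with the weighted Gram matrix. Because only the last layer enters the confidence width, the matrix has size `width × width` rather than the full parameter count, which is the practical appeal of exploring in the last layer only.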
