Neural Variance-aware Dueling Bandits with Deep Representation and Shallow Exploration
Youngmin Oh, Jinje Park, Taejin Paik, Jaemin Park

TL;DR
This paper introduces variance-aware neural algorithms for the contextual dueling bandit problem, balancing exploration and exploitation with theoretical guarantees and empirical validation.
Contribution
The paper proposes a variance-aware exploration strategy for neural contextual dueling bandits and provides the first theoretical regret bounds for this setting.
Findings
Achieves sublinear regret in synthetic and real-world tasks.
Balances exploration and exploitation effectively under UCB and TS frameworks.
Outperforms existing methods in empirical evaluations.
Abstract
In this paper, we address the contextual dueling bandit problem by proposing variance-aware algorithms that leverage neural networks to approximate nonlinear utility functions. Our approach employs a variance-aware exploration strategy, which adaptively accounts for uncertainty in pairwise comparisons while relying only on the gradients with respect to the learnable parameters of the last layer. This design effectively balances the exploration-exploitation tradeoff under both the Upper Confidence Bound (UCB) and Thompson Sampling (TS) frameworks. As a result, under standard assumptions, we establish theoretical guarantees showing that our algorithms achieve sublinear cumulative average regret for sufficiently wide neural networks, at a rate that depends on the contextual dimension, the variance of…
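To make the "shallow exploration" idea in the abstract concrete, the sketch below shows one way a UCB-style dueling bandit can explore using only last-layer quantities. It is an illustration under stated assumptions, not the authors' algorithm: when the final layer is linear, the gradient of the utility with respect to the last-layer weights is simply the last hidden-layer feature vector, so the confidence bonus can be computed from those features alone. The class name `ShallowDuelingUCB`, the parameters `lam`, `beta`, `lr`, and `min_var`, and the inverse-variance weighting of the design matrix (borrowed from weighted ridge regression in variance-aware linear bandits) are all hypothetical choices made for this example. The feature matrix stands in for the output of a trained network body.

```python
import numpy as np


def sigmoid(z):
    """Logistic link mapping a utility gap to a win probability."""
    return 1.0 / (1.0 + np.exp(-z))


class ShallowDuelingUCB:
    """Hypothetical sketch: UCB exploration over last-layer features only.

    Assumes utility(x) = theta @ phi(x), where phi(x) is the last hidden
    layer of some network. Only theta and the design matrix V are updated.
    """

    def __init__(self, feat_dim, lam=1.0, beta=1.0, lr=0.1, min_var=1e-2):
        self.theta = np.zeros(feat_dim)   # last-layer weights
        self.V = lam * np.eye(feat_dim)   # regularized design matrix
        self.beta = beta                  # exploration strength (assumed)
        self.lr = lr                      # step size for theta (assumed)
        self.min_var = min_var            # floor on the variance estimate

    def select_pair(self, feats):
        """Pick two distinct arms from a (K, feat_dim) feature matrix, K >= 2."""
        utilities = feats @ self.theta
        first = int(np.argmax(utilities))           # greedy arm
        diffs = feats - feats[first]                # feature gaps to the greedy arm
        V_inv = np.linalg.inv(self.V)
        quad = np.einsum("kd,de,ke->k", diffs, V_inv, diffs)
        bonus = np.sqrt(np.maximum(quad, 0.0))      # uncertainty of each gap
        scores = utilities + self.beta * bonus      # optimistic challenger score
        scores[first] = -np.inf                     # challenger must differ
        second = int(np.argmax(scores))
        return first, second

    def update(self, feats, first, second, first_won):
        """Update from one comparison; first_won is True if `first` was preferred."""
        z = feats[first] - feats[second]
        p = sigmoid(z @ self.theta)
        # Estimated Bernoulli variance of this comparison, floored for stability.
        var = max(p * (1.0 - p), self.min_var)
        # Low-variance comparisons contribute more to the design matrix.
        self.V += np.outer(z, z) / var
        # Plain logistic-regression gradient step (unweighted, for simplicity).
        self.theta += self.lr * (float(first_won) - p) * z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_theta = rng.normal(size=4)           # synthetic ground truth
    agent = ShallowDuelingUCB(feat_dim=4)
    for _ in range(200):
        feats = rng.normal(size=(5, 4))       # stand-in for network features
        i, j = agent.select_pair(feats)
        win_prob = sigmoid((feats[i] - feats[j]) @ true_theta)
        agent.update(feats, i, j, first_won=rng.random() < win_prob)
    print(agent.theta)
```

Keeping exploration in the last layer means the matrix being inverted has the size of the final feature dimension rather than the full parameter count, which is the usual motivation for shallow exploration. A Thompson Sampling variant would, under the same assumptions, replace the bonus with a sample drawn around `theta` using the inverse of `V` as covariance.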
