Concave Statistical Utility Maximization Bandits via Influence-Function Gradients

Matías Carrasco; Alejandro Cholaquidis

arXiv:2604.22140·stat.ML·April 30, 2026

TL;DR

This paper studies stochastic multi-armed bandits whose objective is a concave statistical utility of the long-run reward distribution rather than expected reward alone. It estimates gradients from bandit feedback using influence functions and optimizes the utility with entropic mirror ascent.

Contribution

The paper develops an influence-function-based gradient estimator for distributional utilities and applies an entropic mirror-ascent algorithm to optimize these utilities from bandit feedback.
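
As a hedged illustration of the idea (standard influence-function calculus, not formulas quoted from the paper): write \(P^w=\sum_k w_k P_k\) for the mixture law induced by the weights \(w\), where \(P_k\) is the reward law of arm \(k\), and let \(\varphi_P\) denote the influence function of the utility at \(P\). The symbols \(P_k\), \(\varphi_P\), \(A\), \(X\), and \(\hat g_k\) are notation assumed here. For any direction \(v\) with \(\sum_k v_k=0\),

\[
\frac{d}{dt}\,U(w+tv)\Big|_{t=0}=\sum_k v_k\int \varphi_{P^w}\,dP_k .
\]

If arm \(A\) is drawn from \(w\) and reward \(X\sim P_A\) is observed, the importance-weighted quantity

\[
\hat g_k=\frac{\mathbf 1\{A=k\}}{w_k}\,\varphi_{P^w}(X),\qquad \mathbb E[\hat g_k]=\int \varphi_{P^w}\,dP_k ,
\]

is an unbiased estimate of each gradient coordinate when \(\varphi_{P^w}\) is known exactly. For the variance utility, the influence function is \(\varphi_P(x)=(x-\mu_P)^2-\sigma_P^2\), with \(\mu_P\) and \(\sigma_P^2\) the mean and variance of \(P\). Replacing these by estimates gives a plug-in version and introduces bias.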

Findings

01

Established regret bounds that separate the mirror-ascent optimization error from the bias of the estimated influence function.

02

Demonstrated the framework on variance and Wasserstein utilities with numerical experiments.

03

Compared exact and plug-in influence-function implementations, showing practical effectiveness.

Abstract

We study stochastic multi-armed bandits in which the objective is a statistical functional of the long-run reward distribution, rather than expected reward alone. Under mild continuity assumptions, we show that the infinite-horizon problem reduces to optimizing over stationary mixed policies: each weight vector \(w\) on the simplex induces a mixture law \(P^w\), and performance is measured by the concave utility \(U(w)=\mathfrak U(P^w)\). For differentiable statistical utilities, we use influence-function calculus to derive stochastic gradient estimators from bandit feedback. This leads to an entropic mirror-ascent algorithm on a truncated simplex, implemented through multiplicative-weights updates and plug-in estimates of the influence function. We establish regret bounds that separate the mirror-ascent optimization error from the bias caused by estimating the influence function. The…
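
The loop described in the abstract (sample an arm from the current weights, form an influence-function gradient estimate, apply a multiplicative-weights update, stay inside a truncated simplex) can be sketched for the variance utility as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the step size, the floor, the use of running statistics of all observed rewards as the plug-in estimate, and the uniform-mixing truncation step are all choices made here for brevity.

```python
import numpy as np


def variance_influence(x, mean, var):
    """Influence function of the variance functional at a law with this mean and variance."""
    return (x - mean) ** 2 - var


def mirror_ascent_variance(arm_samplers, horizon, step=0.05, floor=0.01, seed=0):
    """Hypothetical sketch: entropic mirror ascent for a variance utility.

    arm_samplers: list of callables rng -> float, one per arm (assumed interface).
    Returns the final weight vector on the truncated simplex.
    """
    rng = np.random.default_rng(seed)
    k = len(arm_samplers)
    if k * floor >= 1.0:
        raise ValueError("floor must satisfy k * floor < 1")
    w = np.full(k, 1.0 / k)
    n, mean, m2 = 0, 0.0, 0.0  # Welford running statistics of observed rewards

    for _ in range(horizon):
        arm = rng.choice(k, p=w)
        x = arm_samplers[arm](rng)

        # Plug-in influence function: running mean/variance stand in for
        # the unknown moments of the mixture law (an assumption of this sketch).
        var = m2 / n if n > 0 else 0.0
        phi = variance_influence(x, mean, var)

        # Importance-weighted gradient estimate: nonzero only on the pulled arm.
        grad = np.zeros(k)
        grad[arm] = phi / w[arm]

        # Multiplicative-weights (entropic mirror ascent) step, in log space.
        logits = np.log(w) + step * grad
        logits -= logits.max()
        w = np.exp(logits)
        w /= w.sum()

        # Keep every weight >= floor by mixing with the uniform vector.
        # This is a simple stand-in for a projection onto the truncated simplex.
        w = (1.0 - k * floor) * w + floor

        # Update the running statistics with the new reward.
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)

    return w


if __name__ == "__main__":
    # Arm 0 always pays 0.5; arm 1 pays 0 or 1 with equal probability.
    samplers = [lambda rng: 0.5, lambda rng: float(rng.integers(0, 2))]
    print(mirror_ascent_variance(samplers, horizon=2000))
```

The floor keeps the importance weights `1 / w[arm]` bounded, which is one reason to work on a truncated simplex rather than the full one.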
