# A Dominant Strategy Truthful, Deterministic Multi-Armed Bandit Mechanism with Logarithmic Regret

**Authors:** Divya Padmanabhan, Satyanath Bhat, Prabuchandran K.J., Shirish Shevade, and Y. Narahari

arXiv: 1703.00632 · 2020-06-01

## TL;DR

This paper introduces $\Delta$-UCB, a deterministic multi-armed bandit mechanism, together with a new notion of $\Delta$-regret. The mechanism achieves $O(\log T)$ $\Delta$-regret in sponsored search auctions by exploiting the observation that the separation between agents' rewards is rarely a function of $T$.

## Contribution

It proposes a new -regret framework and a deterministic, incentive-compatible MAB mechanism that attains logarithmic regret, improving over previous methods with higher regret bounds.

## Key findings

- Achieves -regret of O(log T) in sponsored search auctions.
- Extends results from single to multiple slot auctions.
- Provides a deterministic, incentive-compatible mechanism.

## Abstract

Stochastic multi-armed bandit (MAB) mechanisms are widely used in sponsored search auctions, crowdsourcing, online procurement, etc. Existing stochastic MAB mechanisms with a deterministic payment rule, proposed in the literature, necessarily suffer a regret of $\Omega(T^{2/3})$, where $T$ is the number of time steps. This happens because the existing mechanisms consider the worst-case scenario where the means of the agents' stochastic rewards are separated by a very small amount that depends on $T$. We make and exploit the crucial observation that in most scenarios, the separation between the agents' rewards is rarely a function of $T$. Moreover, when the rewards of the arms are arbitrarily close, the regret contributed by such sub-optimal arms is minimal. Our idea is to allow the center to indicate the resolution, $\Delta$, with which the agents must be distinguished. This immediately leads us to introduce the notion of $\Delta$-regret. Using sponsored search auctions as a concrete example (the same idea applies to other applications as well), we propose a dominant strategy incentive compatible (DSIC) and individually rational (IR), deterministic MAB mechanism, based on ideas from the Upper Confidence Bound (UCB) family of MAB algorithms. Remarkably, the proposed mechanism $\Delta$-UCB achieves a $\Delta$-regret of $O(\log T)$ for the case of sponsored search auctions. We first establish the results for single-slot sponsored search auctions and then non-trivially extend the results to the case where multiple slots are to be allocated.
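To make the idea of a resolution $\Delta$ concrete, here is a minimal Python sketch. It is not the paper's $\Delta$-UCB mechanism: it runs the textbook UCB1 index policy with no bids or payments, and it assumes (this summary does not give the formal definition) that $\Delta$-regret counts only the pulls of arms whose mean lies at least $\Delta$ below the best mean. The function names `ucb1` and `delta_regret` and the example means are hypothetical.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run textbook UCB1 on Bernoulli arms; return the pull count per arm.

    This is the standard index policy, not the paper's mechanism
    (no bids, no payment rule).
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    totals = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            # Initialisation: pull every arm once.
            arm = t - 1
        else:
            # Pick the arm with the largest upper confidence bound.
            arm = max(
                range(k),
                key=lambda i: totals[i] / pulls[i]
                + math.sqrt(2.0 * math.log(t) / pulls[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
    return pulls


def delta_regret(means, pulls, delta):
    """Assumed reading of Delta-regret: only arms whose gap to the best
    mean is at least `delta` contribute to the regret."""
    best = max(means)
    return sum(
        (best - m) * n for m, n in zip(means, pulls) if best - m >= delta
    )


if __name__ == "__main__":
    means = [0.9, 0.895, 0.5]  # arm 1 is within 0.01 of the best arm
    pulls = ucb1(means, horizon=2000)
    print("pulls:", pulls)
    print("classical regret:", delta_regret(means, pulls, 0.0))
    print("0.01-regret:", delta_regret(means, pulls, 0.01))
```

Under this assumed definition, setting `delta` to zero recovers the classical expected regret, while a positive `delta` ignores arms that are nearly indistinguishable from the best one, which is the intuition the abstract describes for why such arms contribute little.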

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.00632/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/1703.00632/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/1703.00632/full.md

---
Source: https://tomesphere.com/paper/1703.00632