# Learning Best Response Strategies for Agents in Ad Exchanges

**Authors:** Stavros Gerakaris, Subramanian Ramamoorthy

arXiv: 1902.03588 · 2019-02-12

## TL;DR

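The paper adapts the Harsanyi-Bellman Ad Hoc Coordination (HBA) algorithm so that a publisher in an ad exchange can learn pricing strategies under censored information, where the only feedback is whether an impression was sold. It adds a Kaplan-Meier estimator to model stochastic opponents. In simulations, the resulting HBA-KM agent outperforms Q-learning and UCB-based baselines.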

## Contribution

The paper adapts the HBA algorithm to the censored-information setting of ad exchanges, where the publisher learns only whether an impression was sold, and adds a Kaplan-Meier estimator for modelling stochastic opponents.
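For readers unfamiliar with the Kaplan-Meier estimator, the following is a minimal sketch of the generic product-limit estimator on right-censored data. It is not the paper's implementation: this summary does not describe how prices and sale outcomes are mapped onto censored observations, so the function name `kaplan_meier` and the `(value, observed)` input format are assumptions made for illustration.

```python
from collections import Counter


def kaplan_meier(observations):
    """Generic product-limit (Kaplan-Meier) survival estimate.

    observations: list of (value, observed) pairs. observed=True means the
    event occurred exactly at `value`; observed=False means the sample was
    right-censored at `value` (the event happens later, if at all).
    Returns a list of (value, survival_probability), one entry per distinct
    value at which at least one event was observed.
    """
    events = Counter(v for v, observed in observations if observed)
    counts = Counter(v for v, _ in observations)

    n_at_risk = len(observations)
    survival = 1.0
    curve = []
    for v in sorted(counts):
        d = events.get(v, 0)
        if d > 0:
            # Multiply by the conditional probability of surviving past v.
            survival *= 1.0 - d / n_at_risk
            curve.append((v, survival))
        # Events and censored samples at v both leave the risk set.
        n_at_risk -= counts[v]
    return curve


# Hypothetical data: three observed events and two censored samples.
data = [(1, True), (2, False), (3, True), (4, True), (5, False)]
curve = kaplan_meier(data)
```

Censored samples never reduce the survival estimate directly; they only shrink the risk set, which is what lets the estimator use partially observed data instead of discarding it.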

## Key findings

- In simulations, HBA-KM achieves a substantially better competitive ratio than the Q-learning and UCB-based baselines.
- Its competitive ratio is comparable to that of the offline optimal algorithm.
- Its returns have lower variance than those of the baselines.

## Abstract

Ad exchanges are widely used in platforms for online display advertising. Autonomous agents operating in these exchanges must learn policies for interacting profitably with a diverse, continually changing, but unknown market. We consider this problem from the perspective of a publisher, strategically interacting with an advertiser through a posted price mechanism. The learning problem for this agent is made difficult by the fact that information is censored, i.e., the publisher knows if an impression is sold but no other quantitative information. We address this problem using the Harsanyi-Bellman Ad Hoc Coordination (HBA) algorithm, which conceptualises this interaction in terms of a Stochastic Bayesian Game and arrives at optimal actions by best responding with respect to probabilistic beliefs maintained over a candidate set of opponent behaviour profiles. We adapt and apply HBA to the censored information setting of ad exchanges. Also, addressing the case of stochastic opponents, we devise a strategy based on a Kaplan-Meier estimator for opponent modelling. We evaluate the proposed method using simulations wherein we show that HBA-KM achieves a substantially better competitive ratio and lower variance of return than baselines, including a Q-learning agent and a UCB-based online learning agent, and performance comparable to the offline optimal algorithm.
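The idea of best responding to beliefs over candidate opponent profiles can be illustrated with a small sketch. It is a simplified, myopic illustration, not the HBA algorithm itself: HBA optimises expected long-term payoff, whereas this sketch maximises one-step expected revenue. The advertiser types (`low`, `high`), their acceptance probabilities, and the price grid are all hypothetical.

```python
def update_beliefs(beliefs, likelihoods):
    """Bayes rule over candidate types: posterior is prior times likelihood."""
    unnormalised = {t: beliefs[t] * likelihoods[t] for t in beliefs}
    total = sum(unnormalised.values())
    if total == 0:
        # Observation impossible under every type: keep the prior unchanged.
        return dict(beliefs)
    return {t: w / total for t, w in unnormalised.items()}


def best_response_price(beliefs, accept_prob, prices):
    """Pick the posted price with the highest one-step expected revenue."""
    def expected_revenue(price):
        p_sale = sum(beliefs[t] * accept_prob[t](price) for t in beliefs)
        return price * p_sale
    return max(prices, key=expected_revenue)


def make_type(valuation):
    """Hypothetical noisy advertiser: usually buys iff price <= valuation."""
    return lambda price: 0.9 if price <= valuation else 0.1


accept_prob = {"low": make_type(0.3), "high": make_type(0.7)}
prior = {"low": 0.5, "high": 0.5}

# Censored feedback: the publisher only observes that a sale happened at 0.5.
posted, sold = 0.5, True
likelihoods = {
    t: accept_prob[t](posted) if sold else 1.0 - accept_prob[t](posted)
    for t in prior
}
posterior = update_beliefs(prior, likelihoods)
next_price = best_response_price(posterior, accept_prob, [0.3, 0.5, 0.7])
```

In this toy run, a sale at 0.5 shifts most of the belief mass to the `high` type, so the myopic best response moves to the highest price on the grid.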

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.03588/full.md

## Figures

15 figures with captions in the complete paper: https://tomesphere.com/paper/1902.03588/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/1902.03588/full.md

---
Source: https://tomesphere.com/paper/1902.03588