Dynamic First Price Auctions Robust to Heterogeneous Buyers

Shipra Agrawal; Eric Balkanski; Vahab Mirrokni; Balasubramanian Sivan

arXiv:1906.03286·cs.GT·June 11, 2019

TL;DR

This paper develops a dynamic auction mechanism that maintains near-optimal revenue despite the presence of diverse buyer behaviors, including myopic, forward-looking, and learning buyers, in a repeated auction setting.

Contribution

It introduces a simple state-based mechanism that is robust to heterogeneous buyer types and achieves a constant fraction of the optimal revenue.

Findings

01 Mechanism achieves a constant fraction of optimal revenue.

02 Robust to various buyer behaviors including learning and foresight.

03 Applicable to diverse, unknown buyer populations.

Abstract

We study dynamic mechanisms for optimizing revenue in repeated auctions that are robust to heterogeneous forward-looking and learning behavior of the buyers. Typically it is assumed that the buyers are either all myopic or all infinite lookahead, and that buyers understand and trust the mechanism. These assumptions raise the following question: is it possible to design approximately revenue optimal mechanisms when the buyer pool is heterogeneous? Facing a heterogeneous population of buyers with an unknown mixture of $k$-lookahead buyers, myopic buyers, no-regret learners and no-policy-regret learners, we design a simple state-based mechanism that achieves a constant fraction of the optimal achievable revenue.

Equations (87)

U^{[t,t]}_{i}(H_{t-1},v_{i,t},{\bf s})=\operatorname{\mathbb{E}}_{v_{j,t}\sim F,\,j\neq i}\left[v_{i,t}\cdot x_{i}(H_{t-1},{\bf b}_{t})-p_{i}(H_{t-1},{\bf b}_{t});\ b_{j,t}=s_{j}(H_{t-1},v_{j,t}),\ \forall j\right]

\displaystyle U^{[t,t+k]}_{i}\bigg{(}H_{t-1},v_{i,t},{\bf s}\bigg{)}=\operatorname{\mathbb{E}}_{v_{j,t},j\neq i,{\bf v}^{[t+1,t+k]}}\sum_{r=0}^{k}\operatorname{\mathbb{E}}_{W}\left[U^{[t+r,t+r]}_{i}\bigg{(}(H_{t-1},W_{[t,t+r-1]}),v_{i,t+r},{\bf s}\bigg{)}\right]

\forall v_{i,t},{\bf s}_{-i}:U^{[t,t+k]}_{i}\bigg{(}H_{t-1},{v}_{i,t},s_{i},{\bf s}_{-i}\bigg{)}\geq U^{[t,t+k]}_{i}\bigg{(}H_{t-1},v_{i,t},s^{\prime}_{i},{\bf s}_{-i}\bigg{)}

\exists v_{i,t},{\bf s}_{-i}:U^{[t,t+k]}_{i}\bigg{(}H_{t-1},{v}_{i,t},s_{i},{\bf s}_{-i}\bigg{)}>U^{[t,t+k]}_{i}\bigg{(}H_{t-1},v_{i,t},s^{\prime}_{i},{\bf s}_{-i}\bigg{)}

g_{t} (b) := v_{i, t} x_{i} (H_{t - 1}, b, b_{- i, t}) - p_{i} (H_{t - 1}, b, b_{- i, t})

\text{Regret}(T)=\max_{f\in E}\sum_{t=1}^{T}g_{t}(f(H_{t-1},v_{i,t}))-\sum_{t=1}^{T}g_{t}(b_{i,t})

g_{t}(b,H_{t-1}):=\operatorname{\mathbb{E}}_{v_{i}\sim F}\left[v_{i}x_{i}(H_{t-1},b,b_{-i,t})-p_{i}(H_{t-1},b,b_{-i,t})\ \text{where}\ b_{j,t}:=s_{j,t}(H_{t-1},v_{j,t})\right].

\text{Policy-Regret}(T)=\max_{f\in E}\sum_{t=1}^{T}g_{t}(f(H^{\prime}_{t-1},v_{i,t}),H^{\prime}_{t-1})-\sum_{t=1}^{T}g_{t}(b_{i,t},H_{t-1})

\Theta(1)\left(q^{\dagger}_{n_{\text{soph}}}+\mathrm{Rev}^{\mathrm{Mye}}(n_{\text{naive}})\right)-o(1),

(1-\epsilon)\frac{1}{4}\cdot q^{\dagger}_{n_{\text{soph}}}+\frac{\rho(1-\epsilon)}{2}\left(1-\frac{1}{e}\right)\mathrm{Rev}^{\mathrm{Mye}}(n_{\text{naive}})-o(1)

\operatorname{\mathbb{E}}_{v_{1},\dots,v_{n_{\text{soph}}}\sim F}\left[\max(v_{1},\dots,v_{n_{\text{soph}}})\right]=\sum_{i=1}^{n_{\text{soph}}}\frac{1}{n_{\text{soph}}}\operatorname{\mathbb{E}}_{v_{1},\dots,v_{n_{\text{soph}}}\sim F}\left[v_{i}\mid v_{i}=\max(v_{1},\dots,v_{n_{\text{soph}}})\right].

\epsilon(1-\epsilon)q^{\dagger}_{m_{g}}H.

(1-\Pr(e_{1}))H\epsilon q^{\dagger}_{m_{g}}+\Pr(e_{1})H(-r^{g})

(substituting $\delta=\frac{4\log(1/\epsilon)}{H}$)

(1+\epsilon)\frac{\rho E}{m_{b}}q^{\dagger}_{m_{b}}.

\rho E\cdot\operatorname{\mathbb{E}}\left[(v-r^{b})\,I(v\geq r^{b})\right]

U^{[t,t+k]}_{i}(H_{t-1},v_{i,t},{\bf s})\leq\frac{2(1+\epsilon)\rho H}{(1-\rho)(1-\delta)}\sum_{j=\ell(t)+1}^{\ell(t+k)}\frac{m_{g}^{j}}{m_{b}^{j}}\cdot q^{\dagger}_{m_{b}^{j}}.

U^{[t,t+k]}_{i}(H_{t-1},v_{i,t},{\bf s})\leq\sum_{j=\ell(t)+1}^{\ell(t+k)}(1+\epsilon)\frac{\rho E^{j}}{m_{b}^{j}}q^{\dagger}_{m_{b}^{j}}\leq\frac{2(1+\epsilon)\rho H}{(1-\rho)(1-\delta)}\sum_{j=\ell(t)+1}^{\ell(t+k)}\frac{m_{g}^{j}}{m_{b}^{j}}\cdot q^{\dagger}_{m_{b}^{j}}.

U^{[t,t+k]}_{i}(H_{t-1},v_{i,t},{\bf s})\geq(1-\epsilon)H\left(-q^{\dagger}_{m_{g}^{\ell(t)}}+\epsilon\sum_{j=\ell(t)+1}^{\ell(t+k)-1}q^{\dagger}_{m_{g}^{j}}\right).

\underbrace{-q^{\dagger}_{m_{g}^{\ell(t)}}(1-\epsilon)\cdot H}_{\text{minimum utility from the rest of this epoch}}+\underbrace{\epsilon(1-\epsilon)H\sum_{j=\ell(t)+1}^{\ell(t+k)-1}q^{\dagger}_{m_{g}^{j}}}_{\text{minimum utility over next }\ell(t+k)-\ell(t)-2\text{ epochs}}

\frac{1}{m_{g}}q^{\dagger}_{m_{g}}\geq\frac{1}{n}q^{\dagger}_{n/2}\geq\frac{1}{2}\cdot\frac{1}{m_{b}}q^{\dagger}_{m_{b}}.

\frac{1}{m_{1}}q^{\dagger}_{m_{1}}\geq\frac{1}{m_{2}}q^{\dagger}_{m_{2}}.

\frac{1}{m_{g}}q^{\dagger}_{m_{g}}\geq\frac{1}{\max(n/2,m_{g})}q^{\dagger}_{\max(n/2,m_{g})}\geq\frac{1}{n}q^{\dagger}_{n/2}\geq\frac{1}{2}\cdot\frac{1}{n/2}q^{\dagger}_{n/2}\geq\frac{1}{2}\cdot\frac{1}{m_{b}}q^{\dagger}_{m_{b}}

(1-\epsilon)H\left(-q^{\dagger}_{m_{g}^{\ell(t)}}+\epsilon\sum_{j=\ell(t)+1}^{\ell(t+k)-1}q^{\dagger}_{m_{g}^{j}}\right)\geq\frac{2(1+\epsilon)\rho H}{(1-\rho)(1-\delta)}\sum_{j=\ell(t)+1}^{\ell(t+k)}\frac{m_{g}^{j}}{m_{b}^{j}}\cdot q^{\dagger}_{m_{b}^{j}}

\frac{1}{3}(1-\epsilon)\epsilon\sum_{j=\ell(t)+1}^{\ell(t+k)-1}q^{\dagger}_{m_{g}^{j}}

\sum_{j=\ell(t)+1}^{\ell(t+k)-1}q^{\dagger}_{m_{g}^{j}}

\geq\frac{1}{n}q^{\dagger}_{n/2}\frac{(1-\rho)(1-\delta)}{2H}\sum_{j=\ell(t)+1}^{\ell(t+k)-1}E^{j}

\geq\frac{1}{n}q^{\dagger}_{n/2}\frac{(1-\rho)(1-\delta)}{2H}\frac{8}{\epsilon(1-\epsilon)}E_{\max}

\geq\frac{8q^{\dagger}_{n/2}}{\epsilon(1-\epsilon)}

\geq\frac{4q^{\dagger}_{n}}{\epsilon(1-\epsilon)}

q^{\dagger}_{n}\geq q^{\dagger}_{m_{g}^{\ell(t+k)}}\geq\frac{1}{2}\frac{m_{g}^{\ell(t+k)}}{m_{b}^{\ell(t+k)}}\cdot q^{\dagger}_{m_{b}^{\ell(t+k)}}

\frac{(1-\rho)(1-\delta)\epsilon(1-\epsilon)q^{\dagger}_{m_{g}^{i}}}{2m_{g}^{i}}

Taxonomy

Topics: Auction Theory and Applications · Advanced Bandit Algorithms Research · Consumer Market Behavior and Pricing

Full text

Shipra Agrawal (Columbia University, [email protected]). This research was supported in part by a Google Faculty Research Award and NSF CAREER award CMMI-1846792.

Eric Balkanski (Harvard University, [email protected]). This research was supported in part by a Google PhD Fellowship.

Vahab Mirrokni (Google Research, New York, [email protected]).

Balasubramanian Sivan (Google Research, New York, [email protected]).

1 Introduction

Bundling increases revenue in commerce: this is a well-known fact widely used in practice (e.g., Amazon often suggests a “frequently bought together” bundle to shoppers browsing an item). This same principle is at the heart of why stateful repeated/dynamic auctions are significantly more lucrative than one-shot auctions: stateful dynamic auctions profitably use the opportunity to bundle across time. This revenue opportunity has inspired a series of recent works [PPPR16, ADH16, MLTZ16, BML17, LP17, MLTZ18, ADMS18, BMSW18, BMLZ19] on designing dynamic auctions satisfying various desired properties.

The ability to link auctions across time significantly expands the design space of auctions, allowing some exceedingly complex auctions. How does a buyer optimize when bidding in such auctions? The standard and widely used notion of incentive compatibility (IC) in the literature requires that the buyers understand these complicated auctions well and have an “infinite lookahead”. That is, it requires each buyer to think about the consequences of their bid in the current round on the utilities of all future rounds, and to optimize the current-round bid accordingly. In particular, as [ADMS18] note, this assumes that the buyers understand the mechanism deeply enough to respond optimally, that they believe their interactions with the seller will last for all future rounds accounted for when computing utility, that they believe the seller is indeed strictly following the advertised mechanism, and so on.

Numerous practical reasons make these assumptions far from true in reality: buyer’s computational limitations, inability to predict the future well enough, inability to trust a seller or understand the exact auction that is run in a complex supply chain of auctions. A striking example of this is the display ads market in Internet advertising. Given the number of ad exchanges, and the variety of purchase mechanisms that are constantly evolving over time, buyers are often unable to trust or verify whether a seller has stuck to an announced mechanism.

The consequence is that the seller faces a heterogeneous buyer population employing a large spectrum of strategies to maximize their perceived utility. As [ADMS18] note, such a buyer could behave myopically, or have a limited lookahead (i.e., a $k$-lookahead instead of infinite lookahead), or be a learner that makes decisions only based on past performance of various bidding strategies, thereby completely disregarding the seller’s description/promises in the mechanism’s future.

The solution to tackling a heterogeneous buyer behavior cannot be a buyer-specific auction that tailors the optimal auction for a given buyer behavior. Implementing such a discriminative auction may be legally infeasible and also impractical (buyer’s behavior may even change over time). Motivated by these observations, [ADMS18] considered the setting of a single seller repeatedly interacting with a single buyer whose behavior (myopic/infinite lookahead/learner etc.) the seller is a priori unaware of, and designed a single mechanism that simultaneously obtains a constant fraction of the optimal revenue achievable against each potential buyer type. In this paper, we study the following question:

Is there an $n$-buyer mechanism that is robust against heterogeneous buyer behaviors?

We consider a general setting with an arbitrary and unknown mixture of multiple heterogeneous buyers. The challenges introduced by considering multiple buyers are discussed after describing the setting and main result.

The setting.

We study a repeated interaction between a single seller and $n$ buyers over $T$ rounds. At the beginning of each round $t=1,2,\ldots,T$, there is a single fresh good for sale whose private value $v_{i,t}\in V$ for buyer $i$ is drawn, independently from other $j\neq i$ and $t^{\prime}\neq t$, from a publicly known distribution $F$ with finite expectation $\mu$. The buyer observes the valuation $v_{i,t}$ before making a bid $b_{i,t}$. The good for sale in round $t$ has to be either allocated to one of the buyers or discarded immediately. Each buyer’s valuation is additive across rounds. We consider a range of buyer behaviors similar to [ADMS18], including myopic buyers, $k$-lookahead buyers, no-regret learners and no-policy-regret learners. Formal definitions of these different buyer behaviors are provided in Section 2.

We categorize as sophisticated buyers the buyers who are either $k$-lookahead for $k\geq k_{\text{soph}}=\Theta(n)$ or no-policy-regret learners, and as naive buyers those who are myopic ($0$-lookahead) or no-regret learners. Given a population with $n_{\text{soph}}$ sophisticated and $n_{\text{naive}}=n-n_{\text{soph}}$ naive buyers, a robust mechanism aims to achieve close to the maximum per-round revenue achievable from such a population, without prior knowledge of which buyer is naive or sophisticated, or of the values of $n_{\text{soph}}$ and $n_{\text{naive}}$.

Upper bound on revenue.

With only $n_{\text{naive}}$ naive buyers, e.g. $n_{\text{naive}}$ myopic buyers, it is impossible in each round to get more than the optimal single-round revenue $\mathrm{Rev}^{\mathrm{Mye}}(n_{\text{naive}})$ obtainable from Myerson’s auction [Mye81a]. With only $n_{\text{soph}}$ sophisticated buyers, it is impossible to get more than $\operatorname{\mathbb{E}}_{v_{1},\dots,v_{n_{\text{soph}}}\sim F}\left[\max_{i}v_{i}\right]$ revenue, as revenue is upper bounded by the maximum value. It is easy to show that $\operatorname{\mathbb{E}}_{v_{1},\dots,v_{n_{\text{soph}}}\sim F}\left[\max_{i}v_{i}\right]\leq q^{\dagger}_{n_{\text{soph}}}:=\operatorname{\mathbb{E}}_{v\sim F}\left[v|v\geq F^{-1}(1-1/n_{\text{soph}})\right]$ (see Appendix A). In a setting with $n_{\text{soph}}$ sophisticated and $n_{\text{naive}}$ naive buyers, the total revenue achievable is thus at most $q^{\dagger}_{n_{\text{soph}}}+\mathrm{Rev}^{\mathrm{Mye}}(n_{\text{naive}})$ (see Appendix A). Motivated by this upper bound, we define the following benchmark.
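As a sanity check on this upper bound, the following minimal sketch evaluates the three quantities in closed form for one concrete distribution, $F=\text{Uniform}[0,1]$. The choice of distribution and all function names are assumptions made for illustration; they do not come from the paper. For this $F$, the expected maximum of $n$ draws is $n/(n+1)$, the conditional mean above the $(1-1/n)$ quantile is $1-1/(2n)$, and Myerson’s auction is a second price auction with reserve $1/2$.

```python
from fractions import Fraction


def expected_max_uniform(n: int) -> Fraction:
    """E[max of n i.i.d. Uniform[0,1] draws] = n / (n + 1)."""
    return Fraction(n, n + 1)


def q_dagger_uniform(n: int) -> Fraction:
    """q_n^dagger = E[v | v >= F^{-1}(1 - 1/n)] for Uniform[0,1].

    The threshold is 1 - 1/n, and the conditional mean of a uniform draw
    on [1 - 1/n, 1] is the midpoint 1 - 1/(2n).
    """
    return 1 - Fraction(1, 2 * n)


def myerson_revenue_uniform(n: int) -> Fraction:
    """Optimal single-round revenue with n Uniform[0,1] bidders.

    Equals E[max(2 * v_max - 1, 0)], the expected positive part of the
    highest virtual value, which integrates to the closed form below.
    """
    if n == 0:
        return Fraction(0)
    return Fraction(n - 1, n + 1) + Fraction(1, (n + 1) * 2 ** n)


def revenue_upper_bound(n_soph: int, n_naive: int) -> Fraction:
    """Benchmark q^dagger_{n_soph} + Rev^Mye(n_naive) for Uniform[0,1]."""
    soph_part = q_dagger_uniform(n_soph) if n_soph > 0 else Fraction(0)
    return soph_part + myerson_revenue_uniform(n_naive)


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, expected_max_uniform(n), q_dagger_uniform(n), myerson_revenue_uniform(n))
```

Exact rational arithmetic is used so that the inequality $\operatorname{\mathbb{E}}[\max_{i}v_{i}]\leq q^{\dagger}_{n}$ can be checked without floating-point tolerance.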

Definition 1.

We call a mechanism $(\alpha,\beta)$-robust if, for any per-round valuation distribution $F$, and for every value of $n_{\text{soph}}\in[n]$ (and $n_{\text{naive}}=n-n_{\text{soph}}$), without knowing $n_{\text{soph}}$ or $n_{\text{naive}}$, it achieves an expected per-round revenue of at least $\alpha\cdot\mathrm{Rev}^{\mathrm{Mye}}(n_{\text{naive}})+\beta\cdot q^{\dagger}_{n_{\text{soph}}}-o(1)$.

Main result.

We construct a mechanism that is Interim Individually Rational (IIR) and $(\Theta(1),\Theta(1))$-robust. ([ADMS18] show that it is impossible to achieve such high revenue with per-round ex-post IR unless the buyer’s lookahead is very high, even in a single-buyer setting, which rules it out in our setting as well.) That is, for every value of $n_{\text{soph}},n_{\text{naive}}$, without knowing $n_{\text{soph}}$ or $n_{\text{naive}}$, it achieves a per-round revenue within a constant factor of the near-optimal $\mathrm{Rev}^{\mathrm{Mye}}(n_{\text{naive}})+q^{\dagger}_{n_{\text{soph}}}-o(1)$ revenue. The mechanism is a simple-to-implement first price auction with a reserve price based on the current state of buyers (where the state is a succinct summary of the buyer’s history). With the display ads industry moving to first-price auctions [ada, dig, ade], this result is especially significant.

1.1 Overview of challenges and technical approach

The interactions between $n$ buyers, with heterogeneous and unknown lookahead and learning behaviors, introduce multiple challenges compared to the single buyer setting studied in [ADMS18].

First, an equilibrium, which is a profile of mutually best-responding strategies from all agents and a widely used predictor of a mechanism’s outcome, is unlikely to exist in our setting. Therefore it is not possible to prove revenue guarantees by arguing about what revenue would be obtained in the equilibrium outcome. Indeed, far from being able to pinpoint what our mechanism’s outcome will be, we only guarantee what the mechanism’s outcome will not be, as long as all agents satisfy much weaker notions of rationality like playing undominated strategies [BLP06] or no-regret strategies. This aligns well with our assumptions of heterogeneity and distrust among buyers. With just this guarantee of which outcomes will not occur in our mechanism, we are able to establish our strong revenue guarantees. In contrast, in the 1-buyer setting of [ADMS18], there are no equilibrium concerns: the single buyer simply best responds according to his utility function.

Another non-trivial challenge is that, while proving results in undominated strategies, we have to establish that a given strategy for buyer $i$ is dominated regardless of another buyer $j$’s strategies. In particular, $j$’s strategy could be an arbitrary function of the entire history of not only $j$’s own bids and outcomes, but also those of $i$. This creates complex dependencies of buyer $i$’s future utility on his bid today. For example, even if $i$ bids truthfully according to his valuations today, he has to consider a strategy of $j$ that would bid very high and sabotage $i$ in the future if it sees $i$ bidding beyond a certain bid in the current round. It requires careful design choices in the mechanism to be able to guarantee certain buyer behavior under such complex side-effects of a buyer’s bid. Key aspects of our mechanism are designed to allow lower bounding the utility of a buyer irrespective of other buyers’ behavior. This ability is crucial for establishing our revenue guarantees.

We construct a state-based mechanism where sophisticated buyers remain in a good state yielding high revenue, and where naive buyers bid as in an approximately revenue optimal one round auction. The mechanism incentivizes sophisticated buyers to remain in this good state in spite of the arbitrariness of other buyers’ responses, while also achieving high revenue. To guarantee high utility to every sophisticated buyer in the good state, the mechanism temporarily “rests” buyers who have recently been allocated a large number of items. Ignoring such buyers for a small number of rounds guarantees some rounds for each buyer to enjoy positive utility and get the item as long as they bid high enough. To do well against the benchmark of $q^{\dagger}_{n_{\text{soph}}}+\mathrm{Rev}^{\mathrm{Mye}}(n_{\text{naive}})$ that extracts the optimal revenue possible from every buyer category, the mechanism necessarily has to adapt to the mixture of buyer population it is observing. Incorporating this adaptivity, and handling the resulting complications in utility analysis are some further challenges that we handle in our mechanism design and analysis.

1.2 Related work

There are several streams of literature related to our work in dynamic mechanism design.

Revenue maximization in dynamic auctions.

The stream closest to our paper is the work on revenue maximization in repeated auctions [PPPR16, ADH16, MLTZ16, BML17, LP17, MLTZ18, ADMS18, BMSW18, BMLZ19]. The main difference between our work and all of these, with the exception of [ADMS18, BMSW18], is that they rely on the notion of dynamic incentive compatibility, which assumes that all buyers are infinite lookahead, while we allow arbitrary mixtures of buyer attitudes. Agrawal et al. [ADMS18] study the $1$-buyer setting and design robust auctions when the buyers are $k$-lookahead buyers or no-simple-regret learners or no-policy-regret learners. Braverman et al. [BMSW18] also study the $1$-buyer setting and design mechanisms to extract more than Myerson’s revenue when the buyers follow no-simple-regret learning strategies, and in particular a subset of them called mean-based bidding strategies.

Repeated interactions with evolving values.

There is a large body of work on designing mechanisms in repeated interactions when the buyers’ values evolve over time. See [BB84, Bes85, CH00, Bat05, ES07, AS13, KLN13, PST14, BS15, CDKS16] for an overview of dynamic mechanisms in such settings.

Bargaining, durable goods monopolist and Coase conjecture.

Unlike our setting where the value is drawn independently in every round, there is a large body of literature in economics that studies settings where the value is initially drawn from a distribution but remains fixed later on. This setting can be motivated based on several applications including bargaining, durable goods monopoly and behavior based discrimination. See [FVB06] for an excellent survey and references therein for an overview of this area and [DPS15, ILPT17] for work in the theoretical computer science literature.

Lookahead search.

The study of $k$-lookahead search can be viewed in the context of bounded rationality, as pioneered by Herbert Simon [Sim55]. He argued that, instead of optimizing, agents may apply a class of heuristics in decision making, termed satisficing heuristics. A natural choice of such heuristics is restricting the search space of best-response moves. Lookahead search in decision-making has been motivated and examined to a great extent by the artificial intelligence community [Nau83, dKaS92, SKN09]. Lookahead search is also related to the sequential thinking framework in game theory [SW94]. More recently, [MTV12] study the quality of equilibrium outcomes for lookahead search strategies for various classes of games. They observe that the quality of resulting equilibria increases in generalized second-price auctions and duopoly games, but not in other classes of games. No prior work studies dynamic mechanisms that are robust against various lookahead search strategies.

2 Preliminaries

There are $n$ buyers and a single item for auction at each of $T$ sequential time steps. The value of buyer $i\in[n]$ at time step $t\in[T]$ for the item is denoted by $v_{i,t}$ and is drawn i.i.d. from a common prior distribution $F$. The distribution $F$ is known to all the buyers and the seller. The realization of the value $v_{i,t}\sim F$ is private and is visible only to buyer $i$. In each round $t$, every buyer makes a bid $b_{i,t}$. A buyer may use the history of bids, allocations, and payments before round $t$, along with its own private valuations until round $t$, to decide the bid in round $t$. The seller uses the entire bid profile ${\bf b}_{t}=(b_{1,t},\ldots,b_{n,t})$ at time $t$, along with the history (bids, allocations, and payments) before time $t$, to decide allocation $x_{i,t}\in\{0,1\}$ and payment $p_{i,t}\geq 0$ for each buyer, such that $\sum_{i=1}^{n}x_{i,t}\leq 1$.

More precisely, the bidding, payment, and allocation strategy space are defined as follows.

Definition 2.

Let $H_{t-1}:=({\bf b}_{1},{\bf x}_{1},{\bf p}_{1},\ldots,{\bf b}_{t-1},{\bf x}_{t-1},{\bf p}_{t-1})$ denote the history of bids, allocations, and payments before time $t$.

Definition 3 (Bidding strategy).

Buyer $i$’s bids in round $t$ are decided by functions $b_{i,t}=s_{i}(H_{t-1},v_{i,t})$, i.e. bids in round $t$ are (possibly randomized) functions of the history $H_{t-1}$ and the buyer’s private valuation in round $t$.

Definition 4 (Payment and allocation function).

For each buyer $i$, allocation and payment in round $t$ are given by functions $x_{i,t}=x_{i}(H_{t-1},{\bf b}_{t})$ and $p_{i,t}=p_{i}(H_{t-1},{\bf b}_{t})$ of the history $H_{t-1}$ and the bid profile in round $t$. Here, $x_{i}(H_{t-1},{\bf b}_{t})\in[0,1]$ denotes the probability of allocation to bidder $i$, and $p_{i}(H_{t-1},{\bf b}_{t})\geq 0$ denotes the bidder’s expected payment.

Definition 5 (Buyer’s realized utility).

Buyer $i$’s realized utility in a round $t$ is given by $u_{i,t}=v_{i,t}x_{i,t}-p_{i,t}$, i.e., it is the difference between valuation and payment if the good is allocated to the buyer, and $0$ otherwise. We often refer to this as simply the buyer’s utility in a round.

Remark 1.

Our current definition of history includes everything from past that can possibly be revealed to buyers and can be relevant to a buyer’s bidding strategy. However, our results are not tied to this particular definition of history. For example, instead of revealing the entire vector of bids from the past, if we happen to just reveal the allocation and payment after each auction, that would simply restrict the buyer’s strategy space further. Our arguments will hold as long as a buyer can see their own bids, allocation, and payments in the past, and whether or not the good was allocated in each round.

Remark 2.

Technically, the history for buyer $i$ should also include their own private valuations in the past. However, since the valuations are generated independently in each round from a known valuation distribution, are visible only to the buyer, and the buyer’s total utility is additive across rounds, the past valuations are irrelevant for the buyer’s bidding strategy. Therefore, for simplicity, we chose to eliminate them from the definition of buyer’s history.

Also define buyer $i$’s history, $H_{i,t-1}:=(H_{t-1},v_{i,1},\ldots,v_{i,t-1})$. Things should still be fine under this richer history: whatever strategies the other buyers use, a domination argument fixes those strategies and then shows that a chosen strategy for a protagonist buyer is dominated by another strategy.

In this paper, we focus on the first price auction mechanism which is defined by the following specific allocation and payment function, with flexibility to choose the mechanism for setting a “reserve price”.

Definition 6 (First price auction with reserve price).

In a first price auction of a single good with reserve price $r$ , the seller observes the bids of participating buyers, and then allocates the good to the buyer with highest bid if that bid is above or equal to the reserve price. The winning buyer’s payment is equal to their bid.
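Definition 6 translates directly into a few lines of code. The sketch below is illustrative only: the function names and the tie-breaking rule (lowest index wins) are assumptions, since the definition above does not specify how ties are resolved. The second helper evaluates the realized utility $u_{i,t}=v_{i,t}x_{i,t}-p_{i,t}$ of Definition 5 for a deterministic allocation.

```python
from typing import Optional, Sequence, Tuple


def first_price_auction(
    bids: Sequence[float], reserve: float
) -> Tuple[Optional[int], float]:
    """Run one first price auction with a reserve price.

    Returns (winner index, payment), or (None, 0.0) if the good is not
    allocated because the highest bid is below the reserve.
    """
    if not bids:
        return None, 0.0
    # Assumption: ties between equal highest bids go to the lowest index.
    winner = max(range(len(bids)), key=lambda i: (bids[i], -i))
    if bids[winner] < reserve:
        return None, 0.0
    # First price rule: the winner pays exactly their own bid.
    return winner, bids[winner]


def realized_utility(
    value: float, buyer: int, winner: Optional[int], payment: float
) -> float:
    """Realized utility: value minus payment if allocated, else 0."""
    return value - payment if winner == buyer else 0.0


if __name__ == "__main__":
    winner, payment = first_price_auction([0.3, 0.7, 0.5], reserve=0.4)
    print(winner, payment, realized_utility(0.9, 1, winner, payment))
```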

2.1 Heterogeneous lookahead behavior

We define heterogeneous forward-looking behavior of buyers by considering buyers who may be myopic or $k$-lookahead for different values of $k$. A myopic ($k$-lookahead) buyer is defined as a buyer who optimizes her myopic ($k$-lookahead) utility in every round to decide the bid. Below, we give precise definitions of these. Intuitively, myopic buyers optimize their current round utility, while $k$-lookahead buyers ($k\geq 1$) optimize their total expected utility over the current and next $k$ rounds.

Definition 7 (Buyer’s myopic utility).

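Under bidding strategies ${\bf s}=\{s_{j}(\cdot)\}_{j=1,\ldots,n}$, the myopic utility of buyer $i$ in round $t$, given private valuation $v_{i,t}$ and history $H_{t-1}$, is defined as

$$U^{[t,t]}_{i}(H_{t-1},v_{i,t},{\bf s})=\operatorname{\mathbb{E}}_{v_{j,t}\sim F,\,j\neq i}\left[v_{i,t}\cdot x_{i}(H_{t-1},{\bf b}_{t})-p_{i}(H_{t-1},{\bf b}_{t});\ b_{j,t}=s_{j}(H_{t-1},v_{j,t}),\ \forall j\right]$$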

Definition 8 ( $k$ -lookahead utility).

Under bidding strategies ${\bf s}=\{s_{j}(\cdot)\}_{j=1,\ldots,n}$, the $k$-lookahead utility of buyer $i$ in round $t$, given private valuation $v_{i,t}$ and history $H_{t-1}$, is defined as

$$U^{[t,t+k]}_{i}\bigg(H_{t-1},v_{i,t},{\bf s}\bigg)=\operatorname{\mathbb{E}}_{v_{j,t},j\neq i,{\bf v}^{[t+1,t+k]}}\sum_{r=0}^{k}\operatorname{\mathbb{E}}_{W}\left[U^{[t+r,t+r]}_{i}\bigg((H_{t-1},W_{[t,t+r-1]}),v_{i,t+r},{\bf s}\bigg)\right]$$

where $W_{t+r}$ denotes the vector of realized bids, allocations and payments in round $t+r$, and $W_{[t,t+r-1]}:=\{W_{t},\ldots,W_{t+r-1}\}$. Further, ${\bf v}^{[t+1,t+k]}$ denotes the realizations of private valuations for all bidders from time $t+1$ to $t+k$, i.e., ${\bf v}^{[t+1,t+k]}=\{v_{j,\tau},\tau=t+1,\ldots,t+k,j=1,\ldots,n\}$.
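To see how the lookahead horizon changes a buyer’s decision, consider a deliberately simplified toy that is not the paper’s mechanism. All numbers and names below are hypothetical. A buyer either pays a premium now to stay in a “good” state with high expected per-round utility afterwards, or takes a small gain now and moves to a “bad” state with lower per-round utility. Since the $k$-lookahead utility sums expected utility over the current round and the next $k$ rounds, the buyer stays only when $k$ is large enough.

```python
from fractions import Fraction


def k_lookahead_total(now: Fraction, per_round_future: Fraction, k: int) -> Fraction:
    """Expected utility over the current round plus the next k rounds.

    Toy assumption: the expected utility of every future round is the same
    constant, determined only by the state reached after the current round.
    """
    return now + k * per_round_future


def prefers_to_stay(
    k: int, premium: Fraction, gain: Fraction, good: Fraction, bad: Fraction
) -> bool:
    """True if a k-lookahead buyer weakly prefers staying in the good state."""
    stay = k_lookahead_total(-premium, good, k)    # pay now, good state later
    deviate = k_lookahead_total(gain, bad, k)      # gain now, bad state later
    return stay >= deviate


if __name__ == "__main__":
    # Hypothetical numbers: premium 3, deviation gain 1, per-round utilities 2 vs 1.
    args = (Fraction(3), Fraction(1), Fraction(2), Fraction(1))
    for k in range(6):
        print(k, prefers_to_stay(k, *args))
```

With these hypothetical numbers, staying is preferred exactly when $-3+2k\geq 1+k$, i.e. when $k\geq 4$; a myopic buyer ($k=0$) always deviates.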

Undominated strategies.

We provide revenue guarantees for our mechanism under an assumption that all myopic or $k$ -lookahead buyers play “undominated strategies”. This is a significantly more robust notion of rationality than a Nash equilibrium.

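Definition 9 (Dominated strategies for $k$-lookahead buyers).

For a $k$-lookahead buyer in our setting, a strategy $s_{i}^{\prime}$ is a dominated strategy at time $t$ under history $H_{t-1}$ if $\exists s_{i}$ such that

$$\forall v_{i,t},{\bf s}_{-i}:U^{[t,t+k]}_{i}\bigg(H_{t-1},v_{i,t},s_{i},{\bf s}_{-i}\bigg)\geq U^{[t,t+k]}_{i}\bigg(H_{t-1},v_{i,t},s^{\prime}_{i},{\bf s}_{-i}\bigg)$$

and

$$\exists v_{i,t},{\bf s}_{-i}:U^{[t,t+k]}_{i}\bigg(H_{t-1},v_{i,t},s_{i},{\bf s}_{-i}\bigg)>U^{[t,t+k]}_{i}\bigg(H_{t-1},v_{i,t},s^{\prime}_{i},{\bf s}_{-i}\bigg)$$

For myopic buyers, the above definition applies with $k=0$.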
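The two conditions of the definition (never worse, and strictly better at least once) can be checked mechanically in a very restricted case. The sketch below is an illustration under strong assumptions: it covers only the myopic case $k=0$ in a single first price auction, fixes one valuation, and replaces the quantifier over opponents’ strategies by a finite grid of highest opposing bids. The function names and the rule that the buyer wins ties are hypothetical.

```python
from typing import Sequence


def first_price_utility(
    value: float, bid: float, highest_other_bid: float, reserve: float = 0.0
) -> float:
    """Single-round utility in a first price auction.

    Assumption: the buyer wins ties against the highest opposing bid.
    """
    wins = bid >= max(highest_other_bid, reserve)
    return value - bid if wins else 0.0


def dominates(
    value: float,
    bid_a: float,
    bid_b: float,
    other_bids: Sequence[float],
    reserve: float = 0.0,
) -> bool:
    """True if bid_a is never worse than bid_b on the grid and strictly
    better for at least one opposing bid."""
    diffs = [
        first_price_utility(value, bid_a, other, reserve)
        - first_price_utility(value, bid_b, other, reserve)
        for other in other_bids
    ]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)


if __name__ == "__main__":
    grid = [i / 10 for i in range(11)]
    # Overbidding (0.8 with value 0.5) is dominated by bidding the value.
    print(dominates(0.5, 0.5, 0.8, grid))
```

In a first price auction, winning with a bid above one’s value yields negative utility, so on this grid bidding the value is never worse than overbidding and is strictly better whenever the overbid would have won.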

2.2 Heterogeneous learning behavior

We consider learning buyers as those who do not know (or do not trust) the seller’s mechanism, in particular the seller’s allocation and payment function, well enough to precisely evaluate their current and future utility. Instead, a learning buyer uses the past outcomes to learn how to bid. We formalize the notion of a learning buyer using the experts learning framework [FS95].

A learning buyer uses a learning algorithm in order to learn to bid in a way that its total utility is close to that achieved by the best single expert among a set of expert bidding strategies $E$. (Recall from Definition 3 that a bidding strategy is an arbitrary mapping from history and valuation to bid.) We formalize different levels of learning sophistication among buyers by considering two classes of learning algorithms, as described below.

No-regret learner:

A no-regret learning buyer $i$ uses a no-regret learning algorithm to decide bid $b_{i,t}$ at time $t$. The ‘reward’ (in no-regret learning terminology) at time $t$ on making a bid $b_{i,t}=b$ is given by the buyer’s $t$-th round utility, determined by the mechanism’s output depending on other buyers’ bids as well as the history. That is, on making a bid $b$, the learner’s reward at time $t$ is given by a function $g_{t}(b)$ defined as

$$g_{t}(b):=v_{i,t}x_{i}(H_{t-1},b,b_{-i,t})-p_{i}(H_{t-1},b,b_{-i,t})$$

Regret is defined as the difference between buyer’s total reward and that of the best expert $f\in E$ in hindsight:

[TABLE]

A no-regret learning buyer uses a bidding strategy such that the above regret is $o(T)$ under every trajectory of bids and private valuations. Note that such a learner is solving an adversarial bandit problem, since the learner only observes the value of the reward function $g_{t}(\cdot)$ on the bid $b_{i,t}$ used by the buyer. When the number of experts $N$ is finite, there are efficient and natural algorithms (e.g., the EXP3 algorithm based on multiplicative weight updates [ACBFS03]) that achieve $O(\sqrt{NT\log N})$ regret.
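As an illustration of the bandit feedback described above, the following is a minimal sketch of an EXP3-style learner over a finite set of experts. It is not the paper's construction: the function names, the fixed exploration rate `gamma`, and the assumption that rewards are rescaled to $[0,1]$ are all choices made for this example.

```python
import math
import random


def exp3_probabilities(weights, gamma):
    """Mix the normalized weights with uniform exploration."""
    total = sum(weights)
    n = len(weights)
    return [(1.0 - gamma) * w / total + gamma / n for w in weights]


def exp3(reward_fn, num_experts, horizon, gamma=0.1, seed=0):
    """Run EXP3 for `horizon` rounds; rewards are assumed to lie in [0, 1].

    reward_fn(t, arm) returns the reward of the chosen arm only, which
    models the bandit feedback a bidding buyer receives.
    """
    rng = random.Random(seed)
    weights = [1.0] * num_experts
    total_reward = 0.0
    for t in range(horizon):
        probs = exp3_probabilities(weights, gamma)
        arm = rng.choices(range(num_experts), weights=probs)[0]
        reward = reward_fn(t, arm)
        total_reward += reward
        # Importance-weighted estimate of the chosen arm's reward.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / num_experts)
        # Rescale to avoid overflow; the probabilities are unchanged.
        top = max(weights)
        weights = [w / top for w in weights]
    return total_reward, exp3_probabilities(weights, gamma)
```

With three experts of which only one is ever rewarded, the learner's final distribution concentrates on that expert up to the exploration mass `gamma`. A horizon-dependent choice of `gamma` would be needed to obtain the regret rate quoted above.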

No-policy-regret learner:

This more sophisticated buyer uses a no-policy-regret learning algorithm (following the definition of policy regret in [ADT12]). An important distinction from the definition of regret in the previous paragraph is that now the total reward of the best expert must be evaluated over the trajectory of adversarial inputs (i.e., history and other buyers’ bids) in response to the bids made by the expert. To make explicit the dependence of the $t^{th}$-round reward on the trajectory of past decisions through the history of outcomes and other buyers’ strategic responses, let us denote the reward function for round $t$ as $g_{t}(b,H_{t-1})$ . Let $s_{j,t}$ denote the strategy used by buyer $j\neq i$ at time $t$ and let $v_{j,t}$ be the private valuation of buyer $j$ . Then, the learner’s expected reward in round $t$ is defined as:

[TABLE]

Then, for any sequence of other buyers’ valuations ${\bf v}_{-i,t}$ and strategies ${\bf s}_{-i,t}$ for $t=1,\ldots,T$ , the policy-regret of such a buyer is defined against the best expert $f\in E$ in hindsight:

[TABLE]

where $b_{i,t}$ denotes the bid made by the buyer at time $t$ , $f(H^{\prime}_{i,t},v_{i,t})$ denotes the bid that would be made by the expert under the counterfactual trajectory, and ${H}^{\prime}_{1},\ldots,{H}^{\prime}_{T}$ denotes the (possibly randomized) counterfactual trajectory of history that would be observed in response to using the bids suggested by the expert, instead of the original bids $b_{i,t}$ . A no-policy-regret learning buyer uses a bidding strategy such that the above policy-regret is guaranteed to be $o(T)$ under any sequence of other buyers’ valuations ${\bf v}_{-i,t}$ and strategies ${\bf s}_{-i,t}$ for $t=1,\ldots,T$ . See Appendix F for a short note on the existence of policy regret learning algorithms.

3 Repeated First Price Auction Mechanism

Algorithms 1-3 contain the formal description of our mechanism. We design a dynamic first price auction mechanism that is conducted in sequential rounds $t=1,\ldots,T$ . In every round, the mechanism partitions the $n$ buyers into two categories: good state buyers and bad state buyers. All buyers start in good state. The buyers may be moved by the mechanism from good state to bad state over time, but once in bad state, a buyer remains there for the remaining rounds. (Footnote 6: One can think of the time period $T$ as the number of auctions in a day, with all buyers reset to good state at the beginning of the next day, when another $T$ auctions are run. Alternatively, it is also possible to design a mechanism that permits movement back to a good state for an initial few rounds, but for clarity of exposition we choose not to do that here.) In any round, the current sets of buyers in good state and bad state are denoted by $G$ and $B$ respectively. The mechanism uses these states to track buyer behavior and incentivize lookahead or learning buyers to stay in a good state, in order to extract the desired revenue from both sophisticated and naive buyers.

The mechanism proceeds in epochs, each consisting of multiple rounds. In a given round, the mechanism either conducts a first price auction with a reserve price among the good state buyers, or a first price auction with a (different) reserve price among the bad state buyers. Specifically, an epoch consists of $\mathcal{E}:=\frac{2Hm_{g}}{(1-\delta)(1-\rho)}=O(m_{g})$ rounds where $H=\frac{4\log(1/\epsilon)}{\delta^{2}}$ and $m_{g}=\max(1,|G|)$ is the number of good state buyers in the beginning of the epoch. During an epoch, the mechanism first runs the Bad-State-Auctions subroutine (Algorithm 2), which consists of $\rho\mathcal{E}$ rounds of auctions among bad state buyers. It then runs the Good-State-Auctions subroutine (Algorithm 3), which consists of $(1-\rho)\mathcal{E}$ rounds of auctions among good state buyers. Some good state buyers may be moved to bad state during the Good-State-Auctions. The reserve prices $r^{g}$ and $r^{b}$ for good and bad state auctions depend on parameters $m_{g},m_{b}\in[n]$ which are set at the beginning of an epoch, and remain fixed throughout an epoch.

In the first $\mathcal{E}^{b}=\rho\mathcal{E}$ rounds of the epoch, Bad-State-Auctions runs a first price auction with reserve price $r^{b}$ among bad state buyers in $B$ , as described in Algorithm 2. Here $r^{b}:=p_{m_{b}}-\frac{\epsilon}{n}q_{m_{b}}$ , with $p_{m_{b}}=F^{-1}(1-\theta_{m_{b}})$ , $\theta_{m}$ being the probability that a buyer wins in a Myerson auction with $m$ buyers (e.g., see [Mye81b]); and $q_{m_{b}}:=F^{-1}(1-1/m_{b})$ the $m_{b}^{th}$ quantile. In fact, for any $m$ , $\theta_{m}\leq\frac{1}{m}$ , so that $p_{m_{b}}\geq q_{m_{b}}$ and $r^{b}\geq(1-\frac{\epsilon}{n})p_{m_{b}}\geq(1-\frac{\epsilon}{n})q_{m_{b}}$ .

In each of the remaining $\mathcal{E}^{g}=(1-\rho)\mathcal{E}$ rounds of the epoch, the Good-State-Auctions subroutine (Algorithm 3) runs a first price auction with reserve price $r^{g}:=(1-\epsilon)q^{\dagger}_{m_{g}}$ among buyers currently in good state (i.e., buyers in $G$ ). Here, $q^{\dagger}_{m}:=\mathbb{E}_{v\sim F}[v|v\geq q_{m}]$ , with $q_{m}=F^{-1}(1-1/m)$ .
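To make the epoch parameters concrete, here is a small sketch that evaluates $H$, the epoch length $\mathcal{E}$, and the split into bad-state and good-state rounds from the formulas above. The function names are hypothetical, the quantities are left as real numbers because the text does not specify a rounding rule, and the reserve price helper assumes $F$ is Uniform$[0,1]$ purely for illustration.

```python
import math


def epoch_schedule(m_g, eps, delta, rho):
    """Return (H, epoch_length, bad_rounds, good_rounds) as real numbers."""
    H = 4.0 * math.log(1.0 / eps) / delta ** 2
    epoch_length = 2.0 * H * m_g / ((1.0 - delta) * (1.0 - rho))
    bad_rounds = rho * epoch_length            # Bad-State-Auctions rounds
    good_rounds = (1.0 - rho) * epoch_length   # Good-State-Auctions rounds
    return H, epoch_length, bad_rounds, good_rounds


def uniform_good_reserve(m_g, eps):
    """Good-state reserve r^g when F is Uniform[0, 1] (an assumed example)."""
    q = 1.0 - 1.0 / m_g           # q_m = F^{-1}(1 - 1/m)
    q_dagger = (q + 1.0) / 2.0    # E[v | v >= q_m] for the uniform law
    return (1.0 - eps) * q_dagger
```

Note that the number of good-state rounds simplifies to $2Hm_{g}/(1-\delta)$, independent of $\rho$, which is the quantity used in the counting arguments of the analysis.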

The Good-State-Auctions subroutine uses a third state, called the rest state, which is used to temporarily rest a buyer, i.e., not allow that buyer to participate in the remaining auctions in that epoch. The set of buyers in the rest state in the current round is denoted by $R$ . In each round, after the auction a buyer may be moved from good state to either bad or rest state in the following ways:

•

The mechanism considers the number of uncleared auctions $U$ so far in this epoch, i.e., the number of auctions during this instance of Good-State-Auctions, where all the participating bids were lower than reserve price. If this number is greater than or equal to $\frac{m_{g}H}{(1-\delta)}$ , then any good state buyer ( $i\in G$ ) whose bid $b_{i}$ in this round was smaller than the reserve price, is moved to bad state.

•

For every buyer currently in good state ( $i\in G$ ), the mechanism considers the number of allocations $A_{i}$ received by the buyer so far in this epoch. If this is $\geq H$ , the buyer is moved to the rest state.

In the first step, the mechanism checks if the number of uncleared auctions is significantly above the statistically expected number. And, if so, from there on, the mechanism punishes every participating buyer who bids below reserve price. This step is aimed to ensure that the lookahead buyers are incentivized to bid above reserve price in rounds where their private valuations are high enough. To understand the intuition behind the second step, observe that given the epoch length of $O(m_{g})$ , statistically, any given buyer is expected to have the highest valuation among $m_{g}$ buyers for roughly a constant number of steps in every epoch. Thus, the second step is intended to ensure that a buyer does not win too many auctions (perhaps at cost of negative immediate utility for some rounds) in order to deprive other buyers of allocations and potentially cause the mechanism to move them to bad state.
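The two state-transition rules can be sketched as a single round of a good-state auction. This is an illustrative reading of the description above, not a transcription of Algorithm 3: the function name and data layout are hypothetical, ties are broken by dictionary order, and the uncleared count is assumed to include the current round.

```python
def good_state_round(bids, reserve, good, rest, bad, allocations,
                     uncleared, m_g, H, delta):
    """Run one good-state round, updating the state sets in place.

    bids maps each participating good-state buyer to its bid.
    Returns (uncleared, payment), where payment is 0.0 if nobody wins.
    """
    threshold = m_g * H / (1.0 - delta)
    eligible = {i: b for i, b in bids.items() if i in good and b >= reserve}
    payment = 0.0
    if eligible:
        # First price auction: the highest eligible bidder pays its bid.
        winner = max(eligible, key=eligible.get)
        allocations[winner] = allocations.get(winner, 0) + 1
        payment = eligible[winner]
    else:
        uncleared += 1
    # Rule 1: once uncleared auctions reach the threshold, any good-state
    # buyer bidding below the reserve is moved to the bad state.
    if uncleared >= threshold:
        for i in list(good):
            if bids.get(i, 0.0) < reserve:
                good.discard(i)
                bad.add(i)
    # Rule 2: a buyer with at least H allocations rests for the epoch.
    for i in list(good):
        if allocations.get(i, 0) >= H:
            good.discard(i)
            rest.add(i)
    return uncleared, payment
```

For example, with $m_{g}=2$, $H=1$, and $\delta=0.5$ the threshold is 4: a buyer who wins once is rested, and a remaining buyer who keeps bidding below the reserve is moved to the bad state in the round where the uncleared count reaches 4.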

4 Revenue Analysis: Main Result

Our main result is that the mechanism presented in the previous section extracts a constant factor of optimal revenue $q^{\dagger}_{n_{\text{soph}}}+\mathrm{Rev}^{\mathrm{Mye}}(n_{\text{naive}})$ from $n_{\text{soph}}$ sophisticated buyers and $n_{\text{naive}}$ naive buyers. To formally state this result, we first define the sophisticated buyers and naive buyers. This involves defining the set of experts used by learning buyers. In general, an expert bidding strategy can be any arbitrary mapping from historical information and current valuation to bid. However, being able to learn the best strategy in such an arbitrary set makes too strong an assumption on learning abilities of the buyer. In fact, it is sufficient for our mechanism to have learning buyers that can compete against a restricted set of experts, as defined below.

Definition 10 (Expert set $E$ ).

Let $h_{i,t-1}$ be a fixed size projection of history $H_{t-1}$ containing the following information: the buyer $i$ ’s state (whether it is bad or good/rest) in round $t$ , the number of buyers in good and bad state, and the number of uncleared auctions so far in the current epoch. Let ${\cal H}$ denote the set of ( $2\times n\times\mathcal{E}_{\text{max}}$ ) possible values of $h_{i,t},\forall i,t$ . Then, set of experts $E$ is defined as mappings from this projected history and current valuation to a bid, i.e., $E=\{f:{\cal H}\times\bar{V}\rightarrow\bar{V}\}.$

Here, $\bar{V}$ is a discretized (to arbitrary accuracy) range of valuations, in order to obtain a finite set of experts. Given $h_{i,t-1}$ , and valuation $v_{i,t}$ , an expert strategy $f\in E$ suggests bid $b_{i,t}=f(h_{i,t-1},v_{i,t})$ to buyer $i$ in round $t$ .

We are now ready to formally define sophisticated and naive buyers.

Definition 11 (Sophisticated and Naive buyers).

Sophisticated buyers are defined as the buyers who are either $k$ -lookahead, for $k\geq\frac{80\log(\epsilon^{-1})n}{\epsilon^{3}(1-\epsilon)^{2}(1-\rho)}=\Theta(n)$ , or no-policy-regret learners against some set of experts containing $E$ . Naïve buyers are defined as buyers who are either myopic, or are no-regret learners against some set of experts containing $E$ .

Theorem 11.

Assuming all myopic and $k$-lookahead buyers play undominated strategies, the expected per round revenue of the mechanism described in Algorithms 1-3, with $\epsilon\in(0,1)$ , $\delta=\epsilon$ , $\rho\leq\frac{\epsilon(1-\epsilon)^{4}}{12}$ , is at least

[TABLE]

where $n_{\text{soph}},n_{\text{naive}}$ are the numbers of sophisticated and naive buyers, respectively. More precisely, the expected per round revenue is at least

[TABLE]

where $q^{\dagger}_{n}=\mathbb{E}_{v\sim F}[v|v\geq q_{n}]$ , with $q_{n}=F^{-1}(1-1/n)$ being the $n^{th}$ quantile for the valuation distribution, and $q^{\dagger}_{0}=0$ . $\operatorname{\mathrm{Rev}}^{\operatorname{\mathrm{Mye}}}(n)$ is the optimal revenue in a single-item auction with $n$ buyers.

The proof of Theorem 11 consists of four parts. We first give revenue and utility bounds that apply to any buyer, then characterize undominated strategies for myopic/lookahead buyers, and no-regret strategies for learners, and finally combine these parts. We give an overview of each part here and defer lemma statements and their proofs to the appendix.

General revenue and utility analysis (Appendix B).

We first establish in Lemma 1 that, if $G$ is the set of good state buyers at the end of an epoch, then the expected revenue from Good-State-Auctions during that epoch is at least $|G|Hr^{g}$ . This lower bound on revenue from good state buyers is obtained by observing that each buyer who ends an epoch in good state must have either been allocated the item (and paid at least $r^{g}$ ) during at least $H$ rounds of this epoch to be moved to a rest state, or must have bid at least $r^{g}$ at each round where $U\geq m_{g}H/(1-\delta)$ . In Lemma 2, we show that if the bid of bad state buyers $B$ is at least $r^{b}$ when their value is larger than $r^{b}$ , then the expected revenue per round of Bad-State-Auctions is at least $(1-\epsilon)(1-1/e)\frac{|B|}{m_{b}}\mathrm{Rev}^{\mathrm{Mye}}(m_{b})$ .

A main part of the overall revenue analysis is to argue that sophisticated buyers are incentivized to remain in good state, irrespective of other buyers’ bids. To show this, we establish a lower bound on the utility achievable in good state and an upper bound on the utility achievable in bad state. To establish the lower bound, we analyze a strategy called the good strategy $s^{g}$ (Definition 12) that never moves a buyer to the bad state. When $U<m_{g}H/(1-\delta)$ , $s^{g}$ bids $r^{g}=(1-\epsilon)q^{\dagger}_{m_{g}}$ if $v_{i}\geq q_{m_{g}}$ , and $0$ otherwise. When $U\geq m_{g}H/(1-\delta)$ , $s^{g}$ bids $r^{g}$ . In Lemma 12, we lower bound the expected utility obtained by strategy $s^{g}$ over an epoch. The crucial component of the mechanism that allows this bound is temporarily moving to the rest state those buyers who have already been allocated enough items (more than statistically expected). This temporarily removes such buyers from good state auctions, and guarantees to any buyer $i\in G$ a minimum number of rounds in each epoch where $i$ can get the item if it bids above the reserve price. In Lemma 3, we upper bound the expected utility achievable by any bad state buyer.

Undominated strategies for buyers with heterogeneous lookahead attitudes (Appendix C).

The main lemma for this part (Lemma 4) shows that a $k$ -lookahead buyer, for $k$ large enough, never enters the bad state. To show this, we consider a round where a buyer $i\in G$ faces the threat to be sent to a bad state if it bids below $r^{g}$ . In Lemma 6, we lower bound the $k$ -lookahead utility obtained by strategy $s^{g}$ , which maintains $i$ in good state, in such a round. Lemma 5 then upper bounds the $k$ -lookahead utility of any strategy bidding below $r^{g}$ in such a round, which would send the buyer to a bad state. Lemma 4 then combines these two bounds to show that any strategy that sends a $k$ -lookahead buyer to a bad state is dominated by strategy $s^{g}$ . A main difficulty in combining these two lemmas is that the epoch lengths and the reserve prices vary at each epoch, and we need to compare utilities from different epochs. We show in Lemma 8 that a myopic buyer bids at least the reserve price $r^{b}$ when it has value at least $r^{b}$ in bad state.

Strategies of no-regret buyers with heterogeneous learning behaviors (Appendix D).

We show in Lemma 9 that a buyer that goes to a bad state has high policy-regret compared to an expert that plays strategy $s^{g}$ , which implies that a no-policy-regret learner must remain in good state in all but $o(T)$ rounds. A difficulty here is to argue that there is a gap between the utility of a buyer going to a bad state and the utility of an expert following the good strategy, where the utilities are evaluated over different trajectories of adversarial inputs. In Lemma 10, we give a condition under which a no-regret learner in bad state must bid at least the reserve price $r^{b}$ when its value is larger than $r^{b}$ in all but $o(T)$ rounds. An important subtlety for no-regret learners is that due to the other buyers, a learner is not guaranteed to win a bad state auction and obtain positive utility when it bids at least $r^{b}$ .

Main result (Appendix E).

We combine the three previous parts to lower bound the revenue achieved by the mechanism and obtain Theorem 11. A last non-trivial argument needed is that if a naive buyer remains in good state, we obtain at least as much revenue from that buyer as if it were in bad state, regardless of how many buyers are in good and bad state.

Appendix A Proof for Revenue Upper Bound

Upper bound on revenue.

In a setting with just $n_{\text{naive}}$ naive buyers and no other buyers (e.g., just $n_{\text{naive}}$ myopic buyers), it is impossible to get more than $\mathrm{Rev}^{\mathrm{Mye}}(n_{\text{naive}})$ per round, i.e., the optimal revenue in a single-round auction obtainable from Myerson’s auction [Mye81a]. In a setting with just $n_{\text{soph}}$ sophisticated buyers (and no other buyers), it is impossible to get more than $\operatorname{\mathbb{E}}_{v_{1}\sim F,\dots,v_{n_{\text{soph}}}\sim F}\left[\max(v_{1},\dots,v_{n_{\text{soph}}})\right]$ , as revenue is upper bounded by the maximum valuation. It is easy to show that $\operatorname{\mathbb{E}}_{v_{1}\sim F,\dots,v_{n_{\text{soph}}}\sim F}\left[\max(v_{1},\dots,v_{n_{\text{soph}}})\right]\leq q^{\dagger}_{n_{\text{soph}}}=\operatorname{\mathbb{E}}_{v\sim F}\left[v|v\geq F^{-1}(1-1/n_{\text{soph}})\right]$ . To see this, note that because a buyer has the largest value among $n_{\text{soph}}$ buyers (with ties broken uniformly) with probability $\frac{1}{n_{\text{soph}}}$ ,

[TABLE]

Now, since the expected value of a buyer conditioned on an event happening with probability $1/n_{\text{soph}}$ is at most $\operatorname{\mathbb{E}}_{v\sim F}\left[v|v\geq F^{-1}(1-1/n_{\text{soph}})\right]=q^{\dagger}_{n_{\text{soph}}}$ , the inequality follows.
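As a sanity check of the inequality $\mathbb{E}[\max(v_{1},\dots,v_{n})]\leq q^{\dagger}_{n}$, the sketch below evaluates both sides exactly for the special case where $F$ is Uniform$[0,1]$. The choice of distribution and the function names are assumptions made for this example only. In that case $\mathbb{E}[\max]=n/(n+1)$ and $q^{\dagger}_{n}=1-\frac{1}{2n}$.

```python
from fractions import Fraction


def expected_max_uniform(n):
    """E[max of n i.i.d. Uniform[0, 1] values] = n / (n + 1)."""
    return Fraction(n, n + 1)


def q_dagger_uniform(n):
    """E[v | v >= 1 - 1/n] for v ~ Uniform[0, 1]: midpoint of [1 - 1/n, 1]."""
    return 1 - Fraction(1, 2 * n)


def gap(n):
    """Slack q_dagger_n - E[max]; non-negative whenever n >= 1."""
    return q_dagger_uniform(n) - expected_max_uniform(n)
```

The gap equals $\frac{n-1}{2n(n+1)}$, so the bound is tight at $n=1$ and the slack vanishes as $n$ grows.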

Now, combining these, we claim that in a setting with $n_{\text{soph}}$ sophisticated buyers and $n_{\text{naive}}$ naive buyers, the total revenue achievable is at most $q^{\dagger}_{n_{\text{soph}}}+\mathrm{Rev}^{\mathrm{Mye}}(n_{\text{naive}})$ . Indeed, if we were able to achieve more than this, then either the revenue contribution from the sophisticated buyers would be more than $q^{\dagger}_{n_{\text{soph}}}$ or that from the naive buyers would be more than $\mathrm{Rev}^{\mathrm{Mye}}(n_{\text{naive}})$ . Neither of these is possible: if one were true, then in a setting with just the $n_{\text{soph}}$ sophisticated buyers or just the $n_{\text{naive}}$ naive buyers, we could simulate the rest of the buyers by adding dummy buyers and discard the revenue contributed by the dummy buyers, obtaining more revenue than $q^{\dagger}_{n_{\text{soph}}}$ or $\mathrm{Rev}^{\mathrm{Mye}}(n_{\text{naive}})$ , respectively.

Appendix B General Revenue and Utility Analysis

The analysis shows that, up to constant factors, the mechanism extracts the optimal $q^{\dagger}_{n_{\text{soph}}}+\mathrm{Rev}^{\mathrm{Mye}}(n_{\text{naive}})$ revenue from $n_{\text{soph}}$ sophisticated buyers and $n_{\text{naive}}$ naïve buyers. In Section B.1, we first provide separate bounds on the revenue from good and bad state buyers in Lemma 1 and Lemma 2 respectively.

The main part of the analysis is to argue that sophisticated buyers, either $k$ -lookahead buyers for large enough $k$ or no-policy-regret learners, are incentivized to remain in the good state. We show that any strategy which leads to the bad state is a dominated strategy for $k$ -lookahead buyers (and has large regret for no-policy regret learners). To show this, in Section B.2, we provide lower and upper bounds on the utility achievable by a buyer in the good and bad state in Lemma 12 and Lemma 3, respectively.

These utility bounds are used in Section C and Section D to argue that (a) $k$-lookahead buyers for large enough $k$ (Lemma 4) and no-policy-regret buyers (Lemma 9) have incentive to stay in good state for most rounds, and (b) in bad state, myopic buyers (Lemma 8) and learning buyers (Lemma 10) have incentive to bid above reserve price when their private valuation is large enough.

Finally in Section E, we combine all these observations to lower bound the revenue achieved given a pool of heterogeneous lookahead and learning buyers, and prove our main result.

B.1 Revenue analysis from good and bad state auctions

We give bounds on the revenue achieved by Good-State-Auctions and Bad-State-Auctions. These are general bounds which hold for both lookahead and learning buyers.

Lemma 1.

Let $G$ be the set of good state buyers at the end of an epoch. Then, total expected revenue from Good-State-Auctions during that epoch is at least $|G|H(1-\epsilon)q^{\dagger}_{m_{g}}$ .

Proof.

Let $G$ be the good state buyers at the end of an epoch of the mechanism. Consider the round during that epoch at which the number $U$ of uncleared auctions reaches $\frac{m_{g}H}{1-\delta}$ . Such a step must exist because total number of allocations is at most $m_{g}H$ ( $m_{g}$ is the number of good state buyers at the beginning of that epoch) but the number of good state auctions in this epoch is $(1-\rho)\mathcal{E}=\frac{2m_{g}H}{1-\delta}$ . Suppose that at this round, some buyer in $G$ is in good state (i.e., hasn’t yet been moved to rest state). Every such buyer must bid $r^{g}$ or above in each of the remaining rounds of the epoch, until that buyer is moved to the rest state; otherwise, the mechanism would have pushed this buyer to a bad state and this buyer would not be in $G$ at the end of this epoch. This means that before the end of the epoch: either all of the buyers in $G$ were moved to rest state so that revenue was at least $|G|Hr^{g}$ ; or all the remaining auctions (after the number of uncleared auctions reached the threshold) cleared because some bid exceeded reserve price, so that the number of uncleared auctions is bounded by the threshold $\frac{m_{g}H}{1-\delta}$ , and the number of goods sold through the good state auctions is at least $(1-\rho)\mathcal{E}-\frac{m_{g}H}{1-\delta}\geq m_{g}H$ , giving revenue of at least $r^{g}|G|H$ . ∎

Revenue from bad state auctions.

Lemma 2.

Let $B$ be the set of bad state buyers at the beginning of an epoch. Suppose that in every round of Bad-State-Auctions during this epoch where the set of buyers $i\in B$ with $v_{i,t}\geq p_{m_{b}}$ is non-empty, at least one such buyer is guaranteed to bid $r^{b}$ or more. Then, the total expected revenue from that epoch is at least $(1-\epsilon)(1-\frac{1}{e})\cdot\frac{|B|}{m_{b}}\operatorname{\mathrm{Rev}}^{\operatorname{\mathrm{Mye}}}(m_{b})\mathcal{E}^{b}$ , where the expectation is taken over valuations $v_{i,t},i\in B,t\in\mathcal{E}^{b}$ .

Proof.

Recall $r^{b}=p_{m_{b}}-\frac{\epsilon}{n}q_{m_{b}}\geq(1-\epsilon)p_{m_{b}}$ . If in every round, among buyers in bad state with valuation above $p_{m_{b}}$ , at least one buyer is guaranteed to bid above $r^{b}\geq(1-\epsilon)p_{m_{b}}$ , then the mechanism will get at least $(1-\epsilon)$ fraction of the expected revenue of a posted-price mechanism with a uniform price $p_{m_{b}}$ among buyers in $B$ . Since $\theta_{m_{b}}$ is the probability of a buyer winning an $m_{b}$ buyer Myerson’s auction with iid values, where $m_{b}\geq|B|$ , this revenue (see for example [CHMS10]) is at least $(1-\epsilon)(1-1/e)\frac{|B|}{m_{b}}\cdot\operatorname{\mathrm{Rev}}^{\operatorname{\mathrm{Mye}}}(m_{b})$ for every round in bad state auctions. ∎

B.2 Utility analysis from good and bad state buyers

We give bounds on the utility achievable by a buyer in a good state and bad state. These are general bounds which hold both for buyers with lookahead attitudes and the buyers with learning behaviors. Further, they hold irrespective of other buyers’ bids.

Lower bound on the utility achievable by a good state buyer.

We lower bound the utility achievable by a buyer in the good state by describing a simple strategy which (1) never moves a buyer to the bad state and (2) achieves high utility. We call this strategy the ‘good strategy’, denoted by $s^{g}$ , and defined as follows.

Definition 12 (Good strategy $s^{g}$ ).

In any round of Good-State-Auctions, a buyer $i$ using good strategy $s^{g}$ bids in the following manner. If the number of the uncleared past auctions in the current epoch is $U<m_{g}H/(1-\delta)$ , bid $r^{g}=(1-\epsilon)q^{\dagger}_{m_{g}}$ if the current valuation $v_{i}\geq q_{m_{g}}$ , and bid $0$ otherwise. If $U\geq m_{g}H/(1-\delta)$ , set the bid equal to the reserve price $r^{g}$ irrespective of the current valuation.

Observe that the good strategy is defined in such a manner that a buyer using this strategy for all rounds of Good-State-Auctions is guaranteed to either always stay in the good state or be moved to the rest state, i.e., is guaranteed to never be pushed to a bad state. We now lower bound the utility of a buyer using this strategy.
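The good strategy can be written as a short bidding rule. The sketch below is one hedged reading of Definition 12: the function and parameter names are hypothetical, and the bid in the low-valuation case is assumed to be $0$ (any bid below the reserve loses the auction and yields zero utility in that round).

```python
def good_strategy_bid(value, uncleared, m_g, H, delta, reserve, q_m):
    """Bid of the good strategy s^g in one round of Good-State-Auctions.

    reserve is r^g = (1 - eps) * q_dagger_{m_g}; q_m is the quantile
    F^{-1}(1 - 1/m_g). Both are passed in rather than computed here.
    """
    threshold = m_g * H / (1.0 - delta)
    if uncleared >= threshold:
        # Punishment phase: always bid the reserve, whatever the value.
        return reserve
    # Normal phase: bid the reserve only on high valuations.
    return reserve if value >= q_m else 0.0
```

Because the buyer never bids below the reserve once the uncleared count has reached the threshold, the rule that moves buyers to the bad state is never triggered for it.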

Lemma 12.

Consider a buyer in good state at the beginning of an epoch. Irrespective of the other buyers’ bids, the expected utility of strategy $s^{g}$ over that epoch is at least

[TABLE]

Proof.

Consider an epoch where a good state buyer follows strategy $s^{g}$ for all rounds in Good-State-Auctions. Similarly to the proof of Lemma 1, observe that the number of uncleared auctions in such an epoch must reach $\frac{1}{(1-\delta)}m_{g}H$ before the end of this epoch. Let $e_{1}$ denote the event that the buyer is still in good state when the number of uncleared auctions in that epoch reaches $\frac{1}{(1-\delta)}m_{g}H$ , i.e., the buyer did not get enough allocations to go to rest state. Since the buyer bids $r^{g}=(1-\epsilon)q^{\dagger}_{m_{g}}$ whenever $v\geq q_{m_{g}}$ , the probability $\Pr({e_{1}})$ of this event happening is bounded by the probability of the following event: let $X_{1},X_{2},\ldots,X_{r}$ be $r=\frac{m_{g}H}{1-\delta}$ independent samples from distribution $F$ ; then consider the event $\sum_{i=1}^{r}I(X_{i}\geq F^{-1}(1-1/m_{g}))\leq H$ . Since $\mathbb{E}[\sum_{i=1}^{r}I(X_{i}\geq F^{-1}(1-1/m_{g}))]=\frac{r}{m_{g}}=\frac{H}{1-\delta}$ , the threshold $H$ is a $(1-\delta)$ fraction of this mean, so a Chernoff bound gives $\Pr({e_{1}})\leq e^{-\delta^{2}H/(2(1-\delta))}\leq e^{-\delta^{2}H/2}$ . Under event $e_{1}$ , the buyer may end up with negative utility (as the strategy of always bidding reserve price will kick in), which can be at worst $-r^{g}H$ from this epoch. Otherwise, the buyer will get $H$ allocations, each from some rounds where $v\geq q_{m_{g}}$ . Since the buyer wins them at reserve price, and the space of other buyers’ bidding strategies consists only of functions that are independent of this buyer’s bid and valuation, the expected utility from each of these $H$ goods is $\mathbb{E}[v-(1-\epsilon)q^{\dagger}_{m_{g}}|v\geq q_{m_{g}}]\geq\epsilon q^{\dagger}_{m_{g}}$ .

Then, the expected utility from each of the next $K-1$ epochs is at least:

[TABLE]

∎
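The Chernoff step in the proof above can be checked numerically. The sketch below computes the exact lower tail of the binomial count of high-valuation rounds and compares it with the bound $e^{-\delta^{2}H/2}$. The function names and the specific parameter values in the usage note are chosen for illustration only, with $r=m_{g}H/(1-\delta)$ picked to be an integer.

```python
import math


def binomial_lower_tail(trials, p, k):
    """Pr[Bin(trials, p) <= k], computed exactly from the pmf."""
    return sum(math.comb(trials, j) * p ** j * (1.0 - p) ** (trials - j)
               for j in range(k + 1))


def chernoff_bound(H, delta):
    """The bound exp(-delta^2 * H / 2) used for Pr(e_1)."""
    return math.exp(-delta ** 2 * H / 2.0)
```

For instance, with $m_{g}=4$, $H=12$, and $\delta=0.5$ we have $r=96$ samples, each exceeding $q_{m_{g}}$ with probability $1/4$, so the relevant quantity is `binomial_lower_tail(96, 0.25, 12)`, which lies below `chernoff_bound(12, 0.5)`.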

Upper bound on utility achievable by a bad state buyer.

Lemma 3.

Consider a buyer in bad state at the beginning of an epoch. Irrespective of the other buyers’ bids, its expected utility over the epoch is at most

[TABLE]

Proof.

For an epoch of length $\mathcal{E}$ , the number of rounds of Bad-State-Auctions during an epoch is $\rho\mathcal{E}$ . During a bad state auction, the utility of a buyer with value $v$ is at most $(v-r^{b})I(v\geq r^{b})$ . Thus, the total expected utility of a buyer in bad state over the epoch is at most

[TABLE]

where the first and second inequalities were obtained using that $r^{b}=p_{m_{b}}-\frac{\epsilon}{n}q_{m_{b}}$ , and $q_{m_{b}}\leq p_{m_{b}}$ . The first part of the third inequality used $v-r^{b}\leq v$ , and by definition $q^{\dagger}_{m_{b}}=\mathbb{E}[v|v\geq q_{m_{b}}]$ , $\Pr(v\geq q_{m_{b}})=\frac{1}{m_{b}}$ . The second part used $q_{m_{b}}\leq q^{\dagger}_{m_{b}},n\geq m_{b}$ . ∎

Appendix C Undominated Strategies for Buyers with Heterogeneous Lookahead Attitudes

We characterize the undominated strategies of myopic and lookahead buyers. The main lemma in this section, Lemma 4, argues that a $k$ -lookahead buyer, for $k$ large enough, playing an undominated strategy never enters a bad state. Lemma 8 then shows that in a bad state, myopic buyers playing undominated strategies always bid at least the reserve price $r^{b}$ when their value is at least $r^{b}$ .

Lookahead buyers.

We previously showed in Lemma 1 that we obtain the optimal $|G|q^{\dagger}_{m_{g}}$ revenue from buyers in a good state. In addition to obtaining the optimal revenue, we show that the mechanism also incentivizes lookahead buyers to stay in a good state. We denote the maximum length of an epoch by $\mathcal{E}_{\text{max}}=\max_{m_{g}}\frac{2Hm_{g}}{(1-\delta)(1-\rho)}=\frac{2Hn}{(1-\delta)(1-\rho)}$ .

Lemma 4.

A $k$ -lookahead buyer with $k\geq\frac{10}{\epsilon(1-\epsilon)}\mathcal{E}_{\text{max}}$ playing an undominated strategy never enters the bad state.
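The lookahead requirement of Lemma 4 can be related to the threshold in the definition of sophisticated buyers by direct substitution. The sketch below, with hypothetical function names, evaluates $\mathcal{E}_{\text{max}}$ with $H=4\log(1/\epsilon)/\delta^{2}$ and checks that, under the choice $\delta=\epsilon$ from the main theorem, $\frac{10}{\epsilon(1-\epsilon)}\mathcal{E}_{\text{max}}$ coincides with $\frac{80\log(\epsilon^{-1})n}{\epsilon^{3}(1-\epsilon)^{2}(1-\rho)}$.

```python
import math


def max_epoch_length(n, eps, delta, rho):
    """E_max = 2 H n / ((1 - delta)(1 - rho)) with H = 4 log(1/eps) / delta^2."""
    H = 4.0 * math.log(1.0 / eps) / delta ** 2
    return 2.0 * H * n / ((1.0 - delta) * (1.0 - rho))


def lemma4_lookahead(n, eps, rho):
    """Lookahead bound of Lemma 4, taking delta = eps as in the theorem."""
    return 10.0 / (eps * (1.0 - eps)) * max_epoch_length(n, eps, eps, rho)


def definition_lookahead(n, eps, rho):
    """Closed-form threshold from the definition of sophisticated buyers."""
    return (80.0 * math.log(1.0 / eps) * n
            / (eps ** 3 * (1.0 - eps) ** 2 * (1.0 - rho)))
```

Both expressions scale linearly in $n$ for fixed $\epsilon$ and $\rho$, matching the $\Theta(n)$ lookahead stated in the definition.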

The mechanism moves a buyer $i$ to the bad state if that buyer bids lower than $r^{g}$ in a round of Good-State-Auctions where the number of previous uncleared auctions is $U\geq m_{g}H/(1-\delta)$ . We show that during a round of Good-State-Auctions with $U\geq m_{g}H/(1-\delta)$ , for every $k$-lookahead buyer $i$ in good state $G$ , bidding below the reserve price $r^{g}$ is a dominated strategy.

A strategy which dominates bidding below the reserve price in such a case is the good strategy $s^{g}$ from Definition 12. Recall that if $U<m_{g}H/(1-\delta)$ , this strategy bids $r^{g}=(1-\epsilon)q^{\dagger}_{m_{g}}$ if the valuation $v_{i}$ is greater than or equal to $q_{m_{g}}$ , and $0$ otherwise; if the number of uncleared past auctions in this epoch is $\frac{1}{(1-\delta)}m_{g}H$ or more, it sets the bid equal to the reserve price $r^{g}$ for every value. Under this strategy, the buyer is guaranteed to always stay in good state or be moved to rest state, i.e., guaranteed to never be pushed to a bad state.

Lemma 5 and Lemma 6 upper and lower bound the $k$ -lookahead utility achieved by bidding below the reserve price and by playing $s^{g}$ during a round of Good-State-Auctions with $U\geq m_{g}H/(1-\delta)$ . Then, by combining these bounds, we get that bidding below the reserve price in such a situation is dominated and obtain Lemma 4. We denote by $\ell(t)$ the epoch at which step $t$ occurs.

Lemma 5.

Consider a round $t$ during Good-State-Auctions where the number of previous uncleared auctions is $U\geq m_{g}H/(1-\delta)$ . Then, for any buyer $i\in G$ and strategy $s_{i}$ which bids lower than $r^{g}$ in this round,

[TABLE]

Proof.

By the definition of Good-State-Auctions, bidder $i\in G_{t}$ is moved to bad state at the end of step $t$ if it bids lower than $r^{g}$ . Since bidder $i$ bids lower than the reserve price, its utility in this round is $0$ .

Therefore, under this strategy, irrespective of the bids used in the rounds after $t$ , the buyer $i$ ’s $k$ -lookahead utility in round $t$ is at most the bad state utility over $\ell(t+k)-\ell(t)-1$ epochs, which by Lemma 3, is at most

[TABLE]

∎

Next, we lower bound the $k$ -lookahead utility achieved with the good strategy $s^{g}$ in the same situation.

Lemma 6.

Consider a round $t$ during Good-State-Auctions where the number of previous uncleared auctions is $U\geq m_{g}H/(1-\delta)$ . Then, for any buyer $i\in G$ playing strategy $s_{i}=s^{g}$ ,

[TABLE]

Proof.

The utility from bad state auctions is always non-negative, so we ignore that for the lower bound. Now, since any buyer can be allocated at most $H$ items in an epoch, the maximum payment in the remaining rounds of this epoch is at most $r^{g}H$ , lower bounding the utility by $-r^{g}H=-q^{\dagger}_{m^{\ell}_{g}}(1-\epsilon)H$ . Now, consider the utility in any of the next $\ell(t+k)-\ell(t)-2$ epochs. The buyer always remains in good state or rest state in each of the next $\ell(t+k)-\ell(t)-2$ epochs. Combining the above argument with Lemma 12, the $k$ -lookahead utility of the new strategy is at least:

[TABLE]

∎

The third and last lemma needed for the proof of Lemma 4 is used to combine the two previous lemmas and argue that Good-State-Auctions lead to higher utility for a buyer than Bad-State-Auctions. This lemma will also be used to argue that Good-State-Auctions give higher revenue.

Lemma 7.

For any $m_{g},m_{b}$ such that $1\leq m_{g}\leq n$ and $n/2\leq m_{b}\leq n$ , we have

[TABLE]

Proof.

First, for any $1\leq m_{1}\leq m_{2}\leq n$ , we have

[TABLE]

We obtain

[TABLE]

where the first inequality is since $m_{g}\leq\max(n/2,m_{g})$ , the second is since $q^{\dagger}_{i}$ is increasing in $i$ , and the last since $n/2\leq m_{b}$ . ∎

We now prove that entering a bad state is a dominated strategy for lookahead buyers by comparing the bounds obtained by Lemma 5 and Lemma 6.

Proof of Lemma 4.

With $k\geq\frac{10}{\epsilon(1-\epsilon)}\mathcal{E}_{\text{max}}$ , we have $\sum_{j=\ell(t)+1}^{\ell(t+k)-1}\mathcal{E}^{j}\geq\frac{8}{\epsilon(1-\epsilon)}\mathcal{E}_{\text{max}}$ . Comparing the $k$ -lookahead utility bounds obtained in Lemma 5 and Lemma 6, the strategy $s^{g}$ is dominating if

[TABLE]

We show the following three inequalities that when combined give the above inequality:

[TABLE]

We first show inequality (4). By Lemma 7, we have $\frac{1}{m^{j}_{g}}q^{\dagger}_{m^{j}_{g}}\geq\frac{1}{2}\cdot\frac{1}{m^{j}_{b}}q^{\dagger}_{m^{j}_{b}}$ for all $j\in[\ell(t)+1,\ell(t+k)-1]$ . Inequality (4) then holds by the assumption that $\rho\leq\frac{\epsilon(1-\epsilon)(1-\delta)(1-\rho)}{12(1+\epsilon)}$ .

For inequalities (5) and (6), we first have

[TABLE]

Inequality (5) then holds since $q^{\dagger}_{n}\geq q^{\dagger}_{m^{\ell(t)}_{g}}$ . For inequality (6), note that

[TABLE]

where the second inequality follows from Lemma 7 and the assumption that $\rho\leq\frac{\epsilon(1-\epsilon)(1-\delta)(1-\rho)}{12(1+\epsilon)}$ . ∎

Lemma 8.

Any myopic buyer $i$ playing an undominated strategy bids $b_{i,t}\geq r^{b}$ if its valuation is $v_{i,t}>r^{b}$ in any round $t$ of Bad-State-Auctions where $i$ is in bad state.

Proof.

Let $B$ be the set of bad state buyers in a round $t$ of Bad-State-Auctions of epoch $\ell$ . We argue that for every myopic buyer $i\in B$ that has private valuation $v_{i,t}>r^{b}$ , bidding below $r^{b}$ is a dominated strategy. A myopic buyer maximizes its utility in the current round $t$ . With bid $b_{i,t}<r^{b}$ , buyer $i$ gets zero utility in round $t$ . If $v_{i,t}>r^{b}$ , bidding $b_{i,t}<r^{b}$ is dominated by the strategy of bidding $b_{i,t}=r^{b}$ , which always obtains non-negative utility in round $t$ and obtains strictly positive utility when other buyers bid below $r^{b}$ . ∎
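The dominance argument in the proof of Lemma 8 can be illustrated with a small sketch of a buyer's single-round utility in a first price auction with a reserve. The function name and the tie-breaking rule (a tied bid loses) are assumptions introduced here for illustration and are not taken from the paper.

```python
from typing import Sequence


def myopic_utility(value: float, bid: float, reserve: float,
                   other_bids: Sequence[float]) -> float:
    """Single-round utility in a first price auction with a reserve price.

    Assumption (not from the paper): a bid that ties with another bid loses.
    """
    if bid < reserve:
        return 0.0  # Bids below the reserve never win.
    if any(b >= bid for b in other_bids):
        return 0.0  # Outbid (or tied) by another buyer.
    return value - bid  # The winner pays its own bid.
```

With `value = 0.8` and `reserve = 0.5`, a bid of `0.4` yields zero utility regardless of the other bids, whereas a bid of `0.5` yields `0.8 - 0.5 > 0` when every other bid is below the reserve and zero otherwise, which is the comparison the proof relies on.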

Appendix D Strategies of No-regret Buyers with Heterogeneous Learning Behavior

Lemma 9.

Every no-policy regret learner must remain in good state in all but $o(T)$ rounds.

Proof.

Here, we use as benchmark expert strategy $s^{g}\in E$ of bidding $r^{g}=(1-\epsilon)q^{\dagger}_{m_{g}}$ in good state auctions whenever $v_{i,t}\geq q_{m_{g}}$ initially, and bidding $r^{g}$ continuously once uncleared auctions reach the limit ( $m_{g}H/(1-\delta)$ ).

This bidding strategy ensures that the buyer is always in a good state, and achieves an expected utility of at least $\epsilon(1-\epsilon)q^{\dagger}_{m^{i}_{g}}H$ during epoch $i$ by Lemma 12. Since the epoch length satisfies $\mathcal{E}=m^{i}_{g}H(1+\frac{1}{1-\delta})\frac{1}{(1-\rho)}\leq\frac{1}{(1-\rho)(1-\delta)}2m^{i}_{g}H$ , the expected per round utility is at least

$$\frac{(1-\rho)(1-\delta)\epsilon(1-\epsilon)q^{\dagger}_{m^{i}_{g}}}{2m^{i}_{g}}$$

during that epoch, for some $m^{i}_{g}\in[n]$ . Therefore, since the class $E$ contains such sequences of single experts, the policy-regret learning buyer must achieve at least $\frac{(1-\rho)(1-\delta)\epsilon(1-\epsilon)q^{\dagger}_{m^{i}_{g}}}{2m^{i}_{g}}-o(1)$ utility per round. Now, once in bad state, the buyer can achieve at most

[TABLE]

utility on average by Lemma 3 for some $m^{i}_{b}$ such that $n/2\leq m^{i}_{b}\leq n$ . By Lemma 7, for any $m_{g},m_{b}$ such that $1\leq m_{g}\leq n$ and $n/2\leq m_{b}\leq n$ , we have $\frac{1}{m_{g}}q^{\dagger}_{m_{g}}\geq\frac{1}{2}\cdot\frac{1}{m_{b}}q^{\dagger}_{m_{b}}.$

Therefore, if $\rho<(1-\rho)(1-\delta)\epsilon(1-\epsilon)/4-\Omega(1)$ , then the number of bad state epochs in the buyer’s state trajectory can be at most $o(T)$ . This implies that the learner remains in a good state for at least $T-o(T)$ rounds. ∎
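The per round utility bound used in the proof of Lemma 9 is a short piece of arithmetic: the epoch utility $\epsilon(1-\epsilon)q^{\dagger}_{m_{g}}H$ divided by an upper bound on the epoch length. The sketch below reproduces that step numerically. The function names and the sample parameter values are hypothetical and chosen only for illustration.

```python
def epoch_length(m_g: int, H: int, delta: float, rho: float) -> float:
    """Epoch length E = m_g * H * (1 + 1/(1 - delta)) / (1 - rho), as in the proof."""
    return m_g * H * (1.0 + 1.0 / (1.0 - delta)) / (1.0 - rho)


def utility_over_epoch_length(m_g: int, H: int, q_dagger: float,
                              eps: float, delta: float, rho: float) -> float:
    """Epoch utility lower bound eps(1-eps) q† H divided by the exact epoch length."""
    return eps * (1.0 - eps) * q_dagger * H / epoch_length(m_g, H, delta, rho)


def per_round_utility_lower_bound(m_g: int, q_dagger: float,
                                  eps: float, delta: float, rho: float) -> float:
    """Closed-form bound (1-rho)(1-delta) eps (1-eps) q† / (2 m_g) from the proof."""
    return (1.0 - rho) * (1.0 - delta) * eps * (1.0 - eps) * q_dagger / (2.0 * m_g)
```

The closed form is weaker than the exact ratio because it replaces $1+\frac{1}{1-\delta}$ with the larger quantity $\frac{2}{1-\delta}$, which is valid for any $\delta\in(0,1)$.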

We use the set of experts $E$ as defined in Definition 10 in the lemmas below.

Lemma 10.

Let $\Gamma_{i}$ be the set of rounds $t$ where buyer $i$ participates in a bad state auction, $v_{i,t}>p_{m_{b}}$ and $b_{j,t}<r^{b}$ for all other buyers $j$ in bad state. If buyer $i$ is a no-regret learner against the set of experts $E$ in Definition 10, then the buyer must bid $b_{i,t}\geq r^{b}$ for all but $o(T)$ of the rounds in $\Gamma_{i}$ .

Proof.

Consider a bidding expert function $f(h,v)$ defined as $f(h,v)=r^{b}$ when the projected history $h$ indicates that the buyer is in the bad state and valuation $v\geq p_{m_{b}}$ ; and [math] otherwise. This is (or is arbitrarily close in case of discretization) one of the experts in the class $E$ of experts that the buyer is using. Consider any round $t$ where $v_{i,t}\geq p_{m_{b}}$ . On bidding reserve price in this round, if bids $b_{j,t}<r^{b}$ for all the other buyers, then this buyer wins the auction. For a given trajectory of valuations, and other buyers’ bids, $\Gamma_{i}$ denotes the set of such rounds in Bad-State-Auctions; specifically, $\Gamma_{i}=\{t\in[T]:t\in\mathcal{E}^{b}\text{ for some epoch},i\in B\text{ (bad state)},v_{i,t}\geq p_{m_{b}},b_{j,t}<r^{b},j\neq i\}$ . Therefore, the hindsight utility of this expert is at least

[TABLE]

Since buyer $i$ is using a no-regret learning algorithm, she must achieve a utility that is within $o(T)$ of the above utility for every trajectory. Now, the buyer cannot make any positive utility in a bad state in round $t$ if $b_{i,t}<r^{b}$ . Therefore, on any given trajectory, a no-regret learning buyer must have bid $b_{i,t}\geq r^{b}$ in all but $o(T)$ of the rounds in $\Gamma_{i}$ . ∎

A corollary of the above lemma is that, if all buyers are learning buyers, at most $o(T)$ of the bad state auctions in which some participating no-regret buyer has valuation above $p_{m_{b}}$ can go uncleared.

Corollary 1.

Consider the set of rounds among bad state auctions where all buyers in bad state are no-regret learners against experts $E$ , and at least one buyer in bad state has valuation of $p_{m_{b}}$ or more. Then, in all but $o(T)$ such rounds, at least one buyer $i$ with $v_{i,t}\geq p_{m_{b}}$ is guaranteed to bid $b_{i,t}\geq r^{b}$ .

Appendix E Proof of the Main Result

We prove the main result, which is a bound on the revenue obtained by the mechanism under a heterogeneous buyer population consisting of an unknown proportion of $k$ -lookahead buyers for different $k$ , myopic buyers, no-policy regret learners, and no-regret learners.

Theorem 1.

Let $n_{\text{soph}}$ be the number of buyers which are either $k$ -lookahead, for $k\geq\frac{10}{\epsilon(1-\epsilon)}\mathcal{E}_{\text{max}}$ , or no-policy regret learners. Let $n_{\text{naive}}$ be the number of remaining buyers, which are either myopic or no-regret learners. Assume Algorithm 1 is instantiated with parameters $\rho,\epsilon,\delta\in(0,1)$ s.t. $\rho\leq\frac{\epsilon(1-\epsilon)(1-\delta)(1-\rho)}{12(1+\epsilon)}$ . If the myopic and lookahead buyers play undominated strategies and the learners play no-regret or no-policy regret strategies, then the expected per round revenue is at least

[TABLE]

Here, $q^{\dagger}_{n}=\mathbb{E}_{v\sim F}[v|v\geq q_{n}]$ , with $q_{n}=F^{-1}(1-1/n)$ being the $n^{th}$ quantile for the valuation distribution, and $q^{\dagger}_{0}=0$ . $\operatorname{\mathrm{Rev}}^{\operatorname{\mathrm{Mye}}}(n)$ is the optimal revenue in a single-item auction with $n$ buyers.
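The parameter condition in Theorem 1 has $\rho$ on both sides, so it can be convenient to solve it for $\rho$. Writing $c=\epsilon(1-\epsilon)(1-\delta)$, the condition $\rho\leq\frac{c(1-\rho)}{12(1+\epsilon)}$ rearranges to $\rho\leq\frac{c}{12(1+\epsilon)+c}$. The sketch below computes this threshold; the function names are hypothetical and the rearrangement is elementary algebra rather than something stated in the paper.

```python
def satisfies_rho_condition(rho: float, eps: float, delta: float) -> bool:
    """Check rho <= eps(1-eps)(1-delta)(1-rho) / (12(1+eps)) as stated in Theorem 1."""
    return rho <= eps * (1.0 - eps) * (1.0 - delta) * (1.0 - rho) / (12.0 * (1.0 + eps))


def max_rho(eps: float, delta: float) -> float:
    """Largest rho meeting the condition: c / (12(1+eps) + c), with c = eps(1-eps)(1-delta)."""
    c = eps * (1.0 - eps) * (1.0 - delta)
    return c / (12.0 * (1.0 + eps) + c)
```

For example, with the illustrative values $\epsilon=\delta=1/2$ we get $c=0.125$ and a threshold of $0.125/18.125$, so under this condition only a small fraction of rounds can be devoted to Bad-State-Auctions.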

The following corollary can be obtained by simple algebraic manipulations of the result stated in the above theorem.

See 11

Proof of Theorem 11.

Let $n_{\text{soph}}$ be the number of buyers which are either $k$ -lookahead with $k\geq\frac{10}{\epsilon(1-\epsilon)}\mathcal{E}_{\text{max}}$ or no-policy regret learners and $n_{\text{naive}}$ be the number of remaining buyers (myopic or no-regret learners). We define $R$ to be the collection of rounds where there is either a no-policy regret learner that is in bad state or a no-regret learner that bids $b_{i}<r^{b}$ when $v_{i}>p_{m_{b}}$ .

Consider an epoch $\ell$ where none of the rounds during that epoch are in $R$ and where no buyer is moved from good state to bad state. Let $|G|$ and $|B|$ denote the number of good state and bad state buyers in the beginning of this epoch.

By Lemma 4, all $k$ -lookahead buyers (for $k\geq\frac{10}{\epsilon(1-\epsilon)}\mathcal{E}$ ) remain in good or rest state in all epochs. Since none of the rounds during epoch $\ell$ are in $R$ , every no-policy regret learner remains in good state in epoch $\ell$ by definition of $R$ . Therefore, $|G|=n_{\text{soph}}+n_{\text{naive,g}}$ and $|B|=n_{\text{naive}}-n_{\text{naive,g}}$ for some $n_{\text{naive,g}}\geq 0$ . By Lemma 1, we get that the expected total revenue from Good-State-Auctions in this epoch is at least $|G|H(1-\epsilon)q^{\dagger}_{m_{g}}$ .

For Bad-State-Auctions, we first recall that there is no no-policy regret learner in bad state at epoch $\ell$ . By Lemma 8, every myopic buyer bids at least $r^{b}$ if $v_{i}\geq p_{m_{b}}$ during a round of Bad-State-Auctions. By definition of $R$ , every no-regret learner bids at least $r^{b}$ if $v_{i}\geq p_{m_{b}}$ during a round of Bad-State-Auctions at epoch $\ell$ . By Lemma 2, the expected revenue of the mechanism from Bad-State-Auctions at epoch $\ell$ is thus at least $(1-\epsilon)(1-1/e)\frac{|B|}{m_{b}}\rho\mathcal{E}\mathrm{Rev}^{\mathrm{Mye}}(m_{b})$ .

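The expected total revenue from this epoch is thus at least

$$|G|H(1-\epsilon)q^{\dagger}_{m_{g}}+(1-\epsilon)(1-1/e)\frac{|B|}{m_{b}}\rho\mathcal{E}\mathrm{Rev}^{\mathrm{Mye}}(m_{b}).$$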

To bound this expected revenue, we consider two cases.

If $|B|\geq n/2$ .

In this case, $m_{b}=|B|$ . Since $m_{g},|G|\leq n/2\leq|B|\leq n_{\text{naive}}$ , we get

[TABLE]

Since $\mathcal{E}=\frac{2Hm_{g}}{(1-\delta)(1-\rho)}$ and $\rho=\frac{\epsilon(1-\epsilon)(1-\delta)(1-\rho)}{12(1+\epsilon)}$ , this implies that

[TABLE]

Next, since $|B|=m_{b}$ and $n_{\text{naive}}\geq|B|$ , we have

[TABLE]

Since $|G|\geq|G|/2+n_{\text{naive,g}}/2$ , we conclude that the total expected revenue from this epoch in this case is at least

[TABLE]

where the inequality is since $n_{\text{naive}}=n_{\text{naive,g}}+|B|\leq 2(n_{\text{naive,g}}/2+|B|)$ .

If $|B|<n/2$ .

In this case, we have $m_{g}=|G|\geq n/2\geq n_{\text{naive}}/2$ . We get

[TABLE]

Since $\mathcal{E}=\frac{2Hm_{g}}{(1-\delta)(1-\rho)}$ and $\rho=\frac{\epsilon(1-\epsilon)(1-\delta)(1-\rho)}{12(1+\epsilon)}$ , this implies that

[TABLE]

Thus, the expected total revenue from this epoch in this case is at least

[TABLE]

Since $|G|\geq n_{\text{soph}}$ , we have $q^{\dagger}_{m_{g}}\geq q^{\dagger}_{n_{\text{soph}}}$ . Since $\mathcal{E}=\frac{2Hm_{g}}{(1-\delta)(1-\rho)}$ , in both cases, the expected per round revenue of epoch $\ell$ is thus at least

[TABLE]

where $q^{\dagger}_{n_{\text{soph}}}=0$ if $n_{\text{soph}}=0$ .

We considered an epoch $\ell$ where none of the rounds during that epoch are in $R$ and where no buyer was moved from good state to bad state. It remains to account for the other epochs. First, since a buyer that is moved to bad state remains in bad state for every future epoch, there are at most $n$ epochs where a buyer is moved from good state to bad state. We discount $\mathcal{E}_{\text{max}}\cdot\max r^{g}$ revenue for each of those $n$ epochs. Second, by Lemmas 9 and 10, $|R|=o(T)$ . Thus, the number of epochs containing at least one round in $R$ is $o(T)$ . We discount $\mathcal{E}_{\text{max}}\cdot\max r^{g}$ revenue for each of those $o(T)$ epochs. We obtain the lower bound on per-round revenue as stated in the theorem statement. ∎

Appendix F Regarding Existence of Policy Regret Learning Algorithm

In general, achieving $o(T)$ policy regret is difficult; [ADT12] show that there exists an adaptive adversary such that any learning algorithm has regret at least $\Omega(T)$ . However, it is sufficient for us to consider policy regret learners against a small set of experts $E$ (e.g., $O(1)$ size) and, further, settings where other buyers are restricted to not use history beyond $o(T)$ past steps. Under such restrictions, a simple learning algorithm is to initially explore every expert for $o(T)$ steps and then use the expert with the best performance for the remaining time steps. We can show that this simple learning algorithm achieves $o(T)$ policy regret under our mechanism with a small modification: resetting the mechanism once after $o(T)$ steps so that this initial exploration does not hurt the buyer’s utility in future rounds. This modification does not affect any of the revenue guarantees provided by our mechanism. Further discussion is provided in Appendix E.
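The explore-then-commit learner described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's construction: the function name, the `utility` callback (returning the realized utility of following a given expert in a given round), and the fixed `explore_steps` parameter are all hypothetical, and the one-time reset of the mechanism after the exploration phase is not modeled.

```python
from typing import Callable, List


def explore_then_commit(num_experts: int,
                        utility: Callable[[int, int], float],
                        T: int,
                        explore_steps: int) -> List[int]:
    """Follow each expert for `explore_steps` rounds, then commit to the best one.

    `utility(expert, t)` is a hypothetical callback giving the utility realized
    by following `expert` in round `t`. Returns the expert followed in each round.
    """
    totals = [0.0] * num_experts
    choices: List[int] = []
    t = 0
    # Exploration phase: try every expert in turn.
    for expert in range(num_experts):
        for _ in range(explore_steps):
            if t >= T:
                break
            totals[expert] += utility(expert, t)
            choices.append(expert)
            t += 1
    # Commit phase: use the expert with the highest explored utility.
    best = max(range(num_experts), key=lambda e: totals[e])
    while t < T:
        choices.append(best)
        t += 1
    return choices
```

For the exploration phase to cost only $o(T)$ rounds with $O(1)$ experts, `explore_steps` would need to grow sublinearly in $T$ (for instance on the order of $T^{2/3}$, which is an illustrative choice and not one made in the paper).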

Bibliography

[ACBFS03] Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The nonstochastic multiarmed bandit problem. SIAM J. Comput., 32(1):48–77, January 2003.
[ada] adage.com. Google’s ad manager will move to first-price auction. https://adage.com/article/digital/google-adx-moving-a-price-auction/316894/
[ade] adexchanger.com. Rubicon joins first-price auction club; Diageo is latest brand to demand more transparency. https://adexchanger.com/ad-exchange-news/tuesday-12122017//
[ADH16] Itai Ashlagi, Constantinos Daskalakis, and Nima Haghpanah. Sequential mechanisms with ex-post participation guarantees. In Proceedings of the 2016 ACM Conference on Economics and Computation, EC ’16, Maastricht, The Netherlands, July 24-28, 2016, pages 213–214, 2016.
[ADMS18] Shipra Agrawal, Constantinos Daskalakis, Vahab S. Mirrokni, and Balasubramanian Sivan. Robust repeated auctions under heterogeneous buyer behavior. In Proceedings of the 2018 ACM Conference on Economics and Computation, Ithaca, NY, USA, June 18-22, 2018, page 171, 2018.
[ADT12] Raman Arora, Ofer Dekel, and Ambuj Tewari. Online bandit learning against an adaptive adversary: from regret to policy regret. In ICML. icml.cc / Omnipress, 2012.
[AS13] Susan Athey and Ilya Segal. An efficient dynamic mechanism. Econometrica, 81(6):2463–2485, 2013.
[Bat05] Marco Battaglini. Long-term contracting with Markovian consumers. American Economic Review, 95(3):637–658, 2005.
