Experimenting in Equilibrium

Stefan Wager; Kuang Xu

arXiv:1903.02124·math.OC·July 1, 2020·Manag. Sci.

TL;DR

This paper introduces a mean-field based experimental design method for large-scale stochastic systems with significant cross-unit interference, enabling accurate effect estimation and system optimization in equilibrium.

Contribution

It presents a novel approach combining randomization and lightweight mean-field modeling to estimate effects and optimize parameters in systems with interference.

Findings

1. Effective estimation of small parameter changes in large systems.
2. Enables gradient-based optimization in equilibrium settings.
3. Applicable to platforms optimizing supply-side incentives.

Equations

P_{it} = p_{t} + \zeta\varepsilon_{it},\quad \varepsilon_{it} \overset{\text{iid}}{\sim} \{\pm 1\}

\lim_{n\to\infty}\mathbb{E}\left[\left(D/n-d_{a}\right)^{2}\,\big{|}\,A=a\right]=0,

\mathbb{P}\left(D/n\notin[d_{a}/2,2d_{a}]\,\big{|}\,A=a\right)=o(1/n),

\Omega(d,t)\triangleq\mathbb{E}[S_{i}\,\big{|}\,D=d,T=t],

|l(d,t)| = o(1/t + 1/d),

\begin{split}\omega(x)=\left\{\begin{array}[]{ll}\frac{x-x^{L}}{1-x^{L}},&\quad x\neq 1,\\ 1-\frac{1}{L},&\quad x=1.\end{array}\right.\end{split}
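The piecewise definition of $\omega$ above has a removable singularity at $x=1$: numerator and denominator both vanish there, and the second branch is the limiting value. A minimal Python sketch that transcribes the formula directly; the tolerance `tol` for switching to the limit (to avoid floating-point cancellation near $x=1$) is our own illustrative choice, not something specified in the paper.

```python
def omega(x: float, L: int, tol: float = 1e-9) -> float:
    """Piecewise omega(x) from the displayed definition.

    For x != 1 this is (x - x**L) / (1 - x**L); at x = 1 both numerator and
    denominator vanish, so the definition uses the limit 1 - 1/L instead.
    Within `tol` of 1 we return the limit to avoid catastrophic cancellation
    (tol is a hypothetical numerical safeguard, not part of the formula).
    """
    if abs(x - 1.0) < tol:
        return 1.0 - 1.0 / L
    return (x - x ** L) / (1.0 - x ** L)


print(omega(0.5, 2), omega(1.0, 4), omega(1.0 + 1e-6, 5))
```

For very large $x$ and large $L$, `x ** L` can raise `OverflowError`; a version meant for that regime would rewrite the ratio in terms of $1/x$.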

\mu^{(n)}_{a}(\pi)\triangleq\mathbb{P}_{\pi}\left[Z_{i}=1\,\big{|}\,A=a\right]=\mathbb{E}_{\pi}\left[f_{B_{i}}(P_{i}\,\mathbb{E}_{\pi}\left[\Omega(D,T)\,\big{|}\,A=a\right])\,\big{|}\,A=a\right].

\mathbb{P}\left[Z_{i}=1\,\big{|}\,P_{i},\,\pi,\,A\right]=\frac{1}{1+e^{-\alpha\left(P_{i}\mathbb{E}_{\pi}\left[\Omega(D,T)\,\big{|}\,A\right]-B_{i}\right)}},

U = R(D,T) - \sum_{i=1}^{n} P_{i} Z_{i} S_{i},

R(d,t)=\left(r(d/t)-l(d,t)\right)t,\quad\mbox{for all $t,d\in\mathbb{R}_{+}$},

\begin{split}u_{a}^{(n)}(\pi)&=\frac{1}{n}\mathbb{E}_{n}\left[U\,\big{|}\,A=a\right],\quad\mbox{and}\quad u^{(n)}(\pi)=\mathbb{E}_{n}\left[u_{A}^{(n)}(\pi)\right].\end{split}

P_{i} = p + \zeta\varepsilon_{i},\quad i \in N.

q_{a}^{(n)}(\mu) = \mathbb{E}\left[\Omega(D,X)\,\big{|}\,A=a\right],\quad X\sim\mathrm{Binomial}(n,\mu).

\lim_{n\to\infty}\mu_{a}^{(n)}(p) = \mu_{a}(p),

\lim_{n\to\infty}q_{a}^{(n)}(\mu) = \omega(d_{a}/\mu),

\lim_{n\to\infty}u_{a}^{(n)}(p) = u_{a}(p) = \left(r\left(d_{a}/\mu_{a}(p)\right) - p\,\omega\left(d_{a}/\mu_{a}(p)\right)\right)\mu_{a}(p),

\lim_{n\to\infty}\left(q_{a}^{(n)}\right)^{\prime}(\mu) = -\omega^{\prime}(d_{a}/\mu)\,\frac{d_{a}}{\mu^{2}},

\Delta^{(n)}_{a}(p)=q^{(n)}_{a}\left(\mu^{(n)}_{a}(p)\right)\mathbb{E}\left[f^{\prime}_{B_{1}}\left(p\,q^{(n)}_{a}\left(\mu^{(n)}_{a}(p)\right)\right)\,\big{|}\,A=a\right].

\frac{d}{dp}\mu_{a}^{(n)}(p) = \frac{\Delta_{a}^{(n)}(p)}{1 - p\,\Delta_{a}^{(n)}(p)\,\left(q_{a}^{(n)}\right)^{\prime}\left(\mu_{a}^{(n)}(p)\right)\big/q_{a}^{(n)}\left(\mu_{a}^{(n)}(p)\right)}\quad\text{for any } n\geq 1.

\displaystyle\lim_{n\rightarrow\infty}\Delta^{(n)}_{a}(p)=\Delta_{a}(p)\triangleq\omega\left(d_{a}/\mu_{a}(p)\right)\mathbb{E}\left[f^{\prime}_{B_{1}}\left(p\omega\left(d_{a}/\mu_{a}(p)\right)\right)\,\big{|}\,A=a\right],

\displaystyle\lim_{n\to\infty}\frac{d}{dp}\mu^{(n)}_{a}(p)=\mu_{a}^{\prime}(p)=\Delta_{a}(p)\,\Big{/}\,\left(1+\frac{pd_{a}\Delta_{a}(p)\omega^{\prime}\left(d_{a}/\mu_{a}(p)\right)}{\mu_{a}(p)^{2}\omega(d_{a}/\mu_{a}(p))}\right).
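As a sanity check on the limiting derivative formula above, the following sketch solves a toy mean-field equilibrium $\mu = f(p\,\omega(d/\mu))$ numerically and compares a finite-difference derivative of $\mu(p)$ with $\Delta/\left(1 + p\,d\,\Delta\,\omega'(d/\mu)/(\mu^{2}\omega(d/\mu))\right)$, where $\Delta = \omega(d/\mu)\,f'(p\,\omega(d/\mu))$. The assumptions are ours, not the paper's: a logistic $f$ (in the form of the logistic choice model listed above) with a common threshold `B` instead of a random $B_i$, no payment perturbation ($\zeta = 0$), the limiting matching function $q = \omega(d/\mu)$, arbitrary parameter values, and $\omega'$ taken by central differences.

```python
import math

L, ALPHA, B = 3, 2.0, 0.5  # hypothetical parameters, for illustration only


def omega(x):
    """omega(x) = (x - x**L) / (1 - x**L), with the limit 1 - 1/L at x = 1."""
    if abs(x - 1.0) < 1e-9:
        return 1.0 - 1.0 / L
    return (x - x ** L) / (1.0 - x ** L)


def omega_prime(x, h=1e-6):
    """Central-difference approximation of omega'(x)."""
    return (omega(x + h) - omega(x - h)) / (2.0 * h)


def f(x):
    """Logistic activation probability with a common (hypothetical) threshold B."""
    return 1.0 / (1.0 + math.exp(-ALPHA * (x - B)))


def mean_field_mu(p, d, lo=1e-6, hi=1.0, iters=200):
    """Solve mu = f(p * omega(d / mu)) by bisection.

    g(mu) = mu - f(p * omega(d / mu)) is increasing in mu (omega is increasing
    and d / mu is decreasing), with g(lo) < 0 < g(hi), so the root is unique.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid - f(p * omega(d / mid)) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mu_prime_formula(p, d):
    """Limiting derivative Delta / (1 + p d Delta omega'(d/mu) / (mu^2 omega(d/mu)))."""
    mu = mean_field_mu(p, d)
    w = omega(d / mu)
    fx = f(p * w)
    delta = w * ALPHA * fx * (1.0 - fx)  # Delta = omega * f'(p omega) for logistic f
    return delta / (1.0 + p * d * delta * omega_prime(d / mu) / (mu ** 2 * w))


p, d, h = 1.5, 0.4, 1e-4
finite_diff = (mean_field_mu(p + h, d) - mean_field_mu(p - h, d)) / (2.0 * h)
print(finite_diff, mu_prime_formula(p, d))
```

In this toy setting the equilibrium derivative is strictly smaller than $\Delta$ itself, because every factor in the correction term is positive; that gap is the dampening effect of interference that a naive randomized comparison would miss.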

R_{a}(p) = \Sigma_{a}^{\Delta}(p)\,\Sigma_{a}^{\Omega}(p),\quad \text{scaled marginal sensitivity } \Sigma_{a}^{\Delta}(p) = \frac{p\,\Delta_{a}(p)}{\mu_{a}(p)},\quad \text{scaled matching elasticity } \Sigma_{a}^{\Omega}(p) = \frac{d_{a}}{\mu_{a}(p)}\,\frac{\omega^{\prime}\left(d_{a}/\mu_{a}(p)\right)}{\omega\left(d_{a}/\mu_{a}(p)\right)}.

\widehat{\Delta}=\zeta_{n}^{-1}\sum_{i=1}^{n}(Z_{i}-\overline{Z})(\varepsilon_{i}-\bar{\varepsilon})\,\big{/}\,\sum_{i=1}^{n}(\varepsilon_{i}-\bar{\varepsilon})^{2}.

\Upsilon = \widehat{\Delta}\,\Big/\left(1 + \frac{p\,\overline{D}\,\widehat{\Delta}\,\omega^{\prime}(\overline{D}/\overline{Z})}{\overline{Z}^{2}\,\omega(\overline{D}/\overline{Z})}\right),

\Gamma = \Upsilon\left[r(\overline{D}/\overline{Z}) - p\,\omega(\overline{D}/\overline{Z}) - \left(r^{\prime}(\overline{D}/\overline{Z}) - p\,\omega^{\prime}(\overline{D}/\overline{Z})\right)\overline{D}/\overline{Z}\right] - \omega(\overline{D}/\overline{Z})\,\overline{Z}.
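The two plug-in formulas above ($\Upsilon$ as an estimate of $\mu'(p)$, and $\Gamma$ as an estimate of the utility gradient) translate directly into code. A minimal sketch, assuming the platform knows $r$, $r'$, $\omega$ and $\omega'$ (passed in as callables) and observes $\widehat{\Delta}$, $\overline{D}$, $\overline{Z}$ and $p$; the function names are our own. The check at the end uses hypothetical choices of $r$, $\omega$ and $\mu(p)$ to confirm that, when $\Upsilon$ equals the true $\mu'(p)$, $\Gamma$ matches a finite-difference derivative of $u(p) = \left(r(d/\mu(p)) - p\,\omega(d/\mu(p))\right)\mu(p)$.

```python
from typing import Callable

Fn = Callable[[float], float]


def upsilon_estimate(delta_hat: float, p: float, d_bar: float, z_bar: float,
                     omega: Fn, omega_prime: Fn) -> float:
    """Upsilon = Delta_hat / (1 + p D_bar Delta_hat omega'(x) / (Z_bar^2 omega(x))), x = D_bar / Z_bar."""
    x = d_bar / z_bar
    return delta_hat / (1.0 + p * d_bar * delta_hat * omega_prime(x) / (z_bar ** 2 * omega(x)))


def gamma_estimate(ups: float, p: float, d_bar: float, z_bar: float,
                   r: Fn, r_prime: Fn, omega: Fn, omega_prime: Fn) -> float:
    """Gamma = Upsilon [r(x) - p omega(x) - (r'(x) - p omega'(x)) x] - omega(x) Z_bar."""
    x = d_bar / z_bar
    return ups * (r(x) - p * omega(x) - (r_prime(x) - p * omega_prime(x)) * x) - omega(x) * z_bar


# Hypothetical ingredients, used only to check the algebra.
r = lambda x: 3.0 * x / (1.0 + x)
r_prime = lambda x: 3.0 / (1.0 + x) ** 2
om = lambda x: x / (1.0 + x)
om_prime = lambda x: 1.0 / (1.0 + x) ** 2
mu = lambda p: 0.5 + 0.1 * p  # hypothetical equilibrium supply, so mu'(p) = 0.1
d = 0.8
u = lambda p: (r(d / mu(p)) - p * om(d / mu(p))) * mu(p)

p0, h = 2.0, 1e-5
finite_diff = (u(p0 + h) - u(p0 - h)) / (2.0 * h)
print(gamma_estimate(0.1, p0, d, mu(p0), r, r_prime, om, om_prime), finite_diff)
```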

\lim_{n\to\infty}\mathbb{P}\left[\left|\Gamma - \frac{d}{dp}u_{A}(p)\right| > \varepsilon\right] = 0,

p_{t+1} = \operatorname*{argmin}_{p}\left\{\frac{1}{2\eta}\sum_{s=1}^{t} s\,(p - p_{s})^{2} - \theta_{t}\,p \,:\, p \in I\right\},\quad \theta_{t} = \sum_{s=1}^{t} s\,\Gamma_{s}.
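Because the objective inside the argmin is a one-dimensional convex quadratic in $p$, the update has a closed form: setting the derivative $\frac{1}{\eta}\sum_{s} s\,(p-p_{s}) - \theta_{t}$ to zero gives $p = \left(\sum_{s} s\,p_{s} + \eta\,\theta_{t}\right)/\sum_{s} s$, and the minimizer over an interval $I$ is that value clipped to $I$. A minimal sketch, assuming $I = [\text{lower}, \text{upper}]$ is a closed interval; the function name `next_payment` is hypothetical.

```python
import numpy as np


def next_payment(past_payments, past_gradients, eta, lower, upper):
    """Closed-form p_{t+1} for the weighted update with weights s = 1, ..., t.

    Minimizes (1 / (2 eta)) * sum_s s * (p - p_s)**2 - theta_t * p over
    [lower, upper], where theta_t = sum_s s * Gamma_s.
    """
    p_s = np.asarray(past_payments, dtype=float)
    g_s = np.asarray(past_gradients, dtype=float)
    w = np.arange(1, p_s.size + 1, dtype=float)  # weights s = 1, ..., t
    theta = np.sum(w * g_s)                      # theta_t
    unconstrained = (np.sum(w * p_s) + eta * theta) / np.sum(w)
    return float(np.clip(unconstrained, lower, upper))  # minimizer over I


# Three past payments p_s with gradient estimates Gamma_s; result is about 1.325.
print(next_payment([1.0, 1.5, 1.2], [0.3, -0.1, 0.2], eta=0.5, lower=0.0, upper=3.0))
```

Clipping is exact here only because the problem is one-dimensional and convex; the quadratic's minimizer over an interval is always the unconstrained minimizer projected onto that interval.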

\lim_{n\to\infty}\mathbb{P}\left[\frac{1}{T}\sum_{t=1}^{T} t\left(u_{A_{t}}(p) - u_{A_{t}}(p_{t})\right) \leq \frac{\eta M^{2}}{2}\right] = 1,

\limsup_{n\to\infty}\mathbb{P}\left[(p^{*} - \bar{p}_{T})^{2} \leq \frac{\eta M^{2}}{\sigma T}\left(16\log(\delta^{-1}) + 4\right)\right] \geq 1 - \delta,

\frac{1}{T}\sum_{t=1}^{T}\left(u_{A_{t}}(p_{t}) - u_{A_{t}}(p_{t},\zeta)\right) \leq C\zeta^{2}\quad\text{for all } 0 \leq \zeta < \alpha.

\mu^{(n)}_{a}(\pi)=\mathbb{P}_{\pi}\left[Z_{i}=1\,\big{|}\,A=a\right]=\mathbb{E}_{\pi}\left[f_{B_{i}}\left(\beta(P_{i})\,q^{(n)}_{a}(\mu^{(n)}_{a}(\pi))\right)\,\big{|}\,A=a\right].

\mu^{(n)}_{a}(\pi)=\mathbb{P}_{\pi}\left[Z_{i}=1\,\big{|}\,A=a\right]=\mathbb{E}_{\pi}\left[f_{B_{i}}\left(s\left(\frac{d_{a}}{\mu^{(n)}_{a}(\pi)}\right)P_{i}\,q^{(n)}_{a}(\mu^{(n)}_{a}(\pi))\right)\,\big{|}\,A=a\right],

Full text

Stefan Wager

Graduate School of Business

Stanford University

Kuang Xu

Graduate School of Business

Stanford University

(Draft Version)

Abstract

Classical approaches to experimental design assume that intervening on one unit does not affect other units. There are many important settings, however, where this non-interference assumption does not hold, as when running experiments on supply-side incentives on a ride-sharing platform or subsidies in an energy marketplace. In this paper, we introduce a new approach to experimental design in large-scale stochastic systems with considerable cross-unit interference, under an assumption that the interference is structured enough that it can be captured via mean-field modeling. Our approach enables us to accurately estimate the effect of small changes to system parameters by combining unobtrusive randomization with lightweight modeling, all while remaining in equilibrium. We can then use these estimates to optimize the system by gradient descent. Concretely, we focus on the problem of a platform that seeks to optimize supply-side payments $p$ in a centralized marketplace where different suppliers interact via their effects on the overall supply-demand equilibrium, and show that our approach enables the platform to optimize $p$ in large systems using vanishingly small perturbations.

Keywords: experimental design, interference, mean-field model, stochastic system.

1 Introduction

Randomized controlled trials¹ are widely used to guide decision making across different domains, ranging from classical industrial and agricultural applications (Fisher, 1935) to development economics (Banerjee and Duflo, 2011) and the modern technology sector (Athey and Luca, 2019; Kohavi et al., 2009; Tang et al., 2010). In its most basic form, a randomized trial aims to assess the expected effectiveness of a set of interventions on a population by selecting a small but representative sub-population of units and assigning to each unit a randomly chosen intervention. For example, in a medical trial, the decision maker may want to compare the effectiveness of a new experimental drug with the current standard of care. To do so, they select a set of patients and randomly assign some fraction of them to the new treatment, while the others receive the control condition (i.e., the current standard of care). The drug is then assessed by comparing the outcomes of treated and control patients. Similar randomized experiments are popular with technology companies, where they are often referred to as A/B tests. In this context, a company selects a small population of its users and randomly exposes them to different designs; the best design that emerges from the experiment is then deployed to the entire user base.

¹ This work was partially supported by a seed grant from the Stanford Global Climate and Energy Project and a Facebook Faculty Award.

When interpreting the results from randomized trials, it is common to make a “no interference” assumption, whereby we assume that the intervention assigned to any given unit does not affect observed outcomes for other units (Imbens and Rubin, 2015); for example, in the medical example above, we might assume that giving the experimental treatment to some patients does not affect outcomes for the control patients who are still receiving standard care. Such a lack of interference plays a key role in enabling us to use randomized trials to understand the effect of large-scale policy interventions, as it implies that any effects observed by experimenting on a representative sub-population should also hold when the same interventions are applied to the overall population. However, this non-interference assumption is violated in many important applications, and randomized trials can lead to highly misleading conclusions in the presence of cross-unit interference. We illustrate this problem below using an example from Heckman et al. (1998).

Example 1 (Tuition Subsidies).

A policy maker is interested in estimating the effect $\theta(p)$ of offering all high school graduates a fixed subsidy of $p$ dollars to attend college. To do so, they might consider running a small randomized controlled trial: Given a small set of study participants, randomly assign half of them to receive a subsidy $p$ and half of them not to, and then compare college enrollment rates between the two groups. As argued in Heckman et al. (1998), however, such an approach may badly over-estimate the effect of the subsidy on enrollment because it fails to consider overall equilibrium effects on the college wage premium.

More formally, let $V(a,\,c)$ denote the average net value of enrolling in college, where $a$ denotes the wage premium resulting from a college degree and $c$ the cost of attendance. In general, we should expect $V$ to be monotonically increasing in $a$ and decreasing in $c$. The subsidy reduces costs by $p$, and thus at first glance makes college more attractive. Where one needs to be careful, however, is in recognizing that the college wage premium $a$ is not set in stone; rather, it is determined by labor market conditions. If more people enroll in college, one may expect the labor market bargaining power of college graduates to diminish, and $a$ to decrease in response. Thus, if we believe the subsidy $p$ increases enrollment, we should expect the wage premium $a(p)$ to be a decreasing function of $p$ due to equilibrium effects.

We are now ready to illustrate why a simple randomized trial falls short here (Figure 1). The randomized trial only affects a small number of study participants and does not capture changes in $a$; specifically, it measures $\theta_{rct}(p)=V(a(0),\,c-p)-V(a(0),c)$. In contrast, the true effect of the subsidy should also reflect its impact on the equilibrium college wage premium, i.e., $\theta(p)=V(a(p),\,c-p)-V(a(0),\,c)$. In general, for any subsidy $p>0$, we should expect $a(p)<a(0)$ and so $V(a(0),\,c-p)>V(a(p),\,c-p)$, meaning that the randomized trial will over-estimate the effect of the subsidy. On the quantitative front, Heckman et al. (1998) discuss a setup where ignoring equilibrium effects would lead to estimates that are off by an order of magnitude.
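The gap between $\theta_{rct}(p)$ and $\theta(p)$ can be made concrete with a toy calculation. The functional forms are purely hypothetical and not taken from Heckman et al. (1998): $V(a,c) = a - c$, and a linear equilibrium wage premium $a(p) = a_0 - \kappa p$, with arbitrary values for $a_0$, $\kappa$ and $c$.

```python
# Hypothetical functional forms for Example 1 (not from the paper):
#   V(a, c) = a - c            net value of college: wage premium minus cost
#   a(p)    = a0 - kappa * p   equilibrium wage premium falls as the subsidy rises


def net_value(a: float, c: float) -> float:
    """Net value of enrolling: increasing in a, decreasing in c."""
    return a - c


def wage_premium(p: float, a0: float = 10.0, kappa: float = 0.9) -> float:
    """Equilibrium wage premium a(p), decreasing in the subsidy p."""
    return a0 - kappa * p


def theta_rct(p: float, c: float = 8.0) -> float:
    """What a small RCT measures: the wage premium stays at a(0)."""
    return net_value(wage_premium(0.0), c - p) - net_value(wage_premium(0.0), c)


def theta_equilibrium(p: float, c: float = 8.0) -> float:
    """Policy effect once the wage premium adjusts to a(p)."""
    return net_value(wage_premium(p), c - p) - net_value(wage_premium(0.0), c)


print(round(theta_rct(1.0), 6), round(theta_equilibrium(1.0), 6))  # prints 1.0 0.1
```

Under these made-up numbers the trial reports the full cost reduction $p$, while the equilibrium effect is only $(1-\kappa)p$; the size of the gap is entirely driven by the assumed $\kappa$.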

1.1 Interference and Clustered Inference

The question of how to run experiments in the presence of cross-unit interference has received considerable attention in the literature. The simplest approach to dealing with interference is to assume that we can divide our experimental samples into disjoint clusters that do not interfere with each other, and then to consider inference at the level of these clusters (Baird et al., 2018; Hudgens and Halloran, 2008).

One such example involves experimentation in internet ad auctions, where each auction consists of a keyword along with a set of advertisers who submit competing bids to have their ads displayed when a user queries the keyword. There is cross-unit interference because the same advertiser or keyword may appear in multiple auctions. Basse et al. (2016) and Ostrovsky and Schwarz (2011) observe that the auction type used for one keyword does not meaningfully affect how advertisers bid on other keywords. They then consider experiments that group auctions into clusters by keyword and randomize auction formats across these keyword clusters, rather than across advertisers, as a means to avoid problems with interference. More broadly, in the context of tuition subsidies, this idea of cluster-level randomization could correspond to identifying communities that are relatively isolated from each other and randomizing the interventions across communities rather than across individuals; or, in the case of social networking, it could involve deploying different versions of a feature in different countries and hoping that the number of cross-border links is small enough to induce only negligible interference.

The limitation of such cluster-based approaches, however, is that the power of any experiment is limited by the number of non-interfering clusters available: For example, if a platform has 200 million customers in 100 countries, but chooses to randomize by country, then the largest effective sample size they can use for any experiment is 100, and not 200 million. Recently, several authors have sought to improve on the power of such cluster-based approaches by considering methods that allow interference to be captured by a generic graph, where two units are connected by an edge if the treatment assigned to one unit may affect the other’s outcome (Aronow and Samii, 2017; Athey et al., 2018; Basse et al., 2019; Eckles et al., 2017; Leung, 2020). Even in this general case, however, we typically need to assume that the interference graph is sparse, i.e., that most units do not interfere with each other. For example, Leung (2020) assumes that the average degree of the interference graph remains bounded.

1.2 Accounting for Interference via Equilibrium Modeling

In this paper, we propose an alternative approach to experimentation in stochastic systems where many, if not all, units interfere with one another. For concreteness, we focus on the problem of setting supply-side payments in a centralized marketplace, where available demand is randomly allocated to a set of available suppliers. In these systems, different suppliers interact via their effects on the overall supply-demand equilibrium: The more suppliers choose to participate in the marketplace, the less demand, on average, an individual supplier is able to serve in equilibrium. The objective of the system designer is to identify the payment that maximizes the platform’s utility. Note that conventional randomized experimentation schemes that assume no interference fail in this system: For example, if we double the per-transaction payments made to a random half of suppliers, these suppliers will be more inclined to participate, which reduces the demand available to the remaining suppliers and thus their incentives to participate.

We consider a simple model of such a centralized marketplace, and design a class of “local” experimentation schemes that, by carefully leveraging the structure of the marketplace, enable us to optimize payments without disturbing the overall market equilibrium. To do so, we perturb the per-transaction payment $p_{i}$ available to the $i$-th supplier by a small mean-zero shock, i.e., $p_{i}=p+\zeta\varepsilon_{i}$, where $0<\zeta\ll 1$ and $\varepsilon_{i}=\pm 1$ independently and uniformly at random. A reduced-form linear regression, which estimates how the individual random shock $\zeta\varepsilon_{i}$ affects supplier $i$’s behavior, recovers a certain marginal response function that captures the supplier’s sensitivity to payment changes against a fixed ambient market equilibrium. This marginal response, unfortunately, is not directly relevant for policy design, as it does not take into account the shift in market equilibrium that would occur if all suppliers received the same payment change. However, in the limit where the number of suppliers is large, we show that a mean-field model can be used to translate the output of this reduced-form regression into an estimate of the gradient of the platform’s utility with respect to $p$. We can then use these gradient estimates to optimize $p$ via any stochastic first-order optimization method, such as stochastic gradient descent and its extensions.
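A minimal simulation sketch of the local experiment and the reduced-form regression. It assumes (our simplification, not the paper's full model) that every supplier responds to $P_i\,q$ through a logistic choice probability $f(x) = 1/(1+e^{-\alpha(x-b)})$ with a common threshold $b$, and that the ambient equilibrium quantity $q$ stays fixed under the small symmetric shocks; `q`, `alpha` and `b` are hypothetical parameters. The regression slope $\widehat{\Delta} = \zeta^{-1}\sum_i (Z_i-\bar Z)(\varepsilon_i-\bar\varepsilon)/\sum_i(\varepsilon_i-\bar\varepsilon)^2$ should then be close to the marginal response $q\,f'(pq)$, not to the total equilibrium response to a change in $p$.

```python
import numpy as np


def choice_prob(x, alpha, b):
    """Logistic probability that a supplier who expects revenue x becomes active."""
    return 1.0 / (1.0 + np.exp(-alpha * (x - b)))


def run_local_experiment(n, p, zeta, q, alpha, b, rng):
    """Perturb payments as P_i = p + zeta * eps_i and record activity Z_i.

    Hypothetical simplification: q stands in for the ambient equilibrium
    quantity, held fixed because small symmetric shocks barely move it.
    """
    eps = rng.choice([-1.0, 1.0], size=n)  # eps_i iid uniform on {-1, +1}
    payments = p + zeta * eps              # P_i = p + zeta * eps_i
    z = (rng.random(n) < choice_prob(payments * q, alpha, b)).astype(float)
    return eps, z


def delta_hat(eps, z, zeta):
    """Reduced-form slope of Z on the payment shock, rescaled by 1 / zeta."""
    eps_c = eps - eps.mean()
    z_c = z - z.mean()
    return np.sum(z_c * eps_c) / (zeta * np.sum(eps_c ** 2))


rng = np.random.default_rng(0)
n, p, zeta, q, alpha, b = 1_000_000, 2.0, 0.05, 0.6, 3.0, 1.0
eps, z = run_local_experiment(n, p, zeta, q, alpha, b, rng)

# Marginal response q * f'(p q); for the logistic, f' = alpha * f * (1 - f).
f_pq = choice_prob(p * q, alpha, b)
target = q * alpha * f_pq * (1.0 - f_pq)
print(delta_hat(eps, z, zeta), target)
```

With these settings the two printed numbers should typically agree to within a few hundredths; the residual reflects sampling noise and a finite-difference bias of order $\zeta^2$.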

The driving insight behind our result is that, although there is dependence across the behavior of a large number of units in the system, any such interference can only be channeled through a small number of key statistics: In our example, this corresponds to the total supply made available by all suppliers. Then, if we can intervene on individual units without meaningfully affecting the key statistics, we can obtain useful information about the system—at a cost that scales sub-linearly in the number of units. The type of interference that we consider, where the units experience global interference channeled through a small number of key statistics, can manifest in a range of applications. We discuss some examples below.

Example 2 (Ride Sharing).

Ride sharing platforms match customers who request a ride with nearby freelance drivers who are both active and not currently serving another request. It is in the platform’s interest to have a reasonable amount of capacity available at all times to ensure a reliable customer experience. To this end, the platform may seek to increase capacity by increasing the rates paid to drivers for completing rides. When running experiments on the rates needed to achieve specific capacity levels, however, the platform needs to account for interference. If the platform in fact succeeds in increasing capacity by increasing rates while demand remains fixed, the expected utilization of each driver will go down, and so the drivers’ expected revenue, i.e., the product of the rate and the expected utilization, will not increase linearly in the rate. Thus, if drivers respond to expected revenue when choosing whether to work for a platform, as empirical evidence suggests they do (Hall et al., 2019), a platform that ignores interference effects will overestimate the power of rate hikes to increase capacity. However, as we show in this paper, we can accurately account for these interference effects via mean-field modeling because they are all channeled through a simple statistic, in this case total capacity.

Example 3 (Congestion Pricing).

A policy maker may want to identify the optimal toll for congestion pricing (e.g., Goh, 2002). We assume that drivers get positive utility from completing a trip, but negative utility both from congestion delays and from paying tolls. Then, in studying the effect of a toll on congestion, the policy maker needs to address the fact that drivers interfere with one another through the overall state of congestion on the road: If we raise the tolls on a small subset of drivers and hence discourage them from going on the road, those whose tolls remain unchanged may experience less congestion and hence be inclined to drive more. Therefore, a policy maker who experiments with a small sub-population without taking interference effects into account may obtain an overly optimistic estimate of the true effect of a toll change applied to all drivers. Again, however, all interference is channeled through a single statistic (congestion), and so mean-field modeling can capture its effect.

Example 4 (Renewable Energy Subsidies).

In a wholesale electricity market, energy producers (e.g., generators) and consumers (e.g., utilities) submit bids and offers in the day-ahead market, which is then cleared in a manner that balances aggregate regional supply and demand. The operators of these markets, such as CAISO or ERCOT, may choose to provide subsidies or scheduling priorities to encourage renewable generation (see CAISO, 2009). Suppose that the market operator would like to know the effect of increasing subsidies on energy generation. We expect that increased subsidies would increase both total and renewable energy production; the question is by how much, and what the effect of interference will be. It is plausible that the effect of subsidies on total supply will be mitigated by interference, because increased production from one supplier will decrease the demand available to others. In contrast, interference may either mitigate or amplify the effect of subsidies on renewable energy production: Amplification effects may occur if subsidies affect profitability in a way that causes non-renewable producers to be replaced by new renewable entrants. In either case, all interference effects are channeled through global capacity, and so can be accounted for via mean-field modeling.

1.3 Related Work

The problem of experimental design under interference has received considerable attention in the statistics literature. For example, Blake and Coey (2014) document failures of the non-interference assumption due to an interaction between treated and control customers in an experiment run by an online marketplace. Blundell et al. (2004) consider the effects of a job search program on employment outcomes, and emphasize the importance of considering general equilibrium effects, whereby job offers given to program participants may substitute for job offers given to non-participants, and increased search activity from participants may lower equilibrium wages for less skilled individuals. Bottou et al. (2013) describe difficulties in using randomized experiments to study internet ad auctions: Advertisers participate in an auction to determine ad placements, and any intervention on one advertiser may change their behavior in the auction and thus affect the opportunities available to other advertisers. In all these cases, simple randomized controlled trials would paint a misleading picture of the effect of an overall policy change.

The dominant paradigm for working under interference has focused on robustness to potential interference effects, and on defining estimands in settings where some units may be exposed to spillovers from treating other units (Aronow and Samii, 2017; Athey et al., 2018; Baird et al., 2018; Basse et al., 2019; Eckles et al., 2017; Hudgens and Halloran, 2008; Leung, 2020; Manski, 2013; Sobel, 2006; Tchetgen Tchetgen and VanderWeele, 2012). Depending on the application, the exposure patterns may be simple (e.g., the units are clustered such that exposure effects are contained within clusters) or more complicated (e.g., the units are connected in a network, and two units far from each other in graph distance are not exposed to each other’s treatments). Unlike this line of work, which seeks robustness to interference driven by potentially complex and unknown mechanisms, the local randomization scheme proposed here crucially relies on having a stochastic model that lets us explain interference. Then, because all interference acts via a simple statistic, we can move beyond simply seeking robustness to interference and can in fact accurately predict interference effects using information gathered in equilibrium.

Another plausible approach would be to use structural estimation methods to directly estimate the whole underlying system, and then use stochastic optimization to obtain the optimal decision. However, a full-blown structural estimation approach would be infeasible in our problem because it involves a large number of interacting units, each with unknown features. In particular, as will be clear in Section 3, we consider the interaction among a large number of units, and each unit’s behavior depends on a random choice function drawn from a potentially large set of options. The set of problem parameters thus involves the shapes of every possible choice function, as well as the sampling distribution from which the function is drawn for each unit. Directly estimating these parameters can be very difficult and, as we show, is not needed if the final goal is to identify the optimal action. Instead, our approach focuses on estimating a small number of key statistics that turn out to be sufficient for optimization. Doing so allows us to side-step the scalability problem of the structural estimation approach and arrive at the optimal action efficiently.

The idea that one can distill insights of a structural model down to the relationship between a small number of observable statistics has a long tradition in economics (e.g., Chetty, 2009; Harberger, 1964). This approach can often be used for practical counterfactual analysis without needing to fit complicated structural models. We are inspired by this approach, and here we use such an argument for experimental design rather than to guide methods for observational study analysis. At a high level, our paper also has a connection to results on learning in a setting where agents exhibit strategic behavior, including Feng et al. (2018), Iyer et al. (2014), and Kanoria and Nazerzadeh (2014), and in crowd-sourcing systems, including Johari et al. (2017), Khetan and Oh (2016) and Massoulié and Xu (2018).

Our approach to optimizing $p$ using gradients obtained from local experimentation intersects with the literature on continuous-arm bandits (or noisy zeroth-order optimization), which aims to optimize a function $f(x)$ by sequentially evaluating $f$ at points $x_{1},x_{2},\ldots$ and obtaining in return noisy versions of the function values $f(x_{1}),f(x_{2}),\ldots$ (Bubeck et al., 2017; Spall, 2005). A number of bandit methods first generate noisy gradient estimates of the function by comparing adjacent function values, and subsequently use these estimates in a first-order optimization method (Flaxman et al., 2005; Ghadimi and Lan, 2013; Jamieson et al., 2012; Kleinberg, 2005; Nesterov and Spokoiny, 2017). In our model, this approach would amount to estimating utility gradients via what we call global experimentation, i.e., by comparing the empirical utilities observed at two different payment levels. Compared to this literature, our paper exploits a cross-sectional structure not present in most existing zeroth-order models: We show that our local experimentation approach, which offers slightly different payments across a large number of units, is far more efficient at estimating the gradient than global experimentation, which offers all units the same payment on a given day. Such cross-sectional signals would be lost if we abstracted away the multiplicity of units and only treated the average payment as a decision variable to be optimized. In Section 4.4, we provide a formal comparison for the regret of a platform deploying our approach versus a bandit-based algorithm, and establish a sharp separation in terms of rates of convergence.

The limiting regime that we use, one in which the system size tends to infinity, is often known as the mean-field limit. It has a long history in the study of large-scale stochastic systems, such as the many-server regime in queueing networks (Bramson et al., 2012; Halfin and Whitt, 1981; Stolyar, 2015; Tsitsiklis and Xu, 2012; Vvedenskaya et al., 1996) and interacting particle systems (Graham and Méléard, 1994; Mézard et al., 1987; Sznitman, 1991). A key property of this mean-field limit is that, while changes to the behavior of a single unit may have significant impact on other units in a finite system, such interference diminishes as the system size grows and, in the limit, the behaviors among any finite set of units become asymptotically independent from one another, a phenomenon known as the propagation of chaos (Bramson et al., 2012; Graham and Méléard, 1994; Sznitman, 1991). This asymptotic independence property underpins the effectiveness of our local experimentation scheme, and ensures that small, symmetric payment perturbations do not drastically alter the equilibrium demand-supply dynamics.

Mean-field-inspired approaches have also been used in game theory to analyze equilibria in the presence of a large number of players by assuming that the agents respond to a certain average behavior of the system (Adlakha et al., 2015; Hopenhayn, 1992; Jovanovic and Rosenthal, 1988; Weintraub et al., 2008); the equilibrium notion we use also falls under this category. In contrast to the existing literature, the main focus of our work lies in using mean-field limits to drive learning and experimentation.

2 Designing Experiments under Equilibrium Effects

For concreteness, we focus our discussion on a simple setting inspired by a centralized marketplace for freelance labor that operates over a number of periods. In each period, the high-level objective of the decision maker (i.e., the operator of the platform) is to match demand with a pool of potential suppliers in a way that maximizes the platform’s expected utility. To do so, the decision maker offers payments to each potential supplier individually, and each supplier in turn decides whether to become active (available) based on their belief about future revenue. Our main question is how the decision maker can use experimentation to efficiently discover the profit-maximizing payment, despite not knowing the detailed parameterization of the model and despite substantial stochastic uncertainty.

We formally describe a flexible stochastic model in Section 3; here, we briefly outline a simple variant of our model that lets us highlight some key properties of our approach. Each day $t=1,\,...,\,T$ there are $i=1,\,...,\,n$ potential suppliers, and demand for $D_{t}$ identical tasks to be accomplished. A central platform chooses a distribution $\pi_{t}$, and then offers each supplier a random payment $P_{it}\overset{\text{iid}}{\sim}\pi_{t}$, which it commits to pay for each unit of demand the supplier serves. The suppliers observe both $\pi_{t}$ and a state variable $A_{t}$ that can be used to accurately anticipate demand $D_{t}$ (e.g., $A_{t}$ could capture local weather or events); however, the platform does not have access to $A_{t}$. Given their knowledge of $P_{it}$ and $A_{t}$, each supplier independently chooses whether to become “active”; we write $Z_{it}=1$ for active suppliers and $Z_{it}=0$ otherwise. Then, demand $D_{t}$ is randomly allocated to active suppliers.

Our key assumption is that each supplier chooses to become active based on their expected revenue conditionally on being active, and furthermore that they do so via stationary reasoning (Hopenhayn, 1992). Each supplier first computes $q_{A_{t}}(\pi_{t})$ , their expected allocation rate (rate at which they will be matched with demand) conditionally on being active and given $A_{t}$ and $\pi_{t}$ . They then decide whether to become active by comparing the expected revenue $P_{it}q_{A_{t}}(\pi_{t})$ with a random outside option. We refer to this as a stationary model of supplier choice as we implicitly assume that suppliers don’t take into account the effect of their own decision to become active on their expected allocation rate. This is often taken to be a reasonable assumption in large stochastic systems (Adlakha et al., 2015; Chetty, 2009; Weintraub et al., 2008).

The form of $q(\cdot)$ depends on both the amount of available supply and demand, and the efficiency with which supply can be matched with demand; see Section 3 for an example based on a queuing network. Finally, the platform’s utility $U_{t}$ is given by the revenue from the demand served minus payments made to suppliers. Figure 2 shows a simple example of an equilibrium resulting from this model in the limit as $n$ gets large in a setting where all suppliers are offered the same payment $p$ , for a specific realization of demand $D$ . We see that, as $p$ gets larger, the active supply gets larger than demand and the utilization of active suppliers goes down.

Our assumption that the platform cannot observe the daily state variable $A_{t}$ is made to ensure that our learning methods are robust and perform well even if the state variables are unavailable or difficult to estimate accurately. In practice, of course, a platform may plausibly have access to partial, but not full, information about $A_{t}$. Here, we focus on the statistically most difficult setting, where the platform is oblivious to $A_{t}$ and can therefore learn how to set $p$ only via experimentation; this setting enables us to establish a crisp separation between different approaches to learning and to highlight our core methodological contributions. However, all methods considered here can be adapted to leverage partial information about $A_{t}$, and further work investigating how best to leverage such information would be of considerable interest.

Before presenting our proposed approach to learning $p$ below, we first briefly review why standard approaches fall short. The core difficulty in our model comes from the interplay between network effects and market-wide demand fluctuations induced by the $A_{t}$ .

The network effects break what one might call classical A/B-experimentation. Suppose that, on each day $t=1,\,...,\,T$, the platform chooses a small random fraction of suppliers and offers them an experimental payment $p_{\text{exp}}$, while everyone else is offered the status quo payment $p_{\text{default}}$. We could then try to use the behavior of suppliers offered $p_{\text{exp}}$ to estimate expected profit at $p_{\text{exp}}$, and then update the default payment. This approach allows for cheap experimentation because most suppliers are offered $p_{\text{default}}$. However, it will not consistently recover the optimal payment because it ignores feedback effects: when we raise payments, more suppliers opt to join the market, so the rate at which any given supplier is matched with demand goes down, and this attenuates the payment-sensitivity of supply relative to what is predicted by A/B testing.

Conversely, the market-wide demand fluctuations due to $A_{t}$ degrade global optimization schemes that use payment variation across days for learning; such algorithms are equivalent to the continuous-armed bandit algorithms considered in the optimization literature (Spall, 2005). Suppose that, on each day $t=1,\,...,\,T$, we randomly choose a payment $p_{t}$ and make it available to all suppliers, and then observe realized profits $U_{t}$. We could then try to estimate profit gradients by comparing $U_{t}$ to $U_{t-1}$. The problem is that, due to variation in daily context, the variation in per-supplier profit $U_{t}/n$ given the chosen payment $p_{t}$ is always of constant order, even in very large markets (i.e., in the limit $n\rightarrow\infty$); for example, in a ride-sharing setting, if day $t-1$ is rainy and day $t$ is sunny, then the effect of this weather change on profit may overwhelm the effect of any payment change deployed by the platform.¹ The upshot is that the platform cannot learn anything via global experimentation unless it considers large changes to the payments $p_{t}$ that it offers to everyone. And such widespread payment changes are impractical for several reasons: they are expensive and difficult to deploy.

¹ Of course, the platform may try to correct for contexts, e.g., by matching days with similar values of $A_{t}$ with each other. One currently popular way of doing so in the technology industry is using synthetic controls (Abadie et al., 2010). In practice, however, this approach may be difficult to implement, and will remain intractably noisy unless the platform can observe the full context $A_{t}$ and use it to essentially perfectly predict demand. As discussed above, our goal in this paper is to develop methods for learning that are driven purely by experimentation, and that do not rely on the platform being able to accurately observe $A_{t}$.
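The following minimal mean-field sketch makes the constant-order noise concrete. It is an illustration under hypothetical assumptions, not a calibration of the model: suppliers share a logistic choice function with break-even cost 10 and slope 0.5, the allocation function is taken to be $\omega(x)=1-e^{-x}$, the platform earns $\gamma=40$ per unit of demand served, and scaled demand on each day is drawn from $\text{beta}(15,\,35)$, the distribution used in the simulation of Figure 3. When every supplier is paid $p$, per-supplier utility is approximated heuristically by $(\gamma-p)\mu\,\omega(d/\mu)$, where the active fraction $\mu$ solves $\mu=f(p\,\omega(d/\mu))$. The script compares the day-to-day standard deviation of this quantity with the effect of raising $p$ from 20 to 20.5.

```python
import numpy as np

GAMMA = 40.0   # hypothetical revenue per unit of demand served
ALPHA = 0.5    # hypothetical logit slope
B_COST = 10.0  # hypothetical common break-even cost


def omega(x):
    # Hypothetical regular allocation function: omega(0) = 0, slope 1 at 0, concave, tends to 1.
    return 1.0 - np.exp(-x)


def choice(x):
    # Logistic probability of becoming active given expected revenue x.
    return 1.0 / (1.0 + np.exp(-ALPHA * (x - B_COST)))


def equilibrium_mu(p, d, iters=100):
    # Vectorized bisection for mu = choice(p * omega(d / mu)), one root per entry of d.
    # choice(p * omega(d / mu)) - mu is strictly decreasing in mu, so the root is unique.
    lo = np.full_like(d, 1e-12)
    hi = np.ones_like(d)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        above = choice(p * omega(d / mid)) > mid  # True where mid lies below the root
        lo = np.where(above, mid, lo)
        hi = np.where(above, hi, mid)
    return 0.5 * (lo + hi)


def utility_per_supplier(p, d):
    # Heuristic mean-field approximation of U_t / n when every supplier is offered p.
    mu = equilibrium_mu(p, d)
    return (GAMMA - p) * mu * omega(d / mu)


rng = np.random.default_rng(0)
d_days = rng.beta(15, 35, size=2000)  # scaled demand on 2000 simulated days

u_base = utility_per_supplier(20.0, d_days)
u_bumped = utility_per_supplier(20.5, d_days)

day_to_day_sd = u_base.std()
payment_effect = np.mean(np.abs(u_bumped - u_base))
print(f"sd of U/n across days:       {day_to_day_sd:.3f}")
print(f"mean effect of +0.5 payment: {payment_effect:.3f}")
```

With these settings the day-to-day spread is several times larger than the payment effect, and, because it is driven by $d$ rather than by sampling noise, it would not shrink as $n$ grows.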

2.1 Local Experimentation

Our goal is to use high-level information about the stochastic system described above to design a new experimental framework that lets us avoid the problems of both approaches described above: We want our experimental scheme to be consistent for the optimal payment (like global experimentation), but also to be cost-effective (like classical A/B testing) in that it only requires small perturbations to the status quo.

The driving insight behind our approach is that it is possible to learn about the relationship between profit and payment via unobtrusive randomization, by randomly perturbing the payments $P_{it}$ offered to supplier $i$ in time period $t$. We propose setting

$$P_{it}=p_{t}+\zeta\varepsilon_{it},\qquad\varepsilon_{it}\overset{\text{iid}}{\sim}\{\pm 1\},\tag{2.1}$$

uniformly at random, where $\zeta>0$ is a (small) constant that governs the magnitude of the perturbations, and regressing market participation $Z_{it}$ on the payment perturbations $\varepsilon_{it}$ . This regression lets us recover the marginal response function, i.e., the average payment sensitivity of a supplier in a situation where only they get different payments but others do not; see Section 3.2 for a formal definition.
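The regression step can be illustrated on a single simulated day. The sketch below is a toy under stated assumptions rather than the paper's implementation: break-even costs $B_i$ are drawn from a hypothetical $\text{Uniform}(5,\,15)$ distribution, suppliers follow a logistic choice rule, and the allocation rate is replaced by a mean-field stand-in $\omega(d/\mu)$ with the hypothetical $\omega(x)=1-e^{-x}$. Because $\varepsilon_{it}$ takes only the values $\pm 1$, the OLS slope of $Z_{it}$ on $\varepsilon_{it}$ is half the difference between the activation rates of the two arms; dividing it by $\zeta$ estimates the marginal response, which the code compares with the average own-payment derivative at the same equilibrium.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, zeta, d = 200_000, 20.0, 1.0, 0.3  # market size, base payment, perturbation size, scaled demand
alpha = 0.5                               # hypothetical logit slope
b = rng.uniform(5.0, 15.0, size=n)        # hypothetical break-even costs B_i


def omega(x):
    # Hypothetical regular allocation function standing in for the true match rate.
    return 1.0 - np.exp(-x)


def choice(x, cost):
    # Logistic probability that a supplier with break-even cost `cost` becomes active.
    return 1.0 / (1.0 + np.exp(-alpha * (x - cost)))


# Symmetric perturbations as in (2.1): P_i = p + zeta * eps_i, eps_i = +/-1 with equal probability.
eps = rng.choice([-1.0, 1.0], size=n)
pay = p + zeta * eps

# Mean-field equilibrium for the realized payments: mu = mean_i f_{B_i}(P_i * omega(d / mu)).
lo, hi = 1e-12, 1.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if choice(pay * omega(d / mid), b).mean() > mid:
        lo = mid
    else:
        hi = mid
mu = 0.5 * (lo + hi)
q = omega(d / mu)  # allocation rate that every supplier anticipates

# Each supplier independently decides whether to become active.
z = rng.random(n) < choice(pay * q, b)

# With a two-valued regressor, the OLS slope of Z on eps is half the difference in arm means.
slope = 0.5 * (z[eps > 0].mean() - z[eps < 0].mean())
delta_hat = slope / zeta

# Benchmark: average own-payment derivative q * f'_{B_i}(p * q), holding the equilibrium fixed.
f0 = choice(p * q, b)
delta_true = np.mean(q * alpha * f0 * (1.0 - f0))
print(f"estimated marginal response {delta_hat:.4f}, benchmark {delta_true:.4f}")
```

The standard error of the slope is at most about $0.5/\sqrt{n}$ here, so the marginal-response estimate has standard error of order $1/(\zeta\sqrt{n})$: in a large market, small perturbations already yield a precise estimate.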

This marginal response function is not directly of interest for optimizing $p$, as it ignores feedback effects. However, we find that, in our setting, this quantity captures relevant information for optimizing payments. More specifically, we show in Section 3.2 that, provided we understand the system dynamics well enough to anticipate match rates given the amount of supply and demand present in the market, we can use consistent estimates of the marginal response function to derive consistent estimates of the actual payment-sensitivity of supply that accounts for network effects, in the mean-field limit where the market size grows. Furthermore, we show in Section 4 that this approach enables us to optimize payments using vanishingly small-scale experimentation as the market gets large (i.e., we can take $\zeta$ in (2.1) to be very small when $n$ is large).

Figure 3 shows results from our local experimentation approach on a simple simulation experiment in the setting of Figure 2, where the scaled demand $\mathbb{E}\left[D/n\,\big{|}\,A\right]$ follows a $\text{beta}(15,\,35)$ distribution. We initialize the system at $p_{1}=30$ , and then each day run payment perturbations as in (2.1) to guide a payment update using an update rule described in Section 4.2. We see that the system quickly converges to a near-optimal payment of around 17.

We also compare our results to what one could obtain using the baseline of global experimentation, where we randomize the payment $p_{t}\sim\text{Uniform}(10,\,30)$ in each time period and measure resulting platform utility $U_{t}$ , and then choose the final payment $\hat{p}$ by maximizing a smooth estimate of the expectation of $U_{t}$ given $p_{t}$ . The left panel of Figure 4 shows the resulting $(p_{t},\,U_{t})$ pairs, as well as the resulting $\hat{p}$ . As seen in the right panel of Figure 3, the final $\hat{p}$ obtained via this method is a reasonable estimate of the optimal $p$ .

The major difference between the local and global randomization schemes is in the resulting cost of experimentation. In Section 4.3 we show that our local experimentation scheme pays a vanishing cost for randomization; the only regret relative to deploying the optimal $p$ from the start is due to the rate of convergence of gradient descent. In contrast, the cost of experimentation incurred for finding $\hat{p}$ via global experimentation is huge, because it sometimes needs to deploy very poor choices of $p_{t}$ in order to learn anything. And, as shown in the right panel of Figure 4, after the first few days, the global experimentation approach in fact systematically achieves lower daily utilities $U_{t}$ than local experimentation. In Section 6 we consider further numerical comparisons of local and global experimentation, as well as variants of global experimentation that balance exploration and exploitation to improve in-sample regret.

Remark 1 (Relationship to Batched Bandits).

Our model bears resemblance to batched multi-armed bandits (Esfandiari et al., 2020; Gao et al., 2019; Perchet et al., 2016) and batched online optimization (Bubeck et al., 2019; Duchi et al., 2018), where an analyst sequentially picks multiple arms to pull for one batch at a time. In particular, administering an intervention to a unit in our model could be seen as analogous to pulling one arm in batched bandits, or to sampling an unknown function at a particular point in batched online optimization. There is, however, a fundamental distinction between our model and the predominant model for batched bandits. Existing work on batched bandits does not allow for interference within batches: the action assigned to one unit in a batch does not directly affect the outcome observed for another unit in the batch. In contrast, the presence of cross-unit interference within batches (or, for us, within days) is at the heart of our model: the outcome of a unit depends not only on their own intervention, but also on the interventions experienced by other units on the same day. Thus, existing results on batched bandits and online optimization cannot be used to reason about how best to deploy heterogeneous incentives to different suppliers in order to converge to a good choice of $p$ in our setting.

3 Model: Stochastic Market with Centralized Pricing

We now present the general stochastic model we use to motivate our approach. All random variables are assumed to be independent across periods and, within each period, independent of one another unless otherwise stated. We will consider a sequence of systems, indexed by $n\in\mathbb{N}$, where in the $n$-th system there are $n$ potential suppliers. We will refer to $n$ as the market size. All variables in our model thus implicitly depend on the index $n$, which we denote using the superscript $(n)$, e.g., $q^{(n)}$. We sometimes suppress this notation when the context is clear. In the rest of the section, we focus on describing the model in a single time period.

Demand

To reflect the reality that demand fluctuations may not concentrate with $n$, we allow for a random global state $A$ drawn from a finite set $\mathcal{A}$. The global state affects demand and is known to market participants (suppliers), but not to the platform (or the platform cannot react to it). For example, in a ride-sharing setting, $A$ could capture the effect of weather (rain or shine) or of major events (conferences, sports games, etc.). Conditionally on the global state $A=a$, we assume that demand $D$ is drawn from a distribution $F_{a}$, i.e., $D\sim F_{a}$. We further assume that demand scales proportionally with the market size $n$, and that it concentrates after rescaling by $1/n$. In particular, we assume that there exists $\{d_{a}\}_{a\in\mathcal{A}}\subset\mathbb{R}_{+}$ such that for all $a\in\mathcal{A}$, $\mathbb{E}\left[D/n\,\big|\,A=a\right]=d_{a}$ for all $n\in\mathbb{N}$,

$$\lim_{n\to\infty}\mathbb{E}\left[\left(D/n-d_{a}\right)^{2}\,\big|\,A=a\right]=0,$$

and

[TABLE]

and as $n\to\infty$ . In general, we will use the sub-script $a$ to denote the conditioning that the global state $A=a$ .
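One concrete demand model that satisfies $\mathbb{E}[D/n\mid A=a]=d_a$ and the mean-square concentration condition above is $D\sim\text{Poisson}(n d_a)$ conditionally on $A=a$: then $\mathbb{E}[(D/n-d_a)^2\mid A=a]=d_a/n\to 0$. The sketch below uses two hypothetical global states (think rain and shine) with made-up values of $d_a$ and of their probabilities. It shows that the conditional error vanishes as $n$ grows, while the spread of $D/n$ across days does not, because the global state itself is random; this residual spread is the fluctuation the platform cannot average away.

```python
import numpy as np

rng = np.random.default_rng(2)
d_values = np.array([0.4, 0.25])  # hypothetical d_a: index 0 = "rain", index 1 = "shine"
probs = [0.3, 0.7]                # hypothetical distribution of the global state A


def sample_scaled_demand(n, days):
    # Draw the global state for each day, then D ~ Poisson(n * d_A); return d_A and D / n.
    states = rng.choice(len(d_values), size=days, p=probs)
    rates = d_values[states]
    return rates, rng.poisson(n * rates) / n


results = {}
for n in (100, 10_000, 1_000_000):
    rates, scaled = sample_scaled_demand(n, days=20_000)
    conditional_mse = np.mean((scaled - rates) ** 2)  # estimates E[(D/n - d_A)^2], about E[d_A] / n
    overall_sd = scaled.std()                          # spread of D/n across days, including A
    results[n] = (conditional_mse, overall_sd)
    print(f"n={n:>9,}: conditional MSE {conditional_mse:.2e}, sd of D/n {overall_sd:.3f}")
```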

Matching Demand with Suppliers

Depending on the realization of demand, all or a subset of the suppliers will be selected to serve the demand. In particular, the matching between the potential suppliers and demand occurs in three rounds:

Round 1: The platform chooses a payment distribution, $\pi$ , and draws payments $P_{i}\,{\buildrel\text{iid}\over{\sim}\,}\pi$ for $i=1,2,\ldots,n$ . Then, for each supplier $i$ , the platform announces both the payment $P_{i}$ and the underlying distribution $\pi$ , with the understanding that the supplier will be compensated with $P_{i}$ for every unit of demand that they will be matched with eventually.
Round 2: Suppliers choose whether they want to be active. A supplier will not be matched with any demand if they choose to be inactive. We write $Z_{i}\in\left\{0,\,1\right\}$ to denote whether the $i$-th supplier chooses to participate in the marketplace, and write $T=\sum_{i=1}^{n}Z_{i}$ for the total number of active suppliers. The mechanism through which a supplier determines whether or not to become active will be described shortly.
Round 3: The platform employs some mechanism that randomly matches demand with active suppliers.

Denote by $S_{i}$ the amount of demand that an active supplier $i$ will be able to serve, and define

$$\Omega(d,t)=\mathbb{E}_{\pi}\left[S_{i}\,\big|\,D=d,\ T=t,\ Z_{i}=1\right]$$

as the expected demand allocation to an active supplier under the payment distribution $\pi$ , conditional on the total demand being $d$ and total active suppliers being $t$ . We allow for a range of possible matching mechanisms, but assume that in the limiting regime where $t$ and $d$ are large, $\Omega(d,t)$ converges to a “regular allocation function” that only depends on the ratio between the demand and active suppliers, $d/t$ .

Definition 5 (Regular Allocation Function).

A function $\omega:\mathbb{R}_{+}\to[0,1]$ is a regular allocation function if it satisfies the following:

1. $\omega(\cdot)$ is smooth, concave and non-decreasing.
2. $\lim_{x\to 0}\omega(x)=0$ and $\lim_{x\to\infty}\omega(x)\leq 1$.
3. $\lim_{x\to 0}\omega^{\prime}(x)\leq 1$.

The condition of $\omega$ being concave corresponds to the assumption that the marginal difficulty with which additional demand can be matched does not decrease as demand increases. The condition that $\lim_{x\to\infty}\omega(x)\leq 1$ asserts that the maximum capacity of all active suppliers be bounded after normalization.

Assumption 1.

The function $\Omega:\mathbb{R}_{+}^{2}\to\mathbb{R}_{+}$ satisfies the following:

1. $\Omega(d,t)$ is non-decreasing in $d$ and non-increasing in $t$.
2. There exists a bounded error function $l:\mathbb{R}_{+}^{2}\to\mathbb{R}_{+}$ with

[TABLE]

such that $\Omega(d,t)=\omega(d/t)+l(d,t)$ for all $t,d\in\mathbb{R}_{+}$ , where $\omega(\cdot)$ is a regular allocation function.

We provide below an example system in which the allocation rates are given by a regular allocation function (Definition 5).

Example 6 (Regular Allocation Function Example: Parallel Finite-Capacity Queues).

Consider a service system where each active supplier operates as a single-server $M/M/1$ queue with a finite capacity, $L\in\mathbb{N}$ , $L\geq 2$ . A request that arrives at a queue is accepted if and only if the queue length is less than or equal to $L$ , and is otherwise dropped. We assume that all servers operate at unit-rate, so that a request’s service time is an independent exponential random variable with mean 1. Each unit demand generates an independent stream of requests which is modeled by a unit-rate Poisson process, so that the aggregate arrival process of requests is Poisson with rate $D$ (by the merging property of independent Poisson processes). When a new request is generated within the system, the platform routes it to one of the $T$ queues selected uniformly at random. The random routing corresponds, for instance, to a scenario where both the incoming requests and active suppliers are scattered across a geographical area, and as such, requests are assigned to the nearest server.

Within this model, each active supplier effectively functions as an $M/M/1$ queue with service rate $1$ and arrival rate $D/T$. Due to the capacity limit at $L$, some requests may be dropped if they are assigned to a queue currently at capacity. If we denote $D/T$ by $x$, then, using the theory of $M/M/1$ queues, it is not difficult to show (e.g., Spencer et al., 2014, eq. 5.6) that the rate at which requests are processed by a server, corresponding to the allocation rate, is given by

[TABLE]

Numerical examples of $\omega(\cdot)$ are given in Figure 5. Note that $\omega(\cdot)$ satisfies all conditions in Definition 5 and is hence a regular allocation function. Finally, we may generalize the model to one where the suppliers are partitioned into $k$ equal-sized groups, so that each server operates at speed $Tkm$. The corresponding allocation function would have the same qualitative behavior.
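A minimal sketch of this allocation function, assuming the standard $M/M/1/K$ result: with arrival rate $x$, unit service rate, and room for at most $K$ requests, the stationary probability of $j$ requests is proportional to $x^{j}$, so the server is busy (and processes requests) at rate $1-\pi_0=x(1-x^{K})/(1-x^{K+1})$. Whether $K$ should equal $L$ or $L+1$ depends on how the acceptance rule above is read, so the code keeps $K$ as a free parameter. The grid check of Definition 5 is a numerical sanity check only, not a proof.

```python
import numpy as np


def omega_mm1k(x, K):
    # Throughput of an M/M/1/K queue (unit service rate, arrival rate x, room for K requests).
    # Stationary probabilities are proportional to x**j for j = 0..K, so the server is busy
    # with probability 1 - pi_0 = 1 - 1 / sum_j x**j, which equals x(1 - x**K) / (1 - x**(K+1)).
    x = np.asarray(x, dtype=float)
    powers = x[..., None] ** np.arange(K + 1)
    return 1.0 - 1.0 / powers.sum(axis=-1)


def check_regular(omega, grid):
    # Grid-based sanity check of the conditions in Definition 5 (not a proof).
    vals = omega(grid)
    h = grid[1] - grid[0]
    second_diff = vals[2:] - 2.0 * vals[1:-1] + vals[:-2]
    return {
        "non_decreasing": bool(np.all(np.diff(vals) >= 0.0)),
        "concave": bool(np.all(second_diff <= 1e-12)),
        "bounded_by_1": bool(np.all(vals <= 1.0)),
        "omega_at_0": float(vals[0]),
        "slope_at_0": float((vals[1] - vals[0]) / h),
    }


grid = np.linspace(0.0, 10.0, 1001)
for K in (2, 3, 5):
    print(K, check_regular(lambda x, K=K: omega_mm1k(x, K), grid))
```

Computing $1-\pi_0$ from the sum of powers, rather than from the closed-form ratio, avoids the removable singularity at $x=1$, where the allocation rate is $K/(K+1)$.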

Supplier Choice Behavior

We assume that each supplier takes into account their expected revenue in equilibrium when deciding whether or not to become active. In particular, the probability of supplier $i$ becoming active is given as follows, where $T$ is the equilibrium number of active suppliers:

[TABLE]

Here, $\mathbb{E}_{\pi}\left[\Omega(D,T)\,|\,A=a\right]$ is the expected amount of demand served by each supplier given the platform’s choice of $\pi$, and thus $P_{i}\,\mathbb{E}_{\pi}\left[\Omega(D,T)\,|\,A=a\right]$ is the expected revenue of the $i$-th supplier in equilibrium.² Note that the choice model (3.6) is stationary in that each supplier only considers the average behavior of other marketplace participants when choosing whether or not to enter. In particular, suppliers do not consider the effect of their own entry decision on the system, or combinatorial interactions between other marketplace participants. Similar stationarity assumptions, also known as mean-field or oblivious equilibrium, are common in game-theoretic models involving a large number of players, where each player’s influence on the overall system dynamics is vanishingly small (Hopenhayn, 1992; Weintraub et al., 2008), and can be formally justified by showing that stationary equilibrium converges to the true Nash equilibrium in the limit as the system size tends to infinity (Adlakha et al., 2015).

² For now, assume that such an equilibrium distribution is well defined; we will justify its meaning rigorously in a moment.

Here, $B_{i}$ is a private feature that captures the heterogeneity across potential suppliers, such as a supplier’s cost, or noise in their estimate of the expected revenue. We assume that the $B_{i}$’s are drawn i.i.d. from a set $\mathcal{B}$ according to a distribution that may depend on $A$. The choice function $f_{b}(x)$ represents the probability of the supplier becoming active when their private feature is $b$ and their expected equilibrium revenue is $x$. We assume that the family of choice functions $\{f_{b}(\cdot)\}_{b\in\mathcal{B}}$ satisfies certain regularity properties, detailed below.

Assumption 2.

We assume that supplier choices are determined by the stationary choice model (3.6). Furthermore, for all $b\in\mathcal{B}$ , we assume that the choice function $f_{b}(\cdot)$ takes values in $[0,1]$ , is monotonically non-decreasing, and twice differentiable with a uniformly bounded second derivative.

Below is one example of a family of choice functions that satisfies Assumption 2:

Example 7 (Logistic Choice Function).

A popular model in choice theory is the logit model (cf. Chapter 3 of Train (2009)), which, in our context, corresponds to the choice function being the logistic function:

$$f_{b}(x)=\frac{1}{1+\exp\left(-\alpha\,(x-b)\right)},$$

where $\alpha>0$ is a parameter and the private feature $B_{i}$ takes values in $\mathbb{R}_{+}$ and represents the break-even cost threshold of supplier $i$. In this example, the supplier’s decision on whether to become active depends on whether their expected revenue exceeds their break-even cost, and the sensitivity of this dependence is governed by the parameter $\alpha$. Note that in the limit as $\alpha\to\infty$, the probability of the event $Z_{i}=1$ conditionally on $P_{i}$, $\pi$ and $A$ is either 0 or 1. That is, a supplier will choose to be active if and only if they believe their expected revenue from Round 2 will exceed the break-even threshold $B_{i}$.
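The sketch below evaluates a logistic choice function of the form $f_b(x)=1/(1+\exp(-\alpha(x-b)))$, which is our reading of the logit model above, for a hypothetical break-even cost $b=10$ and increasing values of $\alpha$. As $\alpha$ grows, the activation probability approaches the threshold rule under which a supplier is active exactly when $x>b$.

```python
import math


def logistic_choice(x, b, alpha):
    # Probability that a supplier with break-even cost b becomes active
    # when their expected equilibrium revenue is x (assumed logit form).
    z = alpha * (x - b)
    # Numerically stable evaluation of 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


b = 10.0  # hypothetical break-even cost
for alpha in (0.5, 2.0, 50.0):
    probs = [round(logistic_choice(x, b, alpha), 3) for x in (8.0, 10.0, 12.0)]
    print(alpha, probs)
```

Splitting on the sign of $z$ keeps `math.exp` from overflowing when $\alpha|x-b|$ is large.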

Platform Utility and Objective

The platform’s utility is defined to be the difference between revenue and total payment:

$$U=R(D,T)-\sum_{i=1}^{n}P_{i}Z_{i}S_{i},$$

where $S_{i}$ is the amount of demand that a supplier would serve if they become active, and $R(D,T)$ is the platform’s expected revenue, with equilibrium active supply size $T$ and total demand $D$ . Analogously to the case of $\Omega(D,T)$ , we will assume that the revenue function $R$ is approximately linear in the sense that, for some function $r$ , $R(D,\,T)\approx r(D/T)T$ when $T$ and $D$ are large. More precisely, assume the following:

Assumption 3.

There exists a bounded error function $l:\mathbb{R}_{+}^{2}\to\mathbb{R}_{+}$ with $\left\lvert l(d,t)\right\rvert=\smash{o(1/\sqrt{t}+1/\sqrt{d})}$ such that

[TABLE]

where $r:\mathbb{R}_{+}\to\mathbb{R}_{+}$ is a smooth function with bounded derivatives.

As an example, the platform could receive a fixed amount $\gamma$ from each unit of demand served, in which case we have $R(D,T)=\gamma(T\Omega(D,T))$ . Given this notation, we write the platform’s expected utility in the $n$ -th system as

[TABLE]

Denote by $\delta_{x}$ the Dirac measure with unit mass on $x$ . We consider two different objectives for the decision maker (i.e., platform operator). First, they may want to control regret, and deploy a sequence of payment distributions $\pi$ whose utility nearly matches that of the optimal fixed payment, $p^{*}$ . Second, they may want to estimate $p^{*}$ . In Section 4, we provide results with guarantees along both objectives.

Symmetric Payment Perturbation

An important family of payment distributions that will be used repeatedly throughout the paper is that of symmetric payment perturbation. Let $\{\varepsilon_{i}\}_{i\in\mathbb{N}}$ be a sequence of i.i.d. Bernoulli random variables with $\mathbb{P}(\varepsilon_{i}=-1)=\mathbb{P}(\varepsilon_{i}=+1)=\frac{1}{2}.$ Fix $p>\zeta>0$ . We say the payments are $\zeta$ -perturbed from $p$ , if

$$P_{i}=p+\zeta\varepsilon_{i},\quad i=1,\ldots,n.$$

In what follows, we will use $\pi_{p,\zeta}$ to denote the payment distribution when payments are $\zeta$-perturbed from $p$, and $\mu^{(n)}_{a}(p,\zeta)$ to denote $\mu^{(n)}_{a}(\pi_{p,\zeta})$. The meanings of $u^{(n)}_{a}(p,\zeta)$ and similar quantities are to be understood analogously. When $\zeta=0$, we may omit the dependence on $\zeta$ and write, for instance, $\mu^{(n)}_{a}(p)$ in place of $\mu^{(n)}_{a}(p,0)$ or $\mu^{(n)}_{a}(\pi_{p,0})$.

Remark 2 (What does the platform know?).

Our model assumes that the platform has detailed knowledge of the allocation mechanics, but cannot anticipate the behaviors of market participants that drive supply and demand. More specifically, we assume that the platform knows the regular allocation function $\omega$ (Definition 5) and its pre-limit version, $\Omega$ (3.3); the limiting platform utility function $r$ (Assumption 3) and its pre-limit version, $R$ (3.8); as well as the payment scheme it chooses to use, i.e., $p$, $\zeta$, and the realizations of the random perturbations, $\{\varepsilon_{i}\}_{i=1,\ldots,n}$. However, the platform cannot anticipate the global state $A$, the demand $D$, or the distribution of supplier choice functions $f_{B_{i}}(\cdot)$; rather, all it can do is collect after-the-fact measurements of $D$ and $\{Z_{i}\}_{i=1,\ldots,n}$, the set of active suppliers. Finally, we implicitly assume that the global state $A_{t}$ has no effect on the system beyond its assigned time period, and that the platform knows this fact.

This modeling choice reflects an understanding that it is realistic for a platform to have a good handle on the mechanics of the marketplace it controls, but it is implausible for it to have an in-depth understanding of the beliefs and preferences of all marketplace participants. For example, in the case of ride sharing, it is plausible that a platform could get good at modeling congestion, but less plausible that the platform could fully understand and anticipate how all its drivers may respond to various policy changes.

The fact that we take the platform to be completely oblivious to the global state $A_{t}$ puts us in an extreme setting, where the platform’s learning must be purely driven by randomization in $p$ . We chose this extreme setting primarily for two reasons. First, it crystallizes the difficulty of the learning problem, and highlights the value of local experimentation relative to global experimentation baselines. Second, a platform’s knowledge of $A_{t}$ , if any, is likely to be noisy and inaccurate, and it is often difficult for a platform to learn efficiently in practice by matching historical data using noisy estimates of their corresponding contexts. Therefore, it is of considerable practical value to devise a robust learning algorithm that works well without relying on the platform’s ability to infer the global state $A_{t}$ .

In practice, of course, the platform may have some information about the global state $A$ ; for example, we may assume that the platform observes a set of covariates $X$ that capture some aspects of $A$ (e.g., we could have $X=\Xi(A)$ for some lossy function $\Xi$ ). In such a setting, the information $X$ could be used for variance reduction and/or learning better policies that exploit heterogeneity explained by $X$ . It would be of considerable interest to study a covariate-enriched variant of our approach that allows the platform to use such information to learn better policies; however, we leave this line of investigation to follow-up work.

3.1 Mean-Field Asymptotics

The stochastic model described above in general admits complex dynamics that are not amenable to exact analysis. Fortunately, we show in this sub-section that in the mean-field limit where the number of suppliers is large, various key equilibrium quantities converge to tractable objects described by a mean-field model. To start, we first provide a formal definition of the equilibrium active supply size, $T$ , and verify existence and uniqueness.

Definition 8 (Active Supply Size in Equilibrium).

We say that a random variable $T$ is an equilibrium supply size, if, when all suppliers make activation choices according to (3.6), the resulting distribution for the number of active suppliers equals that of $T$ .

Lemma 1.

Suppose that the conditions in Assumptions 1, 2 and 3 hold. Fix $p>0$ , $\zeta\in[0,p)$ , and $a\in\mathcal{A}$ . Let the payment distribution $\pi$ be defined on $\mathbb{R}_{+}$ . Then, conditional on $A=a$ , the equilibrium active supply size exists, is unique, and follows a Binomial distribution.

Next, we define some quantities that will play a key role in our analysis, and verify that they converge to tractable mean-field limits. The first quantity we consider is the equilibrium number of active suppliers $\mu_{a}^{(n)}(p)$ , as defined in (3.6). Second, we define the function $q(\cdot)$ , which captures the expected amount of demand matched to each supplier if the total number of suppliers were exogenously drawn as a binomial $(n,\,\mu)$ random variable rather than determined by the equilibrium:

[TABLE]

Lemma 2.

Under the conditions of Lemma 1, for all $a\in\mathcal{A}$ , and $p,\mu\in\mathbb{R}_{+}$ , the following hold:

[TABLE]

where $\omega(\cdot)$ and $r(\cdot)$ are described in Definition 5 and Assumption 3, respectively. In (3.13), the limit $\mu_{a}(p)$ is the only solution to $\mu=\mathbb{E}\left[f_{B_{1}}\left(p\omega(d_{a}/\mu)\right)\,\big{|}\,A=a\right]$ .
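Because $\mu\mapsto\mathbb{E}[f_{B_1}(p\,\omega(d_a/\mu))\mid A=a]-\mu$ is strictly decreasing (a larger active fraction lowers the allocation rate and hence the activation probability), the limit $\mu_a(p)$ can be computed by bisection. The sketch below does so with a Monte Carlo sample standing in for the law of $B_1$, and then evaluates the heuristic per-supplier utility $(\gamma-p)\,\mu_a(p)\,\omega(d_a/\mu_a(p))$, our reading of the limit of $U/n$ when $R(D,T)=\gamma T\Omega(D,T)$ and every supplier is paid $p$. The allocation function, the distribution of $B_1$, and all numbers are hypothetical, and the grid search only illustrates the shape of the limiting objective; the paper instead optimizes by gradient steps driven by local experiments.

```python
import numpy as np

rng = np.random.default_rng(3)
gamma, alpha, d_a = 40.0, 0.5, 0.3         # hypothetical revenue per unit, logit slope, d_a
b_sample = rng.uniform(5.0, 15.0, 20_000)  # Monte Carlo stand-in for the law of B_1 given A = a


def omega(x):
    # Hypothetical regular allocation function.
    return 1.0 - np.exp(-x)


def avg_choice(x):
    # f_a(x) = E[f_{B_1}(x) | A = a] for a logistic f_b, approximated by a sample average.
    return np.mean(1.0 / (1.0 + np.exp(-alpha * (x - b_sample))))


def mu_limit(p, tol=1e-12):
    # Unique root of mu = f_a(p * omega(d_a / mu)); f_a(...) - mu is decreasing in mu.
    lo, hi = 1e-12, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if avg_choice(p * omega(d_a / mid)) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def utility_limit(p):
    # Heuristic mean-field utility per potential supplier with linear revenue and common payment p.
    mu = mu_limit(p)
    return (gamma - p) * mu * omega(d_a / mu)


payments = np.linspace(1.0, 39.0, 77)
utilities = np.array([utility_limit(p) for p in payments])
p_best = payments[np.argmax(utilities)]
print(f"best payment on the grid: {p_best:.1f}")
```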

Finally, the following result, proven in Appendix B.1, establishes conditions under which the limiting utility functions $u_{a}(p)$ are concave, thus enabling us to globally optimize utility via first-order methods.

Lemma 3.

Let $f_{a}(\cdot)$ be the average choice function: $f_{a}(x)=\mathbb{E}\left[f_{B_{1}}(x)\,\big{|}\,A=a\right].$ Fix $\gamma>0$, $c_{0}\in(0,\gamma)$ and $a\in\mathcal{A}$. Suppose the following hold:

1. We have a linear revenue function, $r(x)=\gamma\omega(x)$.
2. Let $\underline{x}=\inf_{p\in(c_{0},\gamma)}pq_{a}(\mu_{a}(p))$ and $\overline{x}=\sup_{p\in(c_{0},\gamma)}pq_{a}(\mu_{a}(p))$. The average choice function $f_{a}(\cdot)$ satisfies
   (a) $f_{a}(\cdot)$ is strongly concave in the domain $(\underline{x},\overline{x})$.
   (b) $f_{a}(\underline{x})-f^{\prime}_{a}(\underline{x})\underline{x}\geq 0$, or, equivalently, there exists a differentiable, non-negative concave function $\tilde{f}(\cdot)$ such that $\tilde{f}(\underline{x})=f_{a}(\underline{x})$ and $\tilde{f}^{\prime}(\underline{x})\leq f^{\prime}_{a}(\underline{x})$.
3. The allocation function $\omega(\cdot)$ is strongly concave in the domain $\left(d_{a}/\mu_{a}(c_{0}),d_{a}/\mu_{a}(\gamma)\right)$.

Then, under the conditions of Lemma 1, the limiting platform utility $u_{a}(\cdot)$ is strongly concave in the domain $(c_{0},\gamma)$ .

3.2 The Marginal Response Function

Finally, as discussed in Section 2, a key quantity that motivates our approach to experimentation is the marginal response function, $\Delta(p)$ , which captures the average payment sensitivity of a supplier when only that supplier's payment changes while all other payments stay fixed (meaning that there are no network effects).

Definition 9 (Marginal Response Function).

Fix $n\in\mathbb{N}$ , $a\in\mathcal{A}$ and $p>0$ . The marginal response function is defined by

[TABLE]

This marginal response function $\Delta$ plays a key role in our analysis for the following reasons. First, as shown in the following section, in the mean-field limit as $n\to\infty$ , $\Delta$ is easy to estimate using small random payment perturbations that do not meaningfully affect the overall equilibrium. Second, provided we have a good enough understanding of the underlying system dynamics to know the appropriate allocation function $\omega(\cdot)$ , we can use consistent estimates of $\Delta$ to estimate the true payment sensitivity of supply that accounts for feedback effects, $d\mu(p)/dp$ . This fact is formalized in the following result. We note that, other than $\Delta$ , all terms on the right-hand side of (3.20) are readily estimated from observed data by taking averages.

Lemma 4.

Under the conditions of Lemma 1, for any $a\in\mathcal{A}$ and $p\in\mathbb{R}_{+}$ , we have that

[TABLE]

Furthermore, this relationship carries through in the mean-field limit,

[TABLE]

In addition to powering our approach to experimentation, the result of Lemma 4 also provides qualitative insights about the drivers of interference in our model. If there were no interference among the suppliers, then the gradient $(d/dp)\mu_{a}(p)$ would coincide with the marginal response $\Delta_{a}(p)$ ; because of interference, however, the gradient is attenuated by an interference factor $1+R_{a}(p)$ , where

[TABLE]

We thus observe the following:

•

The interference factor is negligible when the “scaled marginal sensitivity” $\Sigma_{a}^{\Delta}(p)$ is small, i.e., the marginal response function is small relative to the current supply $\mu_{a}(p)$ . Note that $p\Delta_{a}(p)$ is a scale-free version of our marginal response function that is invariant to rescaling $p$ .

•

The interference factor is negligible when the “scaled matching elasticity” $\Sigma_{a}^{\Omega}(p)$ is small, i.e., the elasticity of the matching function $\omega(\cdot)$ is small relative to the current ratio of supply to demand $\mu_{a}(p)/d_{a}$ . In particular, because $\omega(\cdot)$ is concave and bounded by assumption, we can verify that $\Sigma_{a}^{\Omega}(p)$ is small whenever demand far exceeds supply, i.e. $d_{a}/\mu_{a}(p)\gg 1$ ; see Proposition 5 stated below and proven in Appendix B.2.

•

The interference factor is non-negligible when neither of the above conditions holds.

These observations are aligned with what one might anticipate based on qualitative arguments. For example, interference effects clearly cannot matter if marketplace participants are overall unresponsive to changes in $p$ , and this is exactly what we found in the first bullet point. Meanwhile, one might expect the effect of interference to be more pronounced when there is more intense competition among the suppliers than when there is enough demand to keep all suppliers busy, and this conjecture is well in line with our finding in the second bullet point.
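As a numerical sanity check on this attenuation story, the sketch below solves the mean-field fixed point $\mu=f_{a}(p\,\omega(d_{a}/\mu))$ from (3.13) for a single supplier type and compares a finite-difference slope of $\mu_{a}(p)$ with $\Delta_{a}(p)/(1+R_{a}(p))$. Because the display defining $R_{a}(p)$ is not legible in this version of the text, the code assumes $R_{a}(p)=\Sigma_{a}^{\Delta}(p)\,\Sigma_{a}^{\Omega}(p)$ with $\Sigma_{a}^{\Delta}(p)=p\Delta_{a}(p)/\mu_{a}(p)$ and $\Sigma_{a}^{\Omega}(p)=y\,\omega^{\prime}(y)/\omega(y)$ at $y=d_{a}/\mu_{a}(p)$. This is our own implicit differentiation of the fixed point, not a quotation of the paper. The logistic choice function, $\omega(x)=1-e^{-x}$, $p=20$ and $d_{a}=0.8$ are hypothetical.

```python
import math

# Hypothetical primitives (illustrative only, not taken from the paper).
SCALE = 2.0  # spread of the logistic choice function


def f(x):
    """Average choice function f_a: probability of joining given expected revenue x."""
    return 1.0 / (1.0 + math.exp(-(x - 10.0) / SCALE))


def f_prime(x):
    v = f(x)
    return v * (1.0 - v) / SCALE


def omega(y):
    """Allocation function: expected matches per active supplier at demand/supply ratio y."""
    return 1.0 - math.exp(-y)


def omega_prime(y):
    return math.exp(-y)


def solve_mu(p, d):
    """Solve the fixed point mu = f(p * omega(d / mu)) on (0, 1] by bisection."""
    lo, hi = 1e-12, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid < f(p * omega(d / mid)):
            lo = mid  # mid lies below the fixed point
        else:
            hi = mid
    return 0.5 * (lo + hi)


def attenuation_check(p, d, h=1e-4):
    mu = solve_mu(p, d)
    y = d / mu                                   # demand-to-supply ratio
    q = omega(y)                                 # matches per active supplier
    delta = q * f_prime(p * q)                   # marginal response, q held fixed
    sigma_delta = p * delta / mu                 # scaled marginal sensitivity
    sigma_omega = y * omega_prime(y) / omega(y)  # elasticity of omega at y
    predicted = delta / (1.0 + sigma_delta * sigma_omega)
    finite_diff = (solve_mu(p + h, d) - solve_mu(p - h, d)) / (2.0 * h)
    return delta, predicted, finite_diff


delta, predicted, finite_diff = attenuation_check(p=20.0, d=0.8)
print(f"Delta={delta:.5f}  Delta/(1+R)={predicted:.5f}  finite diff={finite_diff:.5f}")
```

With these inputs, the finite-difference slope and the predicted attenuated slope should agree to several decimal places. Both lie below $\Delta_{a}(p)$ because every factor in the assumed $R_{a}(p)$ is positive here.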

Proposition 5.

Let $g:\mathbb{R}_{+}\to\mathbb{R}_{+}$ be concave with piece-wise continuous derivative $g^{\prime}$ . Then $xg^{\prime}(x)\leq g(x)$ for all $x>0$ . If moreover $0<\lim_{x\rightarrow\infty}g(x)<\infty$ , then $\lim_{x\rightarrow\infty}xg^{\prime}(x)/g(x)=0$ .
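The sketch below checks both claims of Proposition 5 numerically for two hypothetical concave, bounded allocation functions: the smooth $g(x)=1-e^{-x}$ and the kinked $g(x)=\min(x,1)$, whose derivative is only piecewise continuous. The elasticity $xg^{\prime}(x)/g(x)$ is the quantity that, evaluated at $x=d_{a}/\mu_{a}(p)$, drives the scaled matching elasticity discussed above. A grid check illustrates the result; it does not prove it.

```python
import math


def satisfies_bound(g, g_prime, xs, tol=1e-12):
    """First claim of Proposition 5: x * g'(x) <= g(x) at every grid point x > 0."""
    return all(x * g_prime(x) <= g(x) + tol for x in xs)


def elasticity(g, g_prime, x):
    """Elasticity x * g'(x) / g(x); it should vanish as x grows when g is bounded."""
    return x * g_prime(x) / g(x)


# Hypothetical concave, bounded allocation functions (illustrative only).
def smooth(x):
    return 1.0 - math.exp(-x)


def smooth_prime(x):
    return math.exp(-x)


def kinked(x):
    return min(x, 1.0)


def kinked_prime(x):
    # Piecewise continuous derivative with a jump at x = 1.
    return 1.0 if x < 1.0 else 0.0


grid = [0.01 * k for k in range(1, 5001)]  # x = 0.01, 0.02, ..., 50.0
print(satisfies_bound(smooth, smooth_prime, grid), satisfies_bound(kinked, kinked_prime, grid))
print([elasticity(smooth, smooth_prime, x) for x in (1.0, 10.0, 40.0)])
```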

4 Learning via Local Experimentation

We present our main results in this section. The main framework we adopt for learning payments is based on first-order optimization. First, we show in Section 4.1 that our local experimentation approach enables us to construct an asymptotically accurate estimate of the utility gradient at a given payment $p$ , in the mean-field limit as $n\rightarrow\infty$ . Then, we use these gradient estimates to update the payment using a form of gradient ascent, and show that their performance is superior to what can be achieved via classical continuous-armed bandit and zeroth-order optimization algorithms. Specifically, we establish in Section 4.2 an $O(1/T)$ upper bound for the rate of convergence to the optimal platform utility under our algorithm. In Section 4.3, we study the cost of the local experimentation needed to estimate utility gradients, and verify that it scales sub-linearly in $n$ . Finally in Section 4.4, we compare our results to those available to classical continuous-armed bandits, and show that it is not possible to achieve the $O(1/T)$ convergence rate within the classical bandit framework. Throughout this section, we focus on optimizing utility in the mean-field limit, while verifying that finite- $n$ errors have an asymptotically vanishing effect on learning.

4.1 Estimating Utility Gradients

Recall that, in our model, there are two sources of randomness. First, there is the stochastic global context $A\in\mathcal{A}$ , which affects overall demand. In the context of ride-sharing, $A$ could capture multiplicative demand fluctuations due to weather or holidays. Second, there is randomness due to decisions of individual market participants. This second source of error decays with market size $n$ . Our goal here is to verify that local experimentation allows us to eliminate errors of the second type via concentration as the market size $n$ gets large. By contrast, because the context $A$ affects everyone in the same way, there is no way to average out the effect of $A$ without collecting data across many days.

Define $\overline{Z}=\frac{1}{n}\sum_{i=1}^{n}Z_{i}$ and $\overline{D}=D/n$ . As discussed in Section 2, our proposal starts by perturbing individual payments as in (3.11), and then estimating the regression coefficient $\widehat{\Delta}$ of market participation $Z_{i}$ on the perturbation $\zeta_{n}\varepsilon_{i}$ , i.e.,

[TABLE]
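The regression above can be sketched as follows. To keep the example self-contained, we assume a hypothetical logistic choice function shared by all suppliers and hold the expected number of matches per active supplier, $q$, fixed. This mimics the mean-field regime in which symmetric perturbations leave the equilibrium essentially unchanged. The helper names, parameter values, and the choice $\zeta_{n}=\zeta n^{-\alpha}$ with $\alpha=0.25$ (inside the range $0<\alpha<0.5$ of Theorem 6) are illustrative. Under these assumptions the target is $\Delta=q\,f^{\prime}(pq)$, which equals $0.125$ for $p=20$ and $q=0.5$.

```python
import math
import random


def ols_slope(x, y):
    """Slope of an ordinary least squares regression of y on x (with intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


def choice_prob(revenue):
    """Hypothetical logistic choice function f(x), centred at x = 10."""
    return 1.0 / (1.0 + math.exp(-(revenue - 10.0)))


def estimate_delta(n, p, q, zeta, alpha=0.25, seed=0):
    """Regress participation Z_i on the perturbation zeta_n * eps_i.

    q is held fixed, as in the mean-field limit where the symmetric
    perturbations barely move the equilibrium.
    """
    rng = random.Random(seed)
    zeta_n = zeta * n ** (-alpha)          # perturbation shrinks with market size
    shocks, joined = [], []
    for _ in range(n):
        eps = rng.choice((-1.0, 1.0))      # symmetric +/-1 shock
        payment = p + zeta_n * eps
        z = 1.0 if rng.random() < choice_prob(payment * q) else 0.0
        shocks.append(zeta_n * eps)
        joined.append(z)
    return ols_slope(shocks, joined)


print(estimate_delta(n=200_000, p=20.0, q=0.5, zeta=2.0))
```

With $n=200{,}000$ suppliers the individual perturbation is only about $0.09$ payment units, yet the estimate is typically within a few hundredths of $0.125$, because every supplier contributes to the regression.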

Our first result below relates the quantity $\widehat{\Delta}$ , which we can estimate via local randomization, to a quantity that is more directly relevant to setting payments, namely the derivative of $u$ with respect to the payment, conditional on the global state $A$ .

Theorem 6.

Suppose the conditions of Lemma 1 hold. Let

[TABLE]

and

[TABLE]

Then, assuming that the perturbations scale as $\zeta_{n}=\zeta n^{-\alpha}$ for some $0<\alpha<0.5$ ,

[TABLE]

for any $\varepsilon>0$ .

Remark 3 (Population-wide Experimentation & Symmetric Perturbation).

It is instructive to note that our experimentation scheme in (3.11) has two distinguishing features that depart from a conventional approach to A/B testing that would give a small subset of suppliers an $\varepsilon$ increase in payment while keeping payments in the rest of the population unchanged. First, our perturbation is symmetric across the units (zero-mean perturbation), whereas in classical settings those in a treatment group may receive asymmetric, and possibly identical, treatments. Second, we experiment across the entire population as opposed to a small sub-population.

These features are in fact deliberate and interdependent, and the rationales are as follows. The perturbations being symmetric ensures that our experimentation scheme does not meaningfully shift the overall supply-demand equilibrium ( $\mu^{(n)}_{a}(p)$ ), which in turn allows us to circumvent the impact of cross-unit interference. Moreover, as we show in Section 4.3, the symmetric perturbations lead to a small cost of experimentation: roughly speaking, the effect of paying half of the population $\varepsilon$ more is neutralized by simultaneously paying the other half $\varepsilon$ less. Meanwhile, the fact that we experiment on the whole population enables us to attain reasonable power using perturbations $\varepsilon$ small enough not to be biased by the curvature of the supplier-specific choice functions $f_{b}(\cdot)$ .

If perturbing the price carries a large fixed cost and the analyst wishes to apply local experimentation only to a small subset of the population, one could also let the random shocks $\varepsilon_{i}$ equal $0$ with non-zero probability, and restrict the rest of the analysis to the sub-population that received a non-zero shock. This would also amount to a valid local experimentation scheme. However, the power of our approach to estimate the marginal response function depends on $\operatorname{Var}\left[\varepsilon_{i}\right]$ , so in order to maintain a given level of power (i.e., to ensure that $\operatorname{Var}\left[\varepsilon_{i}\right]=1$ ), the platform would need to use larger shocks $\varepsilon_{i}$ for those suppliers that receive non-zero shocks. This in turn may expose the analyst to larger approximation errors from linearly approximating a curved function.
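The trade-off in the preceding paragraph is simple arithmetic: if $\varepsilon_{i}$ is symmetric and non-zero only with probability $\rho$, then $\operatorname{Var}[\varepsilon_{i}]=1$ forces the non-zero shocks to have magnitude $1/\sqrt{\rho}$. The sketch below draws such shocks; the symbol $\rho$ and the function name are our own illustrative choices. With $\rho=0.1$ the experimented suppliers see shocks more than three times larger than under the population-wide $\pm 1$ design.

```python
import math
import random


def sparse_shock(rng, rho):
    """Return 0 with probability 1 - rho, else +/- 1/sqrt(rho) with equal odds.

    This keeps E[eps] = 0 and Var[eps] = rho * (1 / rho) = 1, so the non-zero
    shocks must grow as the experimented fraction rho shrinks.
    """
    if rng.random() >= rho:
        return 0.0
    magnitude = 1.0 / math.sqrt(rho)
    return magnitude if rng.random() < 0.5 else -magnitude


rng = random.Random(1)
draws = [sparse_shock(rng, rho=0.1) for _ in range(200_000)]
print(sum(e * e for e in draws) / len(draws))  # empirical variance, close to 1
print(max(abs(e) for e in draws))              # 1/sqrt(0.1), about 3.16
```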

4.2 A First-Order Algorithm

Our key use of Theorem 6 involves optimizing for a utility-maximizing $p$ . At every time period $t$ , $\widehat{\Gamma}_{t}$ is a consistent estimate of the gradient of $u_{A_{t}}(\cdot)$ at $p_{t}$ , and we can plug it into any first-order optimization method that allows for noisy gradients. The proposal below is a variant of mirror descent that allows us to constrain the $p_{t}$ to an interval $I$ (e.g., Beck and Teboulle, 2003). We need to specify a step size $\eta$ , an interval $I=[c_{-},\,c_{+}]$ , and an initial payment $p_{1}$ . Then, at time period $t=1,\,2,\,...$ , we do the following:

1. Deploy randomized payment perturbations (3.11) around $p_{t}$ to estimate $\widehat{\Gamma}_{t}$ as in (4.3).

2. Perform a gradient update (without the constraint to the interval $I$ , this update is equivalent to basic gradient ascent with $p_{t+1}=p_{t}+2\eta\widehat{\Gamma}_{t}/(t+1)$ ):

[TABLE]
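The two steps above, together with the $t$-weighted average $\bar{p}_{T}$ used in Corollary 8, can be sketched as follows. This is a stand-in under stated assumptions rather than the paper's update (4.5). It uses a Euclidean projection onto $I$ (clipping), which coincides with the unconstrained step $p_{t+1}=p_{t}+2\eta\widehat{\Gamma}_{t}/(t+1)$ whenever that step stays inside $I$. It also replaces local experimentation with a hypothetical gradient oracle for $u_{A}(p)=-\tfrac{\sigma}{2}(p-20-A)^{2}$, $A\sim\text{Unif}(-3,3)$, with Gaussian noise standing in for finite-$n$ estimation error.

```python
import random

SIGMA = 0.2  # strong-concavity constant of the toy utility


def toy_gradient(p, rng):
    """Hypothetical stand-in for the local-experimentation estimate Gamma_hat_t.

    Daily utility u_A(p) = -SIGMA/2 * (p - 20 - A)^2 with A ~ Unif(-3, 3);
    the Gaussian term mimics finite-n estimation error.
    """
    a = rng.uniform(-3.0, 3.0)
    return -SIGMA * (p - 20.0 - a) + rng.gauss(0.0, 0.1)


def run_first_order(T, eta, interval, p1, grad_oracle, seed=0):
    """Projected gradient ascent with step 2*eta/(t+1); returns the
    t-weighted average p_bar_T = 2/(T(T+1)) * sum_t t * p_t."""
    rng = random.Random(seed)
    lo, hi = interval
    p = p1
    weighted_sum = 0.0
    for t in range(1, T + 1):
        weighted_sum += t * p                                      # accumulate t * p_t
        gamma_hat = grad_oracle(p, rng)                            # noisy gradient at p_t
        p = min(max(p + 2.0 * eta * gamma_hat / (t + 1), lo), hi)  # clip to I
    return 2.0 * weighted_sum / (T * (T + 1))


p_bar = run_first_order(T=2000, eta=10.0, interval=(10.0, 30.0), p1=12.0,
                        grad_oracle=toy_gradient)
print(f"weighted average payment: {p_bar:.3f}")
```

With $\sigma=0.2$ and $\eta=10>\sigma^{-1}$, as Theorem 7 requires, the first iterates overshoot, but the weighted average should land close to the population optimum $p^{*}=20$ of this toy problem.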

The following result shows that if we run our method for $T$ time periods in a large marketplace and the reward functions $u_{a}(\cdot)$ are strongly concave, then the utility derived by our first-order optimization scheme is competitive with any fixed payment level $p$ , up to regret that decays as $1/t$ . (In (4.6), we up-weight the regret terms $u_{A_{t}}(p)-u_{A_{t}}(p_{t})$ in later time periods to emphasize their $1/t$ rate of decay. One could also use an analogous proof to verify that the unweighted average regret is bounded on the order of $T^{-1}\sum_{t=1}^{T}\left(u_{A_{t}}(p)-u_{A_{t}}(p_{t})\right)=\mathcal{O}_{P}(\log(T)/T)$ .)

Theorem 7.

Under the conditions of Theorem 6, suppose we run the above learning algorithm for $T$ time periods and that $u_{a}(\cdot)$ is $\sigma$ -strongly concave over the interval $p\in I$ for all $a$ . Suppose, moreover, that we run (4.5) with step size $\eta>\sigma^{-1}$ and that the gradients of $u$ are bounded, i.e., $\left\lvert u^{\prime}_{a}(p)\right\rvert<M$ for all $p\in I$ and $a\in\mathcal{A}$ . Then

[TABLE]

for any $p\in I$ and $T\geq 1$ .

The above result doesn’t make any distributional assumptions on the contexts $A_{t}$ ; rather, (4.6) bounds the regret of our payment sequence $p_{1},\,p_{2},\,...$ along the realized sample path of $A_{t}$ relative to any fixed oracle. We believe this aspect of our result to be valuable in many situations: For example, if $A_{t}$ needs to capture weather phenomena that have a big effect on demand, it is helpful not to need to model the distribution of $A_{t}$ , as the weather may have complex dependence in time as well as long-term patterns. However, if we are willing to assume that the $A_{t}$ are independent and identically distributed, Theorem 7 also implies that an appropriate average of our learned payments is consistent for the optimal payment via online-to-batch conversion (Cesa-Bianchi et al., 2004).

Corollary 8.

Under the conditions of Theorem 7, suppose moreover that the $A_{t}$ are independent and identically distributed and let $u(p)=\mathbb{E}\left[u_{A_{t}}(p)\right]$ . Then, for any $\delta>0$ ,

[TABLE]

where $p^{*}=\operatorname{argmax}\left\{u(p):p\in I\right\}$ and $\bar{p}_{T}=\frac{2}{T(T+1)}\sum_{t=1}^{T}t\,p_{t}$ .

4.3 The Cost of Experimentation

Our argument so far has proceeded in two parts. In Section 4.1 we showed that local experimentation yields consistent estimates of gradients of the utility function $u_{a}(p)$ . Then, in Section 4.2, we bounded the regret of a procedure that updates payments $p_{t}$ via gradient ascent, as though the platform could observe gradients $u_{a}^{\prime}(p)$ at no additional cost. Here, we complete the picture, and show that local experimentation in fact induces negligible excess cost as we approach the mean-field limit. In general, a platform that randomizes payments around $p_{t}$ will make lower profits than one that just pays everyone $p_{t}$ . (This is because randomization will not affect active supply size to first order, but suppliers randomized to higher payments are more likely to be active. Randomization thus increases the average per-unit payment the platform needs to give to suppliers without increasing the amount of demand the platform is able to serve.) The result below, however, shows that this excess cost decays quadratically in the magnitude of payment perturbations $\zeta$ .

Theorem 9.

Under the conditions of Theorem 6 there are constants $C,\,\alpha>0$ such that

[TABLE]

Recall that Theorem 6 enables us to estimate gradients of $u_{a}(p)$ in large- $n$ markets using an amount of randomization that scales as $n^{-\alpha}$ for some $0<\alpha<0.5$ . Combined with Theorem 9, this implies that we can in fact estimate gradients of $u_{a}(p)$ “for free” via local experimentation when $n$ is large, and that the regret of a platform deploying our method matches, to first order, the regret of an oracle able to run first-order optimization on the mean-field limit.
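The quadratic decay in Theorem 9 can be illustrated with a back-of-the-envelope calculation in the spirit of the intuition given at the start of this subsection. Holding the expected matches per supplier $q$ fixed (a mean-field simplification we assume here), the expected payout to one supplier offered payment $P$ is $g(P)=Pq\,f(Pq)$. Symmetric $\pm\zeta$ perturbations then raise it by $\tfrac12[g(p+\zeta)+g(p-\zeta)]-g(p)\approx\tfrac12 g^{\prime\prime}(p)\zeta^{2}$, because the first-order terms cancel. With the hypothetical logistic choice function below, $p=20$ and $q=0.5$, we get $\tfrac12 g^{\prime\prime}(p)=q^{2}f^{\prime}(pq)=0.0625$.

```python
import math


def choice_prob(revenue):
    """Hypothetical logistic choice function f(x), centred at x = 10."""
    return 1.0 / (1.0 + math.exp(-(revenue - 10.0)))


def expected_payout(payment, q):
    """g(P) = P * q * f(P q): expected payout to one supplier offered payment P."""
    return payment * q * choice_prob(payment * q)


def excess_cost(p, q, zeta):
    """Extra expected payout from symmetric +/- zeta shocks relative to paying p."""
    randomized = 0.5 * (expected_payout(p + zeta, q) + expected_payout(p - zeta, q))
    return randomized - expected_payout(p, q)


for zeta in (0.4, 0.2, 0.1):
    # The ratio excess / zeta^2 should settle near g''(p)/2 = 0.0625 here.
    print(zeta, excess_cost(20.0, 0.5, zeta) / zeta ** 2)
```

Halving $\zeta$ cuts the excess cost by roughly a factor of four, which is the quadratic behaviour Theorem 9 formalizes.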

4.4 Comparison with Rates for Global Experimentation

As discussed above, our local experimentation approach makes two departures from the classical literature on experimental design under interference, including Aronow and Samii (2017), Athey et al. (2018), Baird et al. (2018), Basse et al. (2019), Eckles et al. (2017), Hudgens and Halloran (2008), Leung (2020), Manski (2013), Sobel (2006) and Tchetgen Tchetgen and VanderWeele (2012). First we use mean-field equilibrium modeling to capture and correct for interference effects; second, we operationalize our approach in a dynamic setting where a decision maker wants to tune a decision variable while controlling realized regret while learning.

To highlight the value of mean-field equilibrium modeling, we compare our result from Theorem 7 to what can be achieved via the global experimentation baseline that is tailored to sequential decision making, but does not use equilibrium modeling: Each day $t=1,\,...,\,T$ , global experimentation chooses a payment $p_{t}$ given to all workers on that day, and then observes the corresponding reward $U_{t}$ . Analogously to the random saturation design discussed in Baird et al. (2018) and Hudgens and Halloran (2008), global experimentation does not suffer any bias due to interference because there is no cross-day interference in our model. The downside of global experimentation is that, unlike our equilibrium modeling based approach, it does not provide the analyst any direct information about gradients $u^{\prime}_{A_{t}}(p_{t})$ , and this severely limits the ability of global experimentation to effectively discover a good choice of $p$ .

To understand the limits of global experimentation we turn to the literature on continuous-armed bandits (or zeroth-order optimization), which has established strong lower bounds for closely related problems. Shamir (2013) considers the following setting: We have a sequential decision making problem where, in each time period, the analyst gets to choose $p_{t}$ from a bounded interval $I$ and observes a reward $U_{t}$ with $\mathbb{E}\left[U_{t}\,\big{|}\,p_{t}\right]=u(p_{t})$ and $\operatorname{Var}\left[U_{t}\,\big{|}\,p_{t}\right]=1$ ; the goal is to choose a sequence $p_{t}$ that makes the regret $\sum_{t=1}^{T}(u(p^{*})-u(p_{t}))$ small, where $p^{*}$ is the maximizer of $u(\cdot)$ over the interval $I$ . Shamir (2013) then shows that, even if $u(\cdot)$ is strongly concave, no algorithm can achieve expected regret that grows slower than $\sqrt{T}$ ; and, in fact, this result holds even if $u(\cdot)$ is known a priori to be a quadratic with unit curvature. Further results in this line of work are given in Bubeck et al. (2017). We also note closely related results by Keskin and Zeevi (2014) who establish a $\sqrt{T}$ lower bound on regret for pricing under a linear demand model (note that, with linear demand, the seller’s profit is quadratic), and by Nambiar et al. (2019), who propose a global experimentation scheme driven by random perturbations to $p_{t}$ that could be used to achieve $\sqrt{T}$ regret in our model.

The upshot is that, when the daily reward functions $u_{A_{t}}(p_{t})$ are strongly concave and there is meaningful cross-day noise due to $A_{t}$ , our approach can achieve cumulative regret on the order of $\log(T)$ (corresponding to a $1/t$ rate of decay in errors), whereas global experimentation cannot improve over $\sqrt{T}$ regret (corresponding to a $1/\sqrt{t}$ rate of decay in errors). In other words, our ability to use mean-field modeling to leverage small-scale payment variation within (rather than across) time periods enables us to fundamentally alter the difficulty of the problem of learning the optimal $p$ , and to improve our rate of convergence in $T$ .
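To see how per-period error rates translate into these cumulative regret figures, sum them over $T$ periods using standard integral bounds (a worked calculation, with $c$ a generic constant):

```latex
\sum_{t=1}^{T}\frac{c}{t} \;\le\; c\,(1+\log T) = \mathcal{O}(\log T),
\qquad
\sum_{t=1}^{T}\frac{c}{\sqrt{t}} \;\le\; 2c\sqrt{T} = \mathcal{O}(\sqrt{T}).
```

For example, at $T=200$ the first bound is about $6.3c$ while the second is about $28.3c$.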

Finally, we note that the well-known slow rates of convergence for continuous-armed bandits have led some authors to study a query model where we can evaluate the unknown functions $u_{A_{t}}(\cdot)$ twice rather than once; for example, Duchi et al. (2015) show that two function evaluations can result in substantially faster rates of convergence than one. The reason for this gain is that, given two function evaluations, the analyst can directly cancel out the main effect of the global noise term $A_{t}$ . In our setting, it is implausible that a platform could carry out such paired function evaluations in practice unless, e.g., they simultaneously run experiments across two identical twin cities. In this paper, however, we found that by leveraging structural information and mean-field modeling, local experimentation can be used to obtain similar gains over zeroth-order optimization as one could get via twin evaluation.

5 Generalizations and Limitations

So far, we have focused our discussion on a specific model of a centralized market for freelance labor; but, as outlined in the introduction, we expect the general principles outlined here to be more broadly applicable. A full theory of experimental design powered by mean-field equilibria is beyond the scope of this paper. In this section, however, we take a first step towards a more general theory by presenting two problem settings of considerable practical interest that are amenable to our approach, namely risk-averse suppliers and surge pricing, and by discussing another problem, immunization via vaccines, that does not appear to be amenable to it.

5.1 Equilibrium Modeling via Generalized Earning Functions

In our motivating model for freelance labor, we considered a setting where the platform first chooses a distribution $\pi$ and then, for each supplier $i$ , draws $P_{i}\sim\pi$ and promises to pay the supplier $P_{i}$ per unit of demand served; the supplier computes $q_{A}(\pi)$ , the expected number of units of demand they will get to serve if they join the market; finally, each supplier compares their expected revenue $P_{i}q_{A}(\pi)$ to their outside option and chooses whether or not to join the marketplace. Our main results were that:

1. In large markets, we can unobtrusively estimate a marginal response function via local experimentation;
2. The behavior of this marketplace can be characterized by a mean-field limit;
3. In the mean-field limit, we can transform estimates of the marginal response function into predictions of the effect of policy-relevant interventions. Thus, in large markets, we can use local experimentation for optimizing platform choices.

Here, we briefly discuss how to extend our approach to allow for risk-averse suppliers and surge pricing. In order to do so, we first define choice models for both problems below, and write down balance conditions generalizing (3.6). Afterwards, we conjecture the existence and form of a mean-field equilibrium, and show that the conjectured equilibrium model lets us again map from consistent estimates of a marginal response function to relevant counterfactual predictions—using the same recipe as deployed in the rest of this paper. As discussed further below, what enables us to extend our discussion to these problems is that, in both cases, we can explain the choices of suppliers in terms of a unifying formalism we refer to as generalized earning functions.

Example 10 (Risk Aversion).

Under risk aversion, supplier utility functions may not scale linearly with their revenue; instead, there is a concave function $\beta$ such that the relevant quantity for understanding the suppliers’ choices is the expectation of $\beta(\text{revenue})$ (Holt and Laury, 2002; Pratt, 1978). Suppose that $\beta(0)=0$ , and that each worker can serve 0 or 1 units of demand. (Generalizations to workers who can serve many units of demand are immediate, at the expense of more involved notation.) Then our balance condition (3.6) becomes

[TABLE]

The curvature of the function $\beta(\cdot)$ thus corresponds to the degree of a supplier’s risk aversion, and setting $\beta(p)=p$ recovers our original risk-neutral model.

Example 11 (Supply-Side Surge Pricing).

Several prominent ride-sharing platforms deploy surge pricing where, in case of heavy demand, the platform applies a multiplier (generally greater than $1$ ) to the original payment in order to encourage higher supplier participation (Cachon et al., 2017; Hall et al., 2015). As a simple model, suppose that surge is triggered automatically based on the supply-demand ratio, i.e., there is a function $s:\mathbb{R}_{+}\rightarrow\mathbb{R}_{+}$ such that, in each period, the $i$ -th supplier gets paid $s(D/T)P_{i}$ per unit of demand served. Suppliers can anticipate surge and, as in the rest of the paper, they make decisions based on limiting values of all random variables. Thus, suppliers anticipate payments $s(d_{a}/\mu_{a}(\pi))P_{i}$ , resulting in a balance condition

[TABLE]

where again $s(x)=1$ recovers our original model.

In both examples above, we conjecture that—in analogy to Lemma 4—a mean-field limit exists and that it can be characterized by analogues of (5.1) and (5.2) but without the $n$ -superscripts. In this case, we can write both mean-field limits in a unified form via generalized earning functions, $\theta:\mathbb{R}_{+}^{2}\to\mathbb{R}_{+}$ , so that the asymptotic balance condition is

[TABLE]

In the case of (5.1), we have $\theta_{risk}(p,\,q)=\beta(p)q$ . Meanwhile, for (5.2), recall that in the mean-field limit the matching of supply and demand is characterized by the identity $q_{a}(\mu_{a}(\pi))=\omega(d_{a}/\mu_{a}(\pi))$ . Thus, our conjecture means that (5.2) converges to (5.3) with generalized earning function $\theta_{surge}(p,\,q)=pqs(\omega^{-1}(q))$ .
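The two generalized earning functions above are simple to encode. The sketch below assumes a hypothetical allocation function $\omega(x)=1-e^{-x}$, so that $\omega^{-1}(q)=-\log(1-q)$ is available in closed form; $\beta$ and $s$ are passed in as arbitrary callables, and all function names are illustrative. Setting $\beta(p)=p$ or $s\equiv 1$ recovers the risk-neutral earnings $pq$.

```python
import math


def omega(x):
    """Hypothetical allocation function omega(x) = 1 - exp(-x)."""
    return 1.0 - math.exp(-x)


def omega_inv(q):
    """Inverse of omega on [0, 1): recovers the demand-to-supply ratio from q."""
    return -math.log(1.0 - q)


def theta_risk(p, q, beta):
    """Risk aversion: theta(p, q) = beta(p) * q for a concave beta with beta(0) = 0."""
    return beta(p) * q


def theta_surge(p, q, s):
    """Surge: theta(p, q) = p * q * s(omega^{-1}(q))."""
    return p * q * s(omega_inv(q))


q = omega(2.0)
print(theta_risk(20.0, q, lambda x: x), theta_surge(20.0, q, lambda y: 1.0))  # both equal p*q
print(theta_risk(20.0, q, math.sqrt))                  # risk-averse earnings
print(theta_surge(20.0, q, lambda y: max(1.0, y)))     # multiplier max(1, y) is 2 at y = 2
```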

We close this section by carrying out “step 3” of the analysis outlined in the first paragraph of this section, i.e., by showing how (5.3) lets us map from a marginal response function to utility gradients with respect to surge; we leave verification of the conjectured convergence to (5.3) for further work. To this end, fix $a\in\mathcal{A}$ . First, it is not difficult to show that the changes caused by the introduction of $\theta(\cdot)$ affect the computation of the utility derivative $u_{a}^{\prime}(p)$ only through the expression for $\mu_{a}^{\prime}(p)$ (cf. the proof of Proposition 12). Hence, we focus here only on expressions for $\mu_{a}^{\prime}(p)$ .

Now, we can directly check that a reduced form expression as in (2.1) allows us to estimate the following marginal response function via local randomization,

[TABLE]

where $(\nabla\theta)_{i}(\cdot,\cdot)$ denotes the $i$ -th coordinate of the gradient of $\theta$ . Meanwhile, an argument based on the chain rule similar to that in the proof of Lemma 4 shows that

[TABLE]
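Since the formula in (5.5) is not legible in this version of the text, the following hedged check re-derives its structure independently. Assume the conjectured limit (5.3) takes the form $\mu=f_{a}(\theta(p,\omega(d_{a}/\mu)))$ for a single payment level $p$. Implicit differentiation in $p$, with $y=d_{a}/\mu$, $q=\omega(y)$ and $\Delta_{a}(p)=f_{a}^{\prime}(\theta(p,q))\,(\nabla\theta)_{1}(p,q)$, gives $\mu_{a}^{\prime}(p)=\Delta_{a}(p)\big/\big(1+\tfrac{\Delta_{a}(p)}{(\nabla\theta)_{1}(p,q)}(\nabla\theta)_{2}(p,q)\,\omega^{\prime}(y)\,y/\mu\big)$. The sketch compares this expression with a finite-difference slope of the solved fixed point, using a hypothetical surge rule $s(y)=1+0.1y$, a logistic choice function, and $\omega(x)=1-e^{-x}$. With $\theta(p,q)=pq$ the correction term becomes $(p\Delta_{a}(p)/\mu)\cdot y\,\omega^{\prime}(y)/\omega(y)$.

```python
import math

# Hypothetical primitives (illustrative only, not taken from the paper).
def f(x):
    """Average choice function: logistic in generalized earnings, centred at 10."""
    return 1.0 / (1.0 + math.exp(-(x - 10.0) / 2.0))


def f_prime(x):
    v = f(x)
    return v * (1.0 - v) / 2.0


def omega(y):
    return 1.0 - math.exp(-y)


def omega_prime(y):
    return math.exp(-y)


def theta(p, q):
    """Surge-style earning function p * q * s(omega^{-1}(q)) with s(y) = 1 + 0.1 y."""
    return p * q * (1.0 + 0.1 * (-math.log(1.0 - q)))


def partial(fn, p, q, wrt, h=1e-6):
    """Central-difference partial derivative of fn(p, q) in argument `wrt` (0 or 1)."""
    if wrt == 0:
        return (fn(p + h, q) - fn(p - h, q)) / (2.0 * h)
    return (fn(p, q + h) - fn(p, q - h)) / (2.0 * h)


def solve_mu(p, d):
    """Solve mu = f(theta(p, omega(d / mu))) by bisection; lo keeps omega < 1."""
    lo, hi = 0.05, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid < f(theta(p, omega(d / mid))):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def check_chain_rule(p, d, h=1e-4):
    mu = solve_mu(p, d)
    y = d / mu
    q = omega(y)
    th_p = partial(theta, p, q, 0)              # (grad theta)_1
    th_q = partial(theta, p, q, 1)              # (grad theta)_2
    delta = f_prime(theta(p, q)) * th_p         # marginal response, q held fixed
    predicted = delta / (1.0 + (delta / th_p) * th_q * omega_prime(y) * y / mu)
    finite_diff = (solve_mu(p + h, d) - solve_mu(p - h, d)) / (2.0 * h)
    return predicted, finite_diff


predicted, finite_diff = check_chain_rule(20.0, 0.8)
print(predicted, finite_diff)
```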

Note, furthermore, that all quantities in (5.5) except $\Delta_{a}(p)$ are either known a-priori or can be estimated via observed averages. The upshot is that the mean-field equilibrium characterized by (5.3) enables us to map an easy-to-estimate marginal response function to $\mu_{a}^{\prime}(p)$ via (5.5). These estimates of $\mu_{a}^{\prime}(p)$ can then be directly used to compute utility gradients $u_{a}^{\prime}(p)$ that can be used for first-order optimization.

5.2 Interference and Choice Modeling

Although our approach to interference via equilibrium modeling provides useful insights in many problems of interest, it does not unlock all problems where we want to understand the effects of deploying an intervention at scale in a large system. One prominent example to which our approach does not (at least obviously) apply pertains to the study of vaccine effectiveness in the presence of herd immunity (Hudgens and Halloran, 2008; Ogburn and VanderWeele, 2017).

Example 12 (Vaccine Effectiveness).

We are considering whether to enact a policy that would increase vaccination rates against a contagious disease in a population where only a moderate fraction of people are currently vaccinated. Due to the interaction among people within the same geographical vicinity, the risk of infection for any given individual not only depends on whether they are vaccinated themselves, but also on the overall fraction of infected individuals in the ambient population (which in turn is modulated by the overall fraction of vaccinated individuals). Thus simple randomized controlled trials cannot be used to consistently estimate the effect of policies that increase the overall rate of vaccination, and instead methods that explicitly account for interference are required.

The classical way to think about experiments for community-level vaccine immunity is to randomize the fraction of people vaccinated across different disjoint (and thus non-interfering) communities (Baird et al., 2018; Hudgens and Halloran, 2008). This approach is directly analogous to the global experimentation baseline considered throughout this paper, and naturally leads to the question of whether our approach could be used to design more powerful alternatives.

In analogy to notation used in the rest of the paper, index communities by $t$ and people within communities by $i$ , and let $Z_{it}\in\left\{0,\,1\right\}$ denote whether the $i$ -th person in the $t$ -th community gets infected. We write $\mu(p)$ for the expected fraction of people who get infected in a community in which a fraction $p$ of people are vaccinated, and focus on estimating $d\mu(p)/dp$ , i.e., the decrease in the overall infection rate that can be achieved by increasing the vaccination probability. In this context, global experimentation seeks to estimate $\mu(p)$ by randomly assigning a single vaccination probability $p_{t}\in[0,\,1]$ to each community, so that each person in community $t$ gets (randomly) vaccinated with probability $p_{t}$ . In contrast, a local experimentation approach might use individualized randomization probabilities $p_{it}\in[0,\,1]$ to get a better handle on $d\mu(p)/dp$ .

At first glance, the problem may not appear so different from our leading example. It seems plausible that the above model sketch could be formalized in a way that makes it amenable to mean-field asymptotics. Furthermore, in this setting, using symmetric perturbations $p_{it}=p_{t}\pm\zeta\varepsilon_{it}$ and regressing $Z_{it}$ on $\varepsilon_{it}$ should recover a well-defined marginal response function $\Delta(p)$ that, under regularity conditions, corresponds exactly to what’s called the (average) direct effect in the statistics literature (Hudgens and Halloran, 2008; Sävje et al., 2017).

At this point, however, we appear to get stuck: Unlike in the main examples considered in this paper, there does not seem to be a natural way to map from $\Delta(p)$ to our main quantity of interest, namely $d\mu(p)/dp$ . In the case of modeling freelance labor, we assumed that suppliers only care about expected revenue; thus, once $\Delta(p)$ gave us a handle on how they react to changes in expected revenue due to the platform directly changing $p_{it}$ , we were also able to reason about how they might react to changes in expected revenue due to changes in marketplace conditions that arise from general equilibrium effects. In the case of vaccine effectiveness, however, there’s no a-priori obvious way to connect the direct effect of vaccinating a specific person to how the same person will react to a change in the overall fraction of the population that’s infected. For example, there is presumably a positive association between how much different people benefit from the vaccine directly and how much they benefit from it via herd immunity; however, some people may not be responsive to the vaccine and so have zero direct effect, but will still benefit indirectly from the vaccine via herd immunity. Thus, there appears to be no way to credibly learn about vaccine effectiveness without considering exogenous variation in the fraction of the population that’s infected.

An interesting conceptual distinction between all the positive examples presented in this paper and the above vaccination example is that, in the former, interference effects are fundamentally due to choices made by participants in the system. For example, in the case of our model for freelance labor, interference effects arise because suppliers choose not to participate in marketplaces that are too congested. In contrast, in the vaccination example, getting sick isn’t a choice; it’s simply a random event whose probability can be modulated up or down via different vaccination policies and community-level infection levels. The fact that joining a marketplace is a choice whereas getting sick is not may not matter from the point of view of mean-field asymptotics; however, making assumptions about how suppliers make choices is what let us credibly connect $\Delta(p)$ with $d\mu(p)/dp$ and proceed with our approach. The role of choice versus pure chance in understanding best practices for statistical estimation has been the topic of a longstanding discussion at the intersection of economics and statistics (e.g., Heckman, 2001; Imbens, 2014; Roy, 1951); and, in this context, our result can be seen as one example where simple choice modeling helps motivate a powerful approach to statistical inference and learning.

6 Simulation Results

We now consider a more comprehensive empirical evaluation of local versus global experimentation, building on the simulation results of Section 2, and compare the mean performance of the two approaches across 1,000 simulation replications. Local experimentation is run for 200 steps, exactly as described in Section 2, with a random initialization $p_{1}\sim\text{Unif}(10,\,30)$ . Meanwhile, for global experimentation, we consider a collection of strategies that first randomly draw payments $p_{t}\sim\text{Unif}(10,\,30)$ for the first $T$ time periods ( $1\leq t\leq T$ ), fit a spline to the data (as in the left panel of Figure 4), and then deploy the learned policy for the remaining $200-T$ time periods. We consider the choices $T\in\left\{40,\,60,\,80,\,\ldots,\,200\right\}$ . For both methods, we report in-sample regret, i.e., the mean utility shortfall relative to deploying the population-optimal $p^{*}$ for the $T$ learning periods, as well as future expected regret, i.e., the expected utility shortfall from deploying the learned policy $\hat{p}$ after the $T$ learning periods. For local experimentation, we set $\hat{p}=2\sum_{t=1}^{T}t\,p_{t}\,/\,(T(T+1))$ following Corollary 8, whereas for global experimentation we set $\hat{p}$ to be the output of the spline optimization discussed above.

As seen in the left panel of Figure 6, local experimentation outperforms global experimentation by an order of magnitude along both metrics. Quantitatively, local experimentation achieved mean in-sample regret of 0.025 and mean future regret of 0.0045. In contrast, the best numbers achieved by global experimentation for these metrics were 0.57 and 0.12 respectively—and there was not a single choice of tuning parameters that achieved both. In general, we see that a larger choice of $T$ always improves future regret, whereas for in-sample regret there is an optimal middle ground that balances exploration and exploitation (here, $T=80$ ).
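The local-experimentation learner described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure behind Figure 6. The update rule (4.5) is not reproduced in this section, so the sketch assumes a projected gradient-ascent step $p_{t+1}=\Pi_{[10,30]}(p_{t}+\eta\,\widehat{\Gamma}_{t}/t)$ with a hypothetical $\eta=2$. It also replaces the gradient estimates $\widehat{\Gamma}_{t}$ with a hypothetical noisy oracle for the concave utility $u(p)=-(p-17.6)^{2}/2$, for which $\sigma=1>\eta^{-1}$. Only three pieces come from the text: the initialization $p_{1}\sim\text{Unif}(10,30)$, the 200-step horizon, and the weighted average $\hat{p}=2\sum_{t=1}^{T}t\,p_{t}/(T(T+1))$.

```python
import random

def weighted_average(ps):
    """p_hat = 2 * sum_t t * p_t / (T * (T + 1)); iterate t gets weight proportional to t."""
    T = len(ps)
    return 2.0 * sum(t * p for t, p in enumerate(ps, start=1)) / (T * (T + 1))

def local_experimentation(grad_oracle, T=200, eta=2.0, lo=10.0, hi=30.0, seed=0):
    """Hypothetical projected gradient ascent with step size eta / t on [lo, hi]."""
    rng = random.Random(seed)
    p = rng.uniform(lo, hi)  # p_1 ~ Unif(10, 30), as in the simulation design
    path = []
    for t in range(1, T + 1):
        path.append(p)
        gamma_hat = grad_oracle(p, rng)  # stand-in for the local-experiment gradient estimate
        p = min(hi, max(lo, p + eta / t * gamma_hat))  # ascent step, then projection
    return path

def noisy_gradient(p, rng, p_star=17.6, noise_sd=0.5):
    """Hypothetical oracle: gradient of u(p) = -(p - p_star)**2 / 2 plus Gaussian noise."""
    return -(p - p_star) + rng.gauss(0.0, noise_sd)

path = local_experimentation(noisy_gradient)
print(len(path), weighted_average(path))
```

Weighting iterate $t$ proportionally to $t$ down-weights the early, far-from-optimal payments. This is why $\hat{p}$ lands near the maximizer even though $p_{1}$ is drawn uniformly over the whole interval.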

Next, we consider an analogous simulation design, but with supply-side surge pricing. As discussed in Section 5, we assume that the platform makes a public commitment to mechanistically increase supply-side payments by a multiplicative factor $s(D/T)$ once the demand $D$ and supply $T$ are realized, and suppliers take this commitment into account when choosing whether or not to join the marketplace. Here, we use

[TABLE]

meaning that, by the properties of $\omega(\cdot)$ as outlined in Definition 5, the surge multiplier is 1 when $D$ is small relative to $T$ , but eventually climbs up to the ratio $D/T$ as demand outpaces supply. This choice of $s(\cdot)$ is by no means optimal; it is simply an example.

As discussed in Section 5, our analysis of surge relies on a conjecture that relevant properties of mean-field limits as discussed in Section 3.1 still hold with surge. We work with a limiting platform utility function that depends linearly on revenue minus costs as in Lemma 3,

[TABLE]

As discussed above, we can estimate the $p$ -derivative of the expected scaled active supply size, $\mu_{a}^{\prime}(p)$ , by local experimentation via (5.4) and (5.5). Moreover, following the argument of Theorem 6, we obtain $p$ -derivatives of $u_{a}(p)$ via

[TABLE]

We turn this into a feasible estimator by plugging in our local experimentation estimates $\hat{\mu}_{a}^{\prime}(p)$ for $\mu_{a}^{\prime}(p)$, and estimating the ratio $d_{a}/\mu_{a}(p)$ via its sample analogue $D/T$.

Results for learning $p$ are given in the right panel of Figure 6. Qualitatively, the results match those obtained without surge, and local experimentation still outperforms global experimentation by an order of magnitude. Local experimentation achieved mean in-sample regret of 0.013 and mean future regret of 0.0024, while the best corresponding numbers achieved by global experimentation were 0.63 and 0.29, respectively. We also note that adding the automatic surge multiplier as in (6.1) decreased the optimal base payment from 17.6 to 15.7, while increasing optimal mean platform utility by 0.06 (the median utility difference is 0.04). Thus, in this example, the regret of global experimentation is much larger than the utility gain from using surge as in (6.1) relative to not using surge, whereas the regret of local experimentation is smaller than the effect of adopting surge.

Finally, we note that the global experimentation baseline considered here (namely, our two-phase algorithm that starts with pure exploration and then moves to pure exploitation) is fairly simple, and it is possible that a more sophisticated global experimentation baseline could somewhat improve performance. However, one can check that, under reasonable conditions and provided we explore for the first $\sqrt{T}$ periods, our implemented baseline attains the optimal $\sqrt{T}$ regret rate of Shamir (2013) discussed in Section 4.4. Thus, more sophisticated methods like Bayesian zeroth-order optimization[7] as considered in, e.g., Letham et al. (2019) may improve on finite-sample performance but cannot improve on the overall regret rate of our baselines.[8]

[7] One potentially promising approach would be to use local experimentation to get gradient estimates $u^{\prime}_{A_{t}}(p_{t})$, and then incorporate these estimates into a Bayesian learning framework. It is plausible that this could yield practically meaningful improvements over the first-order approach considered in this paper.

[8] Another class of popular continuous-armed bandit algorithms was introduced by Flaxman et al. (2005) and Kleinberg (2005). These methods estimate derivatives from noisy function evaluations and then use these estimates for gradient descent. However, while attractive due to their transparency and simplicity, these methods suffer cumulative regret on the order of $T^{3/4}$ in our setting. In our simulations, this class of methods performed worse than the global experimentation baseline we report results for.

7 Discussion

We introduced a new framework for experimental design in stochastic systems with significant cross-unit interference. The key insight is that, in certain families of models, interference is structured enough to be captured by a small number of key statistics, such as the global demand-supply equilibrium, and its impact can then be accounted for using mean-field and equilibrium modeling. We then proposed an approach based on local experimentation that allows us to accurately and efficiently estimate the utility gradient in the large-system limit, and to use these gradient estimates to perform first-order optimization.

There are some simplifying assumptions in this work that future research could relax or verify. For instance, we have assumed that demand is exogenous. We expect that an extension of our method can capture scenarios where demand depends on the supply level; for example, a passenger may be less likely to hail a ride if they know there would be a long wait. Another assumption we made is that the market equilibrium can be reached relatively quickly. While there is recent empirical evidence suggesting that drivers on a ride-sharing platform do respond to payment changes in a manner that takes into account the resulting market equilibrium (Hall et al., 2019), it would be interesting to consider a more realistic model where prices may be updated continuously before a new market equilibrium is fully reached. It is less clear how the current model would apply in this setting, which is likely to require a substantially more sophisticated analysis.

We believe that the general approach proposed in this paper, which leverages mean-field modeling in experimental design, has the potential to apply to a wider range of problems. As one example, we may consider models in which the key statistics that capture the interference patterns are multi-dimensional. This could occur in a marketplace which, instead of being fully centralized, consists of a small number of interconnected sub-markets. For instance, in a ride-sharing platform, the sub-markets may correspond to neighboring cities connected by highways and bridges. In these systems, suppliers' behaviors remain primarily influenced by the local supply-demand equilibrium in their respective sub-markets, and these local equilibria in turn interact with one another due to network effects. Nevertheless, in a large-market regime where the number of market participants is large in every sub-market while the total number of sub-markets stays fixed, we may still use the type of mean-field asymptotics in this paper to account for interference across both individual units and sub-markets, and thereby efficiently estimate the effect of payment adjustments. In another direction, we may extend the one-shot equilibrium model adopted in this paper to dynamic settings where the equilibrium emerges gradually according to a stochastic process (e.g., suppliers may adapt to payment variations only over time), and study whether a dynamic version of our mean-field model can be used to analyze the effects of local experimentation in these systems. Finally, it would be interesting to investigate whether the local experimentation scheme proposed in this paper can be generalized to estimate higher-order derivatives of the utility function.

Appendix A Proofs of Main Results

A.1 Proof of Lemma 1

Fix $a\in\mathcal{A}$ . Recall that all suppliers know the realization of the global state, $a$ , and the probability that a given supplier will choose to become active is given by (3.6). Define

[TABLE]

That is, $\psi^{(n)}_{a}(\mu,\pi)$ is the probability of a supplier becoming active under the payment distribution $\pi$, if they believe that the active supply size is Binomial$(n,\,\mu)$. Note that, since the same payment distribution applies uniformly across all suppliers, every supplier has the same probability of becoming active. As a result, for any $\pi$, the actual active supply size $T$ follows a Binomial distribution. In particular, this implies that $T$ is an equilibrium active supply size if and only if it is Binomial$(n,\,\mu)$ with $\mu$ satisfying the following fixed-point equation:

[TABLE]

It suffices to show that (A.2) admits a unique solution in the domain $\mu\in[0,1]$. Because $f_{b}(\cdot)$ is by construction non-decreasing, it follows that $\psi^{(n)}_{a}(\mu,\pi)$ is a continuous function that is non-increasing in $\mu$: a supplier is more discouraged from becoming active if they believe there will eventually be more active suppliers in the market. In particular, the map $\mu\mapsto\psi^{(n)}_{a}(\mu,\pi)$ in (A.2) is continuous and non-increasing over $\mu\in[0,1]$, and it takes values in $[0,1]$ because it is a probability. The difference $\mu-\psi^{(n)}_{a}(\mu,\pi)$ is therefore continuous and strictly increasing, non-positive at $\mu=0$, and non-negative at $\mu=1$; by the intermediate value theorem it has exactly one zero, which implies that (A.2) admits a unique solution in $[0,1]$.
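The uniqueness argument also gives a practical way to compute the equilibrium. Because $\mu-\psi(\mu)$ is continuous and strictly increasing, bisection on $[0,1]$ converges to its unique zero. The sketch below is a minimal illustration that assumes a hypothetical mean-field stand-in $\psi(\mu)=f(p\,\omega(d/\mu))$ for $\psi^{(n)}_{a}(\mu,\pi)$, with a logistic $f$ and $\omega(x)=1-e^{-x}$. These functions and the values $p=10$, $d=0.5$ are illustrative and are not taken from the paper.

```python
import math

def logistic(x, beta=1.0, b=5.0):
    """Hypothetical average choice function f: non-decreasing in expected earnings x."""
    return 1.0 / (1.0 + math.exp(-beta * (x - b)))

def omega(x):
    """Hypothetical allocation function: concave, omega(0) = 0, omega(x) -> 1."""
    return 1.0 - math.exp(-x)

def psi(mu, p, d):
    """Activation probability when suppliers expect a fraction mu of them to be active."""
    return logistic(p * omega(d / mu))

def solve_equilibrium(p, d, iters=100):
    """Bisection on h(mu) = mu - psi(mu), continuous and strictly increasing on (0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)  # mid > 0, so d / mid is always defined
        if mid - psi(mid, p, d) < 0.0:
            lo = mid  # the unique root lies to the right of mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

mu_star = solve_equilibrium(p=10.0, d=0.5)
print(mu_star, abs(mu_star - psi(mu_star, 10.0, 0.5)))
```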

A.2 Proof of Lemma 2

Our argument will leverage the following simple expression for the limiting derivative of $q(\cdot)$, which follows from a generalization of a classical result of Stein [1981] to exponential families; the proof is given in Appendix B.3.

Proposition 10.

Fix $a\in\mathcal{A}$ and $\mu>0$ . Then, $\frac{d}{d\mu}q^{(n)}_{a}(\mu)$ is non-positive, and

[TABLE]

The claim in (3.14) follows directly from the definition of $\Omega$ and Assumption 1, i.e., that $\Omega(d,t)$ converges to $\omega(d/t)$ as $t\to\infty$ , and the fact that conditional on $A=a$ , $D/n$ concentrates on $d_{a}$ as $n\to\infty$ . For (3.13), recall that $\mu^{(n)}_{a}(p)$ is the solution to the balance equation in (A.2): $\mu=\mathbb{E}\left[f_{B_{1}}(p\,q^{(n)}_{a}(\mu))\,\big{|}\,A=a\right]$ . By (3.14), and the monotonicity of the functions $f_{b}(\cdot)$ and $\omega(\cdot)$ , we have that $\mu^{(n)}_{a}(p)$ converges to $\mu_{a}(p)$ as $n\to\infty$ , where $\mu_{a}(p)$ is the solution to the limiting balance equation given in the statement of Lemma 2. The claim in (3.15) follows from (3.13), (3.14) and Assumption 3. The convergence of $q^{(n)^{\prime}}_{a}(\mu)$ in (3.16) follows from Proposition 10.

A.3 Proof of Lemma 4

We start by verifying (3.18). By (A.2) and the chain rule, we have that

[TABLE]

The last expression above is linear in $(\mu^{(n)}_{a})^{\prime}(p)$. Re-arranging the equation and solving for $(\mu^{(n)}_{a})^{\prime}(p)$ leads to the desired result. Finally, (3.19) is a direct consequence of Lemma 2, while (3.20) follows by combining (3.18) with (3.19) and Lemma 2.
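The chain-rule step can be checked numerically in a mean-field analogue. Assume a balance equation of the form $\mu=f(p\,q(\mu))$, with a hypothetical logistic $f$ and $q(\mu)=\omega(d/\mu)$, $\omega(x)=1-e^{-x}$ (illustrative choices, not the paper's). Differentiating in $p$ gives $\mu^{\prime}\,(1-p\,f^{\prime}(pq)\,q^{\prime}(\mu))=f^{\prime}(pq)\,q(\mu)$, which is linear in $\mu^{\prime}$ just as the expression above is. The sketch solves this equation and compares the result with a central finite difference of the numerically computed equilibrium.

```python
import math

BETA, B = 1.0, 5.0  # hypothetical logistic choice parameters

def f(x):
    return 1.0 / (1.0 + math.exp(-BETA * (x - B)))

def f_prime(x):
    fx = f(x)
    return BETA * fx * (1.0 - fx)

def q(mu, d):
    # hypothetical q(mu) = omega(d / mu) with omega(x) = 1 - exp(-x)
    return 1.0 - math.exp(-d / mu)

def q_prime(mu, d):
    # d/dmu omega(d / mu) = -omega'(d / mu) * d / mu**2, with omega'(x) = exp(-x)
    return -math.exp(-d / mu) * d / mu ** 2

def solve_mu(p, d, iters=100):
    """Bisection for the unique root of mu - f(p q(mu)) on (0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid - f(p * q(mid, d)) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def mu_prime(p, d):
    """mu' = f'(p q) q / (1 - f'(p q) p q'(mu)), from solving the linear chain-rule equation."""
    mu = solve_mu(p, d)
    slope = f_prime(p * q(mu, d))
    return slope * q(mu, d) / (1.0 - slope * p * q_prime(mu, d))

p, d, h = 10.0, 0.5, 1e-5
analytic = mu_prime(p, d)
numeric = (solve_mu(p + h, d) - solve_mu(p - h, d)) / (2 * h)
print(analytic, numeric)
```

Because $q^{\prime}(\mu)\leq 0$, the denominator is at least 1. The equilibrium response $\mu^{\prime}(p)$ is therefore smaller than the direct response $f^{\prime}(pq)\,q$: suppliers who join in response to a higher payment partly crowd each other out.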

A.4 Proof of Theorem 6

The proof will make use of the following two technical results. The first concerns the sensitivity of the system dynamics with respect to small perturbations $\zeta$ , and the second extends calculations from Section 3.1 to the utility function $u^{(n)}_{a}(p)$ . The proofs of these results are given in Appendices B.4 and B.5, respectively. Recall that $\mu^{(n)}_{a}(p,\zeta)$ is the expected fraction of active suppliers in equilibrium when the payments are $\zeta$ -perturbed from $p$ , i.e., $\mu^{(n)}_{a}(p,\zeta)\triangleq\mathbb{E}\left[T(p,\zeta)/n\,\big{|}\,A=a\right]$ .

Proposition 11.

Fix $p>0$ , $a\in\mathcal{A}$ and $n\in\mathbb{N}$ . $\mu^{(n)}_{a}(p,\zeta)$ and $q^{(n)}_{a}(p,\zeta)$ are twice differentiable functions with respect to $\zeta$ , and satisfy:

1. $\left\{\frac{\partial}{\partial\zeta}\mu^{(n)}_{a}(p,\zeta)\right\}_{\zeta=0}=\left\{\frac{\partial}{\partial\zeta}q^{(n)}_{a}(\mu^{(n)}_{a}(p,\zeta))\right\}_{\zeta=0}=0$ for all $n\in\mathbb{N}$.

2. There exists $\alpha>0$ such that $\left\{\frac{\partial^{2}}{\partial\zeta^{2}}\mu^{(n)}_{a}(p,\zeta)\right\}_{\zeta=\zeta_{0}}$ and $\left\{\frac{\partial^{2}}{\partial\zeta^{2}}q^{(n)}_{a}(\mu^{(n)}_{a}(p,\zeta))\right\}_{\zeta=\zeta_{0}}$ are bounded uniformly over all $\zeta_{0}\in(0,\alpha)$ and $n\in\mathbb{N}$.

Proposition 12.

Fix $p>0$ and $a\in\mathcal{A}$ . We have that

[TABLE]

where $u_{a}(\cdot)$ is defined in (3.15).

Now, recall that we are considering the case where the platform employs a $\zeta$-perturbed payment distribution $\pi_{p,\zeta}$, with $P_{i}=p+\zeta\varepsilon_{i}$ as in (3.11). Define the estimators

[TABLE]

so that $\widebar{D}$ and $\widebar{Z}$ correspond to the scaled demand and active suppliers, respectively, and $\widehat{\Delta}$ is the scaled regression coefficient of $Z_{i}$ on $\varepsilon_{i}$ . Finally, define the estimator

[TABLE]

Our main remaining task is to show that, under the stated conditions,

[TABLE]

in $L_{2}$ as $n\to\infty$, for any $a\in\mathcal{A}$. The desired conclusion (4.4) then follows immediately by combining (4.3), (A.7), and Proposition 12, and invoking Slutsky's lemma.

We now turn to proving (A.7). First, we note that $\widehat{\Upsilon}\to\mu_{a}^{\prime}(p)$ follows directly by combining the first three convergence claims in (A.7) with Lemma 4. The fact that $\widebar{D}\to d_{a}$ follows from our definition (3.1). For $\widebar{Z}\to\mu_{a}(p)$, note that by a Chernoff bound, $\widebar{Z}$ concentrates on $\mu^{(n)}_{a}(p,\zeta_{n})$ as $n\to\infty$. Furthermore, we have that

[TABLE]

where steps $(a)$ and $(b)$ follow from Lemma 2 and Proposition 11, respectively. Together, this shows that $\widebar{Z}\to\mu_{a}(p)$ in $L_{2}$.

Finally, it remains to show that $\widehat{\Delta}\to\Delta_{a}(p)$ . To this end, we first observe the following fact: There exists a constant $C>0$ such that, for every $\zeta$ and $n$

[TABLE]

To prove (A.9), note that given $\zeta$ -perturbed payments $P_{i}=p+\zeta\,\varepsilon_{i}$ , we have

[TABLE]

We can then take the limit $\zeta\rightarrow 0$ , and verify that there exists $C>0$ such that

[TABLE]

Here, we used the fact that $\left\{\frac{\partial}{\partial\zeta}q_{a}^{(n)}\left(\mu^{(n)}_{a}\left(p,\,\zeta\right)\right)\right\}_{\zeta=0}=0$ (Proposition 11), the fact that both $f_{B_{i}}(\cdot)$ and $q_{a}^{(n)}(\mu^{(n)}_{a}\left(p,\,\cdot\right))$ are twice differentiable with second derivatives bounded uniformly over $n$, and the fact that $\varepsilon_{i}$ has variance 1. Finally, by Slutsky's lemma and conditionally on $A=a$, we have

[TABLE]

As $\operatorname{Var}_{n}\left[\varepsilon_{i}\,\big{|}\,A=a\right]=1$ , we conclude that $\widehat{\Delta}\to_{p}\Delta_{a}(p)$ using (A.9) and Lemma 4.
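To see what $\widehat{\Delta}$ targets, the following sketch simulates a single period of $\pm\zeta$ perturbations. It makes several simplifying assumptions that are not from the paper. First, the earnings level $q$ per unit of payment is held fixed, motivated by Proposition 11, under which symmetric perturbations move the equilibrium only at second order. Second, each supplier activates independently with probability $f(P_{i}q)$ for a hypothetical logistic $f$. Third, $\widehat{\Delta}$ is taken to be the least-squares slope of $Z_{i}$ on $\varepsilon_{i}$ divided by $\zeta$. Under these assumptions the target is the marginal response $q\,f^{\prime}(pq)$, which excludes the equilibrium feedback that enters $\mu_{a}^{\prime}(p)$.

```python
import math
import random

def f(x, beta=1.0, b=8.0):
    """Hypothetical logistic choice function."""
    return 1.0 / (1.0 + math.exp(-beta * (x - b)))

def delta_hat(n, p, zeta, q, seed=0):
    """OLS slope of Z_i on eps_i, rescaled by 1 / zeta (assumed form of the estimator)."""
    rng = random.Random(seed)
    eps = [rng.choice((-1, 1)) for _ in range(n)]  # symmetric +/-1 perturbations
    z = [1.0 if rng.random() < f((p + zeta * e) * q) else 0.0 for e in eps]
    e_bar, z_bar = sum(eps) / n, sum(z) / n
    cov = sum((e - e_bar) * (zi - z_bar) for e, zi in zip(eps, z)) / n
    var = sum((e - e_bar) ** 2 for e in eps) / n
    return cov / (zeta * var)

p, q, zeta = 10.0, 0.8, 0.5
target = q * f(p * q) * (1.0 - f(p * q))  # q * f'(p q) for the logistic with beta = 1
estimate = delta_hat(n=100_000, p=p, zeta=zeta, q=q)
print(target, estimate)
```

The remaining gap between `estimate` and `target` combines sampling noise of order $1/(\zeta\sqrt{n})$ with an $O(\zeta^{2})$ finite-difference bias, which reflects the bias-variance trade-off in choosing $\zeta$.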

A.5 Proof of Theorem 7

Given the form of (4.5), we can use Lemma 1 of Orabona et al. [2015] to check that

[TABLE]

We can then replace the gradient estimates $\widehat{\Gamma}_{t}$ with their mean-field limits $u^{\prime}_{A_{t}}(p_{t})$ provided we add appropriate error terms as follows,

[TABLE]

Then, given the result in Theorem 6 we see that, for any $\varepsilon>0$ ,

[TABLE]

with probability tending to 1 as $n$ gets large. Noting that $\left\lvert u^{\prime}_{A_{t}}(p_{t})\right\rvert<M$ , this simplifies to

[TABLE]

with probability tending to 1. The desired statement (4.6) follows by leveraging the remaining assumptions from the theorem statement: $\sigma$ -strong concavity of $u_{A_{t}}(\cdot)$ implies that

[TABLE]

and we use the above to replace the left-hand side expression of (A.12) while noting that $\sigma>\eta^{-1}$ .

A.6 Proof of Corollary 8

Let $p^{*}$ be the maximizer of $u(\cdot)$ over $I=[c_{-},\,c_{+}]$. By (4.6) we have

[TABLE]

Paired with strong concavity of $u(p)$ around $p^{*}$ and the fact that $u^{\prime}(p^{*})=0$ , this implies

[TABLE]

In order to verify the desired result, our next step is to bound $Z_{T}$ . First, because $p_{t}$ is chosen before we get to learn about $A_{t}$ , $Z_{t}$ is a martingale. Second, because the derivative of $u_{a}(p)$ is uniformly bounded by $M$ , we have $\left\lvert Z_{t}-Z_{t-1}\right\rvert\leq 2Mt\left\lvert p_{t}-p^{*}\right\rvert$ for all $t$ . Thus, using Hoeffding’s lemma to bound the moment-generating function of a bounded random variable, these two facts together imply that

[TABLE]

and so

[TABLE]

is a super-martingale for any $c>0$ . Thus, by Markov’s inequality,

[TABLE]

for any $0<\delta<1$ . Pairing (A.15) and (A.16) with $c=\sigma/(2M^{2}T)$ then yields (recall that $\eta>\sigma^{-1}$ )

[TABLE]

Finally, the desired result follows by noting that

[TABLE]

A.7 Proof of Theorem 9

Using Proposition 11 and a first-order Taylor expansion with a Lagrange-form remainder, we immediately see that there is a $C>0$ such that, for all $n\geq n_{0}$ and $0\leq\zeta<\alpha$ ,

[TABLE]

Since this bound holds for all $n\geq n_{0}$ , it also holds in the limit $n\rightarrow\infty$ , thus implying (4.8).
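Proposition 11 states that symmetric perturbations leave the equilibrium unchanged to first order in $\zeta$, which drives the second-order cost in Theorem 9. The sketch below is a quick numerical check under hypothetical mean-field primitives: a logistic choice function, $q(\mu)=\omega(d/\mu)$ with $\omega(x)=1-e^{-x}$, and illustrative values of $p$ and $d$, none of which are taken from the paper. Half the suppliers are offered $p+\zeta$ and half $p-\zeta$. The perturbed balance equation is solved by bisection, and the equilibrium shift at $\zeta$ is compared with the shift at $\zeta/2$. A shift of order $\zeta^{2}$ gives a ratio near 4.

```python
import math

def f(x, beta=1.0, b=5.0):
    """Hypothetical logistic choice function."""
    return 1.0 / (1.0 + math.exp(-beta * (x - b)))

def q(mu, d):
    return 1.0 - math.exp(-d / mu)  # hypothetical omega(d / mu), omega(x) = 1 - exp(-x)

def perturbed_equilibrium(p, d, zeta, iters=100):
    """Solve mu = (f((p + zeta) q(mu)) + f((p - zeta) q(mu))) / 2 by bisection (eps_i = +/-1 w.p. 1/2)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        rhs = 0.5 * (f((p + zeta) * q(mid, d)) + f((p - zeta) * q(mid, d)))
        if mid - rhs < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p, d = 10.0, 0.5
base = perturbed_equilibrium(p, d, 0.0)
shift_big = perturbed_equilibrium(p, d, 0.2) - base
shift_small = perturbed_equilibrium(p, d, 0.1) - base
print(shift_big / shift_small)  # near 4 when the shift is quadratic in zeta
```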

Appendix B Proofs of Technical Results

B.1 Proof of Lemma 3

Fix $a\in\mathcal{A}$ . It follows from Lemma 2 that

[TABLE]

where

[TABLE]

We have that

[TABLE]

Note that $q_{a}(\mu_{a}(p))\mu_{a}(p)$ is the normalized amount of demand that ends up being served. The next result shows that $q_{a}(\mu_{a}(p))\mu_{a}(p)$ is non-decreasing in the payment $p$; the proof is given in Appendix B.6.

Proposition 13.

There exists $c>0$ , such that

[TABLE]

Because $p<\gamma$ , in light of Proposition 13, in order to show that $u_{a}(\cdot)$ is strictly concave, it suffices to demonstrate that

[TABLE]

To this end, we have that

[TABLE]

where steps $(a)$ and $(b)$ follow from the fact that $q_{a}(\mu)=\omega(d_{a}/\mu)$ . Because $\omega(\cdot)$ is concave, it follows that the second term in (B.6) is non-positive. Furthermore, recall from Definition 5 that $\omega(\cdot)$ is concave and $\omega(0)=0$ . By Proposition 5, we have that

[TABLE]

In the remainder of the proof, we will focus on showing that

[TABLE]

which would imply the strong concavity of $u_{a}(\cdot)$ .

Recall that, by construction, the average choice function $f_{a}(\cdot)$ is non-decreasing and concave. Recall from (A.2) that $\mu_{a}(p)$ satisfies the fixed-point equation:

[TABLE]

where

[TABLE]

Twice-differentiating (B.9) with respect to $p$ , we obtain that

[TABLE]

where $\mathbf{H}_{\psi_{a}}(\cdot,\cdot)$ denotes the Hessian of $\psi_{a}(\cdot)$ . This leads to

[TABLE]

Note that since $f_{a}$ is non-decreasing and $q_{a}$ is non-increasing, we have that $\left\{\frac{\partial}{\partial\mu}\psi_{a}(\mu,p)\right\}_{\mu=\mu_{a}(p)}\leq 0$. It remains to verify that

[TABLE]

Recall that $\psi_{a}(\mu,p)=f_{a}(pq_{a}(\mu))$ . We have that

[TABLE]

Rearranging terms, and using the fact that $q_{a}(\mu)=\omega(d_{a}/\mu)$ , we can decompose $\mathbf{H}_{\psi_{a}}(\mu,p)$ as follows:

[TABLE]

where

[TABLE]

We make the following observations concerning the three terms in (B.15). For the first term, $\mathbf{A}$ is the outer product of $(pq_{a}^{\prime}(\mu),\,q_{a}(\mu))$ with itself and is hence positive semi-definite. Since $f_{a}(\cdot)$ is concave and hence $f^{\prime\prime}_{a}(\cdot)\leq 0$, we have that $f^{\prime\prime}_{a}(pq_{a}(\mu))\mathbf{A}$ is negative semi-definite, i.e.,

[TABLE]

For the second term, note that $f^{\prime}_{a}(pq_{a}(\mu))\geq 0$ because $f_{a}(\cdot)$ is non-decreasing, and $\omega^{\prime\prime}(\cdot)\leq 0$ due to the concavity of $\omega(\cdot)$. Therefore, we have that

[TABLE]

For the third term, since we are only interested in the properties of $\mathbf{C}$ along the specific direction $(\mu_{a}^{\prime}(p),\,1)$, it suffices to show that when $\mu=\mu_{a}(p)$, $(\mu_{a}^{\prime}(p),1)\mathbf{C}(\mu_{a}^{\prime}(p),1)^{\intercal}$ is non-positive. This claim is isolated in the following proposition; the proof is given in Appendix B.7.

Proposition 14.

[TABLE]

By combining (B.16), (B.17) and Proposition 14, we have proven (B.13), i.e.,

[TABLE]

This in turn proves Lemma 3. ∎

B.2 Proof of Proposition 5

We have that

[TABLE]

where $(a)$ follows from the assumption that $g(0)\geq 0$ , and $(b)$ from the concavity of $g(\cdot)$ , which implies that $g^{\prime}(\cdot)$ is non-increasing over $x>0$ . For the second statement, we first note that if, for some $c>0$ ,

[TABLE]

then we have that

[TABLE]

We then conclude by contradiction. Suppose that, for each $M\geq 0$, there exists some $x\geq M$ satisfying (B.21); then, by the above argument and noting that $g(x)$ is non-negative and concave (and thus non-decreasing), we must either have $g(x)=0$ for all $x$, or $\lim_{x\rightarrow\infty}g(x)=\infty$. Thus, under our stated assumptions, the condition (B.21) can only hold on a finite interval for any value of $c>0$, and so

[TABLE]

∎
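For intuition, suppose the first claim of Proposition 5 is that a concave $g$ with $g(0)\geq 0$ satisfies $g(x)\geq x\,g^{\prime}(x)$ for $x>0$. The display is not reproduced here, so this reading is an assumption. One chain consistent with steps $(a)$ and $(b)$ above is sketched below, together with the check for the illustrative choice $\omega(x)=1-e^{-x}$.

```latex
g(x) = g(0) + \int_{0}^{x} g^{\prime}(s)\,ds
     \overset{(a)}{\geq} \int_{0}^{x} g^{\prime}(s)\,ds
     \overset{(b)}{\geq} \int_{0}^{x} g^{\prime}(x)\,ds = x\,g^{\prime}(x),
\qquad
\text{e.g., } \omega(x) - x\,\omega^{\prime}(x) = 1 - (1+x)\,e^{-x} \geq 0 .
```

The last inequality is equivalent to $e^{x}\geq 1+x$.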

B.3 Proof of Proposition 10

We start by verifying a useful property that applies to any exponential family with discrete support.

Definition 13.

Let $\{X\}$ be a family of discrete random variables parameterized by $\theta\in\Theta\subset\mathbb{R}$. We say that $\{X\}$ is an exponential family if the probability mass function (PMF) $f_{\theta}$ of $X$ can be expressed as

[TABLE]

where $T(\cdot)$ is referred to as the sufficient statistic, and $\eta(\theta)$ the natural parameter.

We have the following identity, which is a simple generalization of a result proved by Stein [1981] for Gaussian random variables.

Lemma 15.

Fix an exponential family of random variables $X$ with discrete support $\mathcal{X}$ parametrized by $\theta\in\Theta$ defined over a finite subset of $\mathbb{R}$ , with sufficient statistic $T(X)$ and natural parameter $\eta(\theta)$ . Then, for any function $g:\mathcal{X}\to\mathbb{R}$ , we have that

[TABLE]

for all $\theta$ in the interior of $\Theta$ .

*Proof.* First, observe that $\sum_{x}f_{\theta}(x)=\sum_{x}h(x)\exp(\eta(\theta)T(x)-A(\theta))=1$. Taking derivatives on both sides with respect to $\theta$, we obtain that

[TABLE]

which implies that

[TABLE]

We have that

[TABLE]

where $(a)$ follows from (B.25), and $(b)$ from the fact that $\mathbb{E}_{\theta}\left[T(X)-\mathbb{E}_{\theta}\left[T(X)\right]\right]=0$ . This proves Lemma 15. ∎
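For the Binomial family used below, Lemma 15 can be checked directly. Assume the identity takes the standard form $\frac{d}{d\theta}\mathbb{E}_{\theta}[g(X)]=\eta^{\prime}(\theta)\operatorname{Cov}_{\theta}(g(X),T(X))$, consistent with steps $(a)$ and $(b)$ above. For $X\sim\text{Binomial}(n,\mu)$ with $\eta(\mu)=\log(\mu/(1-\mu))$ and $T(X)=X$, it then reads $\frac{d}{d\mu}\mathbb{E}[g(X)]=\operatorname{Cov}(g(X),X)/(\mu(1-\mu))$. The sketch compares this with a central finite difference. The test function $g(k)=\min(1,d/k)$ is a hypothetical bounded, non-increasing function chosen only for illustration.

```python
import math

def binom_pmf(n, mu):
    return [math.comb(n, k) * mu ** k * (1 - mu) ** (n - k) for k in range(n + 1)]

def expectation(g, n, mu):
    return sum(w * g(k) for k, w in enumerate(binom_pmf(n, mu)))

def stein_derivative(g, n, mu):
    """eta'(mu) * Cov(g(X), X), with eta'(mu) = 1 / (mu (1 - mu)) for the Binomial."""
    pmf = binom_pmf(n, mu)
    mean_g = sum(w * g(k) for k, w in enumerate(pmf))
    cov = sum(w * (g(k) - mean_g) * (k - n * mu) for k, w in enumerate(pmf))
    return cov / (mu * (1 - mu))

def g(k, d=12.0):
    # hypothetical bounded test function, non-increasing in k
    return 1.0 if k == 0 else min(1.0, d / k)

n, mu, h = 50, 0.3, 1e-6
exact = stein_derivative(g, n, mu)
numeric = (expectation(g, n, mu + h) - expectation(g, n, mu - h)) / (2 * h)
print(exact, numeric)
```

Since $g$ is non-increasing, the covariance, and hence the derivative, is non-positive, which matches the sign claim in Proposition 10.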

We are now ready to prove the stated result. We will assume that all probabilities are calculated by conditioning on $A=a$ , and thus omit it from our notation. The fact that $\frac{d}{d\mu}q^{(n)}_{a}(\mu)$ is non-positive follows directly from the fact that $\Omega(d,t)$ is non-increasing in $t$ (Assumption 1). The PMF of an $(n,\,\mu)$ Binomial random variable $X$ can be written as

[TABLE]

In particular, the set of Binomial random variables forms an exponential family, with natural parameter $\eta(\mu)=\log\left(\frac{\mu}{1-\mu}\right)$ and sufficient statistic $T(X)=X$ . We now employ Lemma 15 above. Define

[TABLE]

For a fixed $d$, we have that (here, the notation $x\in a\pm b$ denotes $x\in[a-b,a+b]$)

[TABLE]

where $(a)$ follows from Lemma 15, $(b)$ from Assumption 1, and $(c)$ from the Cauchy–Schwarz inequality. Taking expectation with respect to $d\sim D$ on both sides, we obtain

[TABLE]

We next bound each of the two terms in (B.30). For the second term, recall that $l(\cdot)$ is bounded and $|l(d,t)|=o(1/\sqrt{d}+1/\sqrt{t})$ (Assumption 1). Furthermore, it follows from the Chernoff bound and (3.2), respectively, that

[TABLE]

This implies that

[TABLE]

Furthermore, note that

[TABLE]

Combining the above two equations, we conclude that

[TABLE]

Next, we turn to the first term in (B.30), which will follow from the following result. Fix $\delta\in(0,\mu)$ , and define the event

[TABLE]

Using Taylor expansion on the function

[TABLE]

and the smoothness of $\omega$ , we have that there exists a constant $c_{1}>0$ such that, for all $n$ and $d$ ,

[TABLE]

Fix $d\in\mathbb{R}_{+}$ . We have that

[TABLE]

where $c_{2}=\max_{d,x\in\mathbb{R}_{+}}\Omega(d,x),\mbox{ and }c_{3}=\max_{x\in[-\delta,\delta]}h(x)$, and the $o(1)$ term does not depend on $d$. Step $(a)$ follows from the Cauchy–Schwarz inequality, $(b)$ from the fact that $\mathbb{P}\left(\overline{\mathcal{E}}\right)$ converges to $0$ exponentially fast in $n$ by the Chernoff bound and that $\mathbb{E}\left[H^{2}\right]=\mathcal{O}(n)$, $(c)$ from the Taylor expansion in (B.37), and $(d)$ from the fact that $\left\lvert\mathbb{E}\left[H^{3}\right]\right\rvert=\mathcal{O}(n)$ as a result of $X$ being a Binomial random variable. Finally, step $(e)$ follows from the definition of $h$ in (B.36).

Recall from (3.1) that, conditional on $A=a$ , $D/n$ concentrates on $d_{a}$ as $n\to\infty$ . (B.38) thus implies that

[TABLE]

Substituting (B.34) and (B.39) into (B.30), we obtain that

[TABLE]

This proves Proposition 10. ∎

B.4 Proof of Proposition 11

Fix $a\in\mathcal{A}$ and $n\in\mathbb{N}$ . Denote by $\pi_{p,\zeta}$ the $\zeta$ -perturbed payment distribution centered at $p$ (3.11). We first prove that $\left\{\frac{\partial}{\partial\zeta}\mu^{(n)}_{a}(p,\zeta)\right\}_{\zeta=0}=0.$ By (A.2), $\mu^{(n)}_{a}(p,\zeta)$ satisfies

[TABLE]

It therefore suffices to evaluate the right-hand side of the above equation. To this end:

[TABLE]

where $(a)$ follows from the definition of $\zeta$-perturbation ((3.11)) and the independence of perturbations $\{\varepsilon_{i}\}_{i\in\mathbb{N}}$ from the rest of the system. Since both $f_{B_{1}}(\cdot)$ and $q^{(n)}_{a}(\cdot)$ are bounded, for the first term on the right-hand side of (B.42), it is not difficult to show using the dominated convergence theorem that there exists $c>0$ such that (here, the notation $x\in y\pm z$ means $x\in[y-z,y+z]$)

[TABLE]

for all sufficiently small $\zeta$, where $\upsilon\triangleq\left\{\frac{\partial}{\partial\zeta}\mathbb{E}\left[f_{B_{1}}\left((p+\zeta)\,q^{(n)}_{a}(\mu^{(n)}_{a}(p,\zeta))\right)\,\big{|}\,A=a\right]\right\}_{\zeta=0}.$ Applying the same argument to the second term in (B.42), we have that there exists $c>0$ such that, for all sufficiently small $\zeta$,

[TABLE]

which further implies that

[TABLE]

For the derivative of $q^{(n)}_{a}(\mu^{(n)}_{a}(p,\cdot))$, note that by the chain rule, we have

[TABLE]

Since $\left\{\frac{\partial}{\partial\zeta}\mu^{(n)}_{a}(p,\zeta)\right\}_{\zeta=0}=0$ by (B.45), and $(q^{(n)}_{a})^{\prime}(\mu^{(n)}_{a}(p,\zeta))$ is finite by Proposition 10, we have that $\left\{\frac{\partial}{\partial\zeta}q^{(n)}_{a}(\mu^{(n)}_{a}(p,\zeta))\right\}_{\zeta=0}=0$. This proves the first claim of Proposition 11.

For the second claim, define

[TABLE]

Applying the chain rule to (B.41), we have that

[TABLE]

Note that

[TABLE]

and

[TABLE]

By the chain rule, we have

[TABLE]

where the last step follows from the fact that $\left\{\frac{\partial}{\partial\zeta}\mu^{(n)}_{a}(p,\zeta)\right\}_{\zeta=0}=0$ . Applying (B.46) and (B.51) to (B.49) and (B.50), we have

[TABLE]

and

[TABLE]

Substituting the expressions for $g^{\prime}_{\varepsilon_{1}}(0)$ and $g^{\prime\prime}_{\varepsilon_{1}}(0)$ into (B.48), we obtain:

[TABLE]

where the last step follows from the fact that $\varepsilon_{1}\in\{-1,1\}$ and hence $\varepsilon_{1}^{2}=1$ . After re-arrangement, the above equation yields

[TABLE]

and by (B.51), we have

[TABLE]

Finally, we check the uniform boundedness of the second derivatives with respect to all $n$ and all sufficiently small $\zeta$. To show that $\left\{\frac{\partial^{2}}{\partial\zeta^{2}}\mu^{(n)}_{a}(p,\zeta)\right\}_{\zeta=0}$ is uniformly bounded, note that $f^{\prime}_{B_{1}}(\cdot)$ is non-negative and $(q^{(n)}_{a})^{\prime}(\cdot)$ is non-positive (Proposition 10). Therefore, the term $\mathbb{E}\left[f^{\prime}_{B_{1}}(g_{\varepsilon_{1}}(0))\,\big{|}\,A=a\right](q^{(n)}_{a})^{\prime}(\mu^{(n)}_{a}(p))p$ is non-positive. By (B.55), this implies the uniform boundedness of $\frac{\partial^{2}}{\partial\zeta^{2}}\mu^{(n)}_{a}(p,\zeta)$. Moreover, by Proposition 10, $(q^{(n)}_{a})^{\prime}(\mu^{(n)}_{a}(p))$ is non-positive and bounded, which together with (B.56) shows that $\frac{\partial^{2}}{\partial\zeta^{2}}q^{(n)}_{a}(\mu^{(n)}_{a}(p,\zeta))$ is bounded for all $n$ and all sufficiently small $\zeta$. This proves the second claim and thus completes the proof of Proposition 11.

B.5 Proof of Proposition 12

Fix $a\in\mathcal{A}$ and $n\in\mathbb{N}$. Consider the case where all potential suppliers are offered a fixed payment $p$, i.e., $\pi=\delta_{p}$. Recall from (3.8) and (3.10) that

[TABLE]

where $T\sim\text{Binomial}(n,\,\mu^{(n)}_{a}(p))$. We have by the chain rule:

[TABLE]

where $X\sim\text{Binomial}(n,\,\mu)$, and similarly

[TABLE]

Using arguments essentially identical to those in the proof of Proposition 10, we can show that for all $a\in\mathcal{A}$ and $\mu>0$

[TABLE]

where the limiting functions $\omega$ and $r$ are defined in Assumptions 1 and 3, respectively. Substituting (B.60) and (B.61) into (B.58) and (B.59), respectively, and observing that

[TABLE]

we have

[TABLE]

where $\mu_{a}(p)=\lim_{n\rightarrow\infty}\mu^{(n)}_{a}(p)$ is defined in Lemma 2.

[TABLE]

where $u_{a}(\cdot)$ is defined in (3.15). This recovers the desired result. ∎

B.6 Proof of Proposition 13

By the chain rule, and the fact that $q_{a}(\mu)=\omega(d_{a}/\mu)$ , we have that

[TABLE]

Using the expression for $\mu_{a}^{\prime}(p)$ (cf. (3.20)), it is not difficult to show that, as a result of the strong concavity of $f_{a}(\cdot)$ in the interval $(\underline{x},\overline{x})$, we have that $\inf_{p\in(c_{0},\gamma)}\mu_{a}^{\prime}(p)>0$. Furthermore, using the same argument as in the proof of Proposition 5 and the fact that $\omega(\cdot)$ is strongly concave with $\omega(0)\geq 0$, we have that $\inf_{p\in(c_{0},\gamma)}(\omega(d_{a}/\mu_{a}(p))-\omega^{\prime}(d_{a}/\mu_{a}(p))d_{a}/\mu_{a}(p))>0$. Together, this implies that $\inf_{p\in(c_{0},\gamma)}\frac{d}{dp}\left(q_{a}(\mu_{a}(p))\mu_{a}(p)\right)>0$, thus proving our claim. ∎

B.7 Proof of Proposition 14

Note that

[TABLE]

It therefore suffices to show that

[TABLE]

From Lemma 4, we have that

[TABLE]

where

[TABLE]

and

[TABLE]

Multiplying the left-hand side of (B.66) by $p/\mu_{a}(p)$ , we obtain

[TABLE]

where $(a)$ follows from Proposition 5 combined with the non-negativity and concavity of $\tilde{f}(\cdot)$, and $(b)$ from the fact that $q_{a}^{\prime}(\mu)=-\omega^{\prime}(d_{a}/\mu)\,d_{a}/\mu^{2}\leq 0$. This proves (B.65) and hence the proposition. ∎

Bibliography

Abadie et al. [2010] Alberto Abadie, Alexis Diamond, and Jens Hainmueller. Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program. Journal of the American Statistical Association, 105(490), 2010.
Adlakha et al. [2015] Sachin Adlakha, Ramesh Johari, and Gabriel Y. Weintraub. Equilibria of dynamic games with many players: Existence, approximation, and market structure. Journal of Economic Theory, 156:269–316, 2015.
Aronow and Samii [2017] Peter M. Aronow and Cyrus Samii. Estimating average causal effects under general interference, with application to a social network experiment. The Annals of Applied Statistics, 11(4):1912–1947, 2017.
Athey and Luca [2019] Susan Athey and Michael Luca. Economists (and economics) in tech companies. Journal of Economic Perspectives, 33(1):209–230, 2019.
Athey et al. [2018] Susan Athey, Dean Eckles, and Guido W. Imbens. Exact p-values for network interference. Journal of the American Statistical Association, 113(521):230–240, 2018.
Baird et al. [2018] Sarah Baird, J. Aislinn Bohren, Craig McIntosh, and Berk Özler. Optimal design of experiments in the presence of interference. Review of Economics and Statistics, 100(5):844–860, 2018.
Banerjee and Duflo [2011] Abhijit Banerjee and Esther Duflo. Poor Economics: A Radical Rethinking of the Way to Fight Global Poverty. PublicAffairs, 2011.
Basse et al. [2019] Guillaume W. Basse, Avi Feller, and Panos Toulis. Randomization tests of causal effects under interference. Biometrika, 106(2):487–494, 2019.


