Demand and Welfare Analysis in Discrete Choice Models with Social Interactions

Debopam Bhattacharya; Pascaline Dupas; Shin Kanaya

arXiv:1905.04028·econ.EM·May 9, 2024

TL;DR

This paper introduces new empirical tools to analyze demand and welfare effects of policies in binary choice models with social interactions, highlighting the importance of underlying mechanisms and providing bounds on welfare impacts.

Contribution

It connects large game econometrics with social interaction models, develops convergence results, and shows limitations of choice data for welfare analysis despite unique equilibria.

Findings

01

Choice data are insufficient for welfare calculations under social interactions.

02

Distribution-free bounds on welfare can be derived using index restrictions.

03

Experimental data on mosquito-net adoption illustrate the theoretical results.

Equations

$$A_{vh}=1\left\{U_{1}(Y_{vh}-P_{vh},\Pi_{vh},\boldsymbol{\eta}_{vh})\geq U_{0}(Y_{vh},\Pi_{vh},\boldsymbol{\eta}_{vh})\right\},$$

$$\Pi_{vh}=\frac{1}{N_{v}-1}\sum_{1\leq k\leq N_{v};\,k\neq h}E\left[A_{vk}\mid\mathcal{I}_{vh}\right],$$

$$U_{1}(Y_{vh}-P_{vh},\Pi_{vh},\boldsymbol{\eta}_{vh})=E\left[u_{1}\left(Y_{vh}-P_{vh},\frac{1}{N_{v}-1}\sum_{1\leq k\leq N_{v};\,k\neq h}A_{vk},\boldsymbol{\eta}_{vh}\right)\,\middle|\,\mathcal{I}_{vh}\right],$$

$$\boldsymbol{\eta}_{vh}=\boldsymbol{\xi}_{v}+\boldsymbol{u}_{vh},$$

$$\mathcal{I}_{vh}=(W_{vh},L_{vh},\boldsymbol{u}_{vh},\boldsymbol{\xi}_{v}).$$

$$\boldsymbol{u}_{vh}=\boldsymbol{u}_{v}(L_{vh}),$$

$$E\left[A_{vk}\mid\mathcal{I}_{vh}\right]=E\left[A_{vk}\mid\boldsymbol{\xi}_{v}\right],$$

$$\Pi_{vh}=\bar{\Pi}_{vh},$$

$$\bar{\Pi}_{vh}=\bar{\Pi}_{vh}(\boldsymbol{\xi}_{v}):=\frac{1}{N_{v}-1}\sum_{1\leq k\leq N_{v};\,k\neq h}E\left[A_{vk}\mid\boldsymbol{\xi}_{v}\right],$$

$$\bar{\Pi}_{vh}=\frac{1}{N_{v}-1}\sum_{1\leq k\leq N_{v};\,k\neq h}E_{\boldsymbol{\xi}_{v}}\left[1\left\{U_{1}(Y_{vk}-P_{vk},\bar{\Pi}_{vk},\boldsymbol{\eta}_{vk})\geq U_{0}(Y_{vk},\bar{\Pi}_{vk},\boldsymbol{\eta}_{vk})\right\}\right],\quad h=1,\dots,N_{v},$$

$$m^{v}(r)=m_{\boldsymbol{\xi}_{v}}^{v}(r):=E_{\boldsymbol{\xi}_{v}}\left[1\left\{U_{1}(Y_{vh}-P_{vh},r,\boldsymbol{\xi}_{v}+\boldsymbol{u}_{vh})\geq U_{0}(Y_{vh},r,\boldsymbol{\xi}_{v}+\boldsymbol{u}_{vh})\right\}\right];$$

$$\left|m^{v}(r)-m^{v}(\tilde{r})\right|\leq\rho\left|r-\tilde{r}\right|\text{ for any }r,\tilde{r}\in[0,1].$$

$$\bar{\Pi}_{vh}=\bar{\Pi}_{vk}\text{ for any }h,k\in\{1,\dots,N_{v}\}.$$

$$\Pi_{vh}=\bar{\pi}_{v}\text{ for any }h=1,\dots,N_{v},$$

$$\left|\alpha\right|\sup_{e\in\mathbb{R}}f_{\varepsilon}(e)<1,$$

$$\Pi_{vh}=\psi_{vh}(W_{vh},L_{vh},\boldsymbol{u}_{vh},\boldsymbol{\xi}_{v}),$$

$$\psi_{vh}(W_{vh},L_{vh},\boldsymbol{u}_{vh},\boldsymbol{\xi}_{v})=\frac{1}{N_{v}-1}\sum_{1\leq k\leq N_{v};\,k\neq h}E\left[1\left\{U_{1}(Y_{vk}-P_{vk},\psi_{vk}(W_{vk},L_{vk},\boldsymbol{u}_{vk},\boldsymbol{\xi}_{v}),\boldsymbol{\eta}_{vk})\geq U_{0}(Y_{vk},\psi_{vk}(W_{vk},L_{vk},\boldsymbol{u}_{vk},\boldsymbol{\xi}_{v}),\boldsymbol{\eta}_{vk})\right\}\,\middle|\,\mathcal{I}_{vh}\right],$$

$$\bar{\psi}_{v}=\Gamma_{v,N_{v}}[\bar{\psi}_{v}],$$

$$\Gamma_{v,N_{v}}[g]=\Gamma_{v,N_{v}}[g](\mathcal{I}_{vh})=\frac{1}{N_{v}-1}\sum_{1\leq k\leq N_{v};\,k\neq h}E\left[1\left\{U_{1}(Y_{vk}-P_{vk},g(W_{vk},L_{vk},\boldsymbol{u}_{v}(L_{vk}),\boldsymbol{\xi}_{v}),\boldsymbol{\eta}_{vk})\geq U_{0}(Y_{vk},g(W_{vk},L_{vk},\boldsymbol{u}_{v}(L_{vk}),\boldsymbol{\xi}_{v}),\boldsymbol{\eta}_{vk})\right\}\,\middle|\,\mathcal{I}_{vh}\right],$$

$$\Gamma_{v,N_{v}}[g]\to\Gamma_{v,\infty}[g]:=\frac{1}{N_{v}-1}\sum_{1\leq k\leq N_{v};\,k\neq h}E_{\boldsymbol{\xi}_{v}}\left[1\left\{U_{1}(Y_{vk}-P_{vk},g(W_{vk},L_{vk},\boldsymbol{u}_{v}(L_{vk}),\boldsymbol{\xi}_{v}),\boldsymbol{\eta}_{vk})\geq U_{0}(Y_{vk},g(W_{vk},L_{vk},\boldsymbol{u}_{v}(L_{vk}),\boldsymbol{\xi}_{v}),\boldsymbol{\eta}_{vk})\right\}\right],$$

$$\left|\Gamma_{v,\infty}[g]-\Gamma_{v,\infty}[\tilde{g}]\right|\leq\rho\left\|g-\tilde{g}\right\|_{L^{1}}\text{ for some }\rho\in(0,1).$$

$$\sup_{1\leq h\leq N_{v}}E\left[\left|\bar{\psi}_{v}(W_{vh},L_{vh},\boldsymbol{u}_{v}(L_{vh}))-\bar{\pi}_{v}\right|\right]\to0\text{ as }N_{v}\to\infty.$$

$$\frac{1}{N_{v}-1}\sum_{1\leq k\leq N_{v};\,k\neq h}E\left[A_{vk}\mid\boldsymbol{\xi}_{v}\right],$$

$$E\left[1\left\{U_{1}(Y_{vk}-P_{vk},g(W_{vk},L_{vk},\boldsymbol{u}_{v}(L_{vk}),\boldsymbol{\xi}_{v}),\boldsymbol{\eta}_{vk})\geq U_{0}(Y_{vk},g(W_{vk},L_{vk},\boldsymbol{u}_{v}(L_{vk}),\boldsymbol{\xi}_{v}),\boldsymbol{\eta}_{vk})\right\}\,\middle|\,\mathcal{I}_{vh}\right]$$

$$(W_{vk},L_{vk},\boldsymbol{u}_{v}(L_{vk}))\perp W_{vh}\mid(L_{vh},\boldsymbol{u}_{v}(L_{vh}),\boldsymbol{\xi}_{v}),$$

$$(W_{vh},L_{vh})\perp(W_{vk},L_{vk})\mid(\{\boldsymbol{u}_{v}(l)\},\boldsymbol{\xi}_{v}).$$

$$(W_{vk},L_{vk},\{\boldsymbol{u}_{v}(l)\})\perp(W_{vh},L_{vh})\mid\boldsymbol{\xi}_{v}$$

$$\Pi_{vh}=\bar{\psi}_{v}(L_{vh},\boldsymbol{u}_{vh},\boldsymbol{\xi}_{v}).$$

$$\begin{array}{l}U_{1}(y-p,\pi,\boldsymbol{\eta})=\delta_{1}+\beta_{1}(y-p)+\alpha_{1}\pi+\eta^{1},\\ U_{0}(y,\pi,\boldsymbol{\eta})=\delta_{0}+\beta_{0}y+\alpha_{0}\pi+\eta^{0},\end{array}$$

Taxonomy

Topics: Economic and Environmental Valuation · Economics of Agriculture and Food Markets · Regional Economics and Spatial Analysis

Full text

Demand and Welfare Analysis in Discrete Choice Models with Social Interactions

We are grateful to Steven Durlauf, James Heckman, Xenia Matschke, Gautam Tripathi, and seminar participants at the University of Chicago and the University of Luxembourg for helpful feedback. Bhattacharya acknowledges financial support from the ERC consolidator grant EDWEL; the first outline of this project appeared as part b.3 of that research proposal of March 2015. Part of this research was conducted while Kanaya was visiting the Institute of Economic Research, Kyoto University (under the Joint Research Program of the KIER), the support and hospitality of which are gratefully acknowledged.

Debopam Bhattacharya

University of Cambridge. Address for correspondence: Faculty of Economics, University of Cambridge, CB3 9DD. Phone: (+44)7503858289, email: [email protected]

Pascaline Dupas

Stanford University

Shin Kanaya

University of Aarhus

(26 April 2019.)

Abstract

Many real-life settings of consumer-choice involve social interactions, causing targeted policies to have spillover-effects. This paper develops novel empirical tools for analyzing demand and welfare-effects of policy-interventions in binary choice settings with social interactions. Examples include subsidies for health-product adoption and vouchers for attending a high-achieving school. We establish the connection between econometrics of large games and Brock-Durlauf-type interaction models, under both I.I.D. and spatially correlated unobservables. We develop new convergence results for associated beliefs and estimates of preference-parameters under increasing-domain spatial asymptotics. Next, we show that even with fully parametric specifications and unique equilibrium, choice data, that are sufficient for counterfactual demand-prediction under interactions, are insufficient for welfare-calculations. This is because distinct underlying mechanisms producing the same interaction coefficient can imply different welfare-effects and deadweight-loss from a policy-intervention. Standard index-restrictions imply distribution-free bounds on welfare. We illustrate our results using experimental data on mosquito-net adoption in rural Kenya.

1 Introduction

Social interaction models – where an individual’s payoff from an action depends on the perceived fraction of her peers choosing the same action – feature prominently in economic and sociological research. In this paper, we address a substantively important issue that has received limited attention within these literatures, viz. how to conduct economic policy evaluation in such settings. In particular, we focus on welfare analysis of policy interventions in binary choice scenarios with social interactions. Examples include subsidies for adopting a health-product and merit-based vouchers for attending a high-achieving school, where the welfare gain of beneficiaries may be accompanied by spillover-led welfare effects on those unable to adopt or move, respectively. Ex-ante welfare analysis of policies is ubiquitous in economic applications, and informs the practical decision of whether to implement the policy in question. Furthermore, common public interventions such as taxes and subsidies are often motivated by efficiency losses resulting from externalities. Therefore, it is important to develop empirical methods for welfare analysis in presence of such externalities, which cannot be done using available tools in the literature. Developing such methods and making them practically relevant also requires one to clarify and extend some aspects of existing empirical models of social interaction.

Literature Review and Contributions: Seminal contributions to the econometrics of social interactions include Manski (1993) for continuous outcomes, and Brock and Durlauf (2001a) for binary outcomes. More recently, there has been a surge of research on the related theme of network models, c.f. de Paula (2016). On the other hand, the econometric analysis of welfare in standard discrete choice settings, i.e. with heterogeneous consumers but without social spillover, started with Domencich and McFadden (1977), with later contributions by Daly and Zachary (1978), Small and Rosen (1981), and Bhattacharya (2018). The present paper builds on these two separate literatures to examine how social interactions influence welfare effects of policy-interventions and the identifiability of such welfare effects from standard choice data. In the context of binary choice with social interactions, Brock and Durlauf (2001a, Sec 3.3) discussed how to rank different possible equilibria resulting from policy interventions in terms of social utility – as opposed to individual welfare. They used log-sum type formulae, as in Small and Rosen (1981), to calculate the average indirect utility for specific realized values of covariates and average peer choice. Such calculations are not directly useful for our purpose. This is because the aggregate income transfer that restores average social utility to its pre-intervention level does not equal the average of individual compensating variations that restore individual utilities to their pre-intervention level. The latter is related to the concept of average deadweight loss, i.e. the efficiency cost of interventions, and consequently has received the most attention in the recent literature on empirical welfare analysis, c.f. Hausman and Newey (2016), Bhattacharya (2015), McFadden and Train (2019), and it is this notion of individual welfare that we are interested in. 
However, in settings involving spillover, we cannot use the methods of the above papers, as they do not allow for individual utilities to be affected by aggregate choices – a feature that has fundamental implications for welfare analysis. Therefore, new methods are required for welfare calculations under spillover, which we develop in the present paper.

In order to develop these methods, one must first have a theoretically coherent utility-based framework where many individuals interact with each other, i.e. provide a micro-foundation for Brock-Durlauf type models in terms of an empirical game with many players. This is necessary because welfare effects are defined with respect to utilities, and therefore, one has to specify the structure of individual preferences and beliefs including unobserved heterogeneity, and how they interact to produce the aggregate choice in equilibrium before and after the policy intervention. This requires clarifying the information structure and nature of the corresponding Bayes-Nash equilibria. A pertinent issue here is modelling the dependence structure of utility-relevant variables unobservable to the analyst but observable to the individual players. In particular, spatial correlation in unobservables – natural in the commonly analyzed setting where peer-groups are physical neighborhoods – makes individual beliefs conditional on one’s own privately observed variables which contain information about neighborhood ones. This complicates identification and inference. The first main contribution of the present paper is to establish conditions under which this feature of beliefs can be ignored ‘in the limit’, and one can proceed as if one is in an I.I.D. setting. This derivation is much more involved than the well-known result that in linear regression models, the OLS is consistent under correlated unobservables. In particular, our result involves showing that the fixed points of certain functional maps converge, under increasing domain and weak dependence asymptotics for spatial data, to fixed points of a limiting map, implying convergence of conditional beliefs to unconditional ones. This, in turn, is shown to imply convergence of complicated estimators of preference parameters under conditional beliefs to computationally simple ones in the limit. 
These estimators then yield consistent, counterfactual demand-prediction corresponding to a policy-intervention.

The standard setting in the game estimation literature is one where many independent markets are observed, each with a small number of players. Here, we consider estimation of preference parameters from data on a few markets with many players in each, using asymptotic approximations where the number of players tends to infinity but the number of markets remains fixed. In this setting, if the form of equilibrium beliefs is symmetric among players, the probabilistic laws that they follow have a certain homogeneity across players. (Symmetry means that (1) if the beliefs are unconditional expectations, as is the case with I.I.D. unobservables, they are identical across players, and (2) if they are conditional expectations, as is the case for spatially correlated unobservables, their functional forms are identical.) Due to this homogeneity, asymptotics on the number of players provides the ‘repeated observations’ required to identify the players’ preference parameters. Menzel (2016) also analyzed identification and estimation in games with many players. Below, we provide more discussion on the relation and differences between our analysis and Menzel’s.

Welfare Analysis: The second part of our paper concerns welfare-analysis of policy-interventions, e.g. a price-subsidy, in a setting with social interactions. Here we show that unlike counterfactual demand estimation, welfare effects are generically not identified from choice data under interactions, even when utilities and the distribution of unobserved heterogeneity are parametrically specified, equilibrium is unique, and there are no endogeneity concerns. To understand the heuristics behind under-identification, consider the empirical example of evaluating the welfare effect of subsidizing an anti-malarial mosquito net. Suppose, under suitable restrictions, we can model choice behavior in this setting via a Brock-Durlauf type social interaction model, and the data can identify the coefficient on the social interaction term. However, this coefficient may reflect an aggregate effect of (at least) two distinct mechanisms, viz. (a) a social preference for conforming, and (b) a health-concern led desire to protect oneself from mosquitoes deflected from neighbors who adopt a bednet. These two distinct mechanisms, with different magnitudes in general, would both make the social interaction coefficient positive, and are not separately identifiable from choice data (only their sum is). But they have different implications for welfare if, say, a subsidy is introduced. At one extreme, if all spillover is due to preference for social conforming, then as more neighbours buy, a household that buys would experience an additional rise in utility (over and above the gain due to price reduction), but a non-buyer loses no utility via the health channel. At the other extreme, if spillover is solely due to perceived negative health externality of buyers on non-buyers, then increased purchase by neighbours would lower the utility of a household upon not buying via the health-route, but not affect it upon buying since the household is then protected anyway. 
These different aggregate welfare effects are both consistent with the same positive aggregate social interaction coefficient. This conclusion continues to hold even if eligibility for the subsidy is universal, there are no income effects or endogeneity concerns, and whether or not unobservables in individual preferences are I.I.D. or spatially correlated.
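The under-identification argument can be made concrete with a small numerical sketch. It assumes, purely for illustration, linear-index utilities $U_{1}=\delta_{1}+\beta(y-p)+\alpha_{1}\pi+\eta^{1}$ and $U_{0}=\delta_{0}+\beta y+\alpha_{0}\pi+\eta^{0}$ with a common income coefficient (no income effects) and a logistic distribution for $\eta^{1}-\eta^{0}$. All parameter values, function names, and the belief shift are hypothetical. Two mechanisms with the same net coefficient $\alpha_{1}-\alpha_{0}$ produce identical choice probabilities but different utility changes for households whose choice is unchanged.

```python
import math


def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))


def choice_prob(price, belief, delta, beta, alpha1, alpha0):
    # U1 - U0 = delta - beta*price + (alpha1 - alpha0)*belief + eps,
    # with eps assumed logistic; only the difference alpha1 - alpha0 enters.
    return logistic_cdf(delta - beta * price + (alpha1 - alpha0) * belief)


def utility_changes(beta, alpha1, alpha0, price_cut, belief_rise):
    # Change in U1 for a household that buys before and after the subsidy,
    # and change in U0 for a household that never buys.
    buyer = beta * price_cut + alpha1 * belief_rise
    non_buyer = alpha0 * belief_rise
    return buyer, non_buyer


# Hypothetical mechanisms, both with net interaction coefficient 1.0:
# (alpha1, alpha0) = effect of the belief on U1 and on U0 respectively.
MECHANISMS = {
    "conformity": (1.0, 0.0),           # peers' adoption raises utility from buying
    "health_externality": (0.0, -1.0),  # peers' adoption lowers utility from not buying
}
DELTA, BETA = 0.0, 1.0
GRID = [(p, b) for p in (1.0, 2.0) for b in (0.2, 0.5, 0.8)]

probs = {
    name: [choice_prob(p, b, DELTA, BETA, a1, a0) for p, b in GRID]
    for name, (a1, a0) in MECHANISMS.items()
}
changes = {
    name: utility_changes(BETA, a1, a0, price_cut=1.0, belief_rise=0.2)
    for name, (a1, a0) in MECHANISMS.items()
}

for name in MECHANISMS:
    print(name, "buyer change:", changes[name][0], "non-buyer change:", changes[name][1])
```

Because the choice probabilities coincide at every price and belief, any equilibrium belief computed from them also coincides, so choice data cannot distinguish the two rows of utility changes.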

Indeed, this feature is present in many other choice situations that economists routinely study. For example, consider school choice in a neighborhood with a free, resource-poor local school and a selective, fee-paying, resource-rich school. In this setting, a merit-based voucher scheme for attending the high-achieving school can potentially have a range of possible welfare effects. Aggregate welfare change could be negative if, for example, as high-ability children move out with the voucher, academic quality declines in the resource-poor school by more than it improves in the selective school via peer effects. In the absence of such negative externalities, aggregate welfare change could be positive due to the subsidy-led price decline for voucher users and any positive conforming effects that raise the utility of attending the rich school when more children also do so. These contradictory welfare implications are compatible with the same positive coefficient on the social interaction term in an individual school-choice model.

For standard discrete choice without spillover, Bhattacharya (2015) showed that the choice probability function itself contains all the information required for exact welfare analysis. In particular, for the special case of quasi-linear random utility models with extreme value additive errors, the popular ‘logsum’ formula of Small and Rosen (1981) yields average welfare of policy interventions. These results fail to hold in a setting with spillovers because here one cannot set the utility from the outside option to zero – an innocuous normalization in standard discrete choice models – since this utility changes as the equilibrium choice-rate changes with the policy-intervention. This is in contrast to binary choice without spillover, where utility from the outside option, i.e. non-purchase, does not change due to a price change of the inside good.

Nonetheless, under a standard, linear-index specification of demand, one can calculate distribution-free bounds on average welfare, based solely on choice probability functions. The width of the bounds increases with (i) the extent of net social spillover, i.e. how much the (belief about) average neighborhood choice affects individual choice probabilities, and (ii) the difference in average peer-choice corresponding to realized equilibria before and after the price-change. The index structure, which has been universally used in the empirical literature on social interactions (c.f. Brock and Durlauf, 2001a, 2007), leads to dimension reduction that plays an important role in identifying spillover effects. We therefore continue to use the index structure as it simplifies our expressions, and comes “for free”, because social spillovers cannot in general be identified without such structure anyway. Under stronger and untestable restrictions on the nature of spillover, our bounds can shrink to a singleton, implying point-identification of welfare. Two such restrictions are (a) the effects of an increase in average peer-choice on individual utilities from buying and not buying are exactly equal in magnitude and opposite in sign, or (b) the effect of aggregate choice on either the purchase utility or the non-purchase utility is zero.

Empirical Illustration: We illustrate our theoretical results with an empirical example of a hypothetical, targeted public subsidy scheme for anti-malarial bednets. In particular, we use micro-data from a pricing experiment in rural Kenya (Dupas, 2014) to estimate an econometric model of demand for bednets, where spillover can arise via different channels, including a preference for conformity and perceived negative externality arising from neighbors’ use of a bednet. In this setting, we calculate predicted effects of hypothetical income-contingent subsidies on bednet demand and welfare. We perform these calculations by first accounting for social interactions, and then compare these results with what would be obtained if one had ignored these interactions. We find that allowing for (positive) interaction leads to a prediction of lower demand when means-tested eligibility is restricted to fewer households and higher demand when the eligibility criterion is more lenient, relative to ignoring interactions. The intuitive explanation is that ignoring a covariate with positive impact on the outcome would lead to under-prediction if the prediction point for the ignored covariate is higher than its mean value. As for welfare, allowing for social interactions may lead to a welfare loss for ineligible households, in turn implying higher deadweight loss from the subsidy scheme, relative to estimates obtained ignoring social spillover where welfare effects for ineligibles are zero by definition. The resulting net welfare effect, aggregated over both eligibles and ineligibles, admits a large range of possible values including both positive and negative ones, with associated large variation in the implied deadweight loss estimates, all of which are consistent with the same coefficient on the social interaction term in the choice probability function.
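The intuition about demand prediction can be sketched with a stylized calculation. The sketch assumes a logistic choice probability $F(a+\alpha\pi)$ and a misspecified model that ignores $\pi$ and therefore absorbs $\alpha$ times the sample-mean belief into its intercept. The intercept, $\alpha$, the mean belief, and the two counterfactual beliefs are hypothetical values chosen for illustration, not estimates from the paper.

```python
import math


def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))


def predicted_demand(intercept, alpha, belief):
    return logistic_cdf(intercept + alpha * belief)


# Hypothetical values: positive interaction coefficient, sample-mean belief 0.4.
INTERCEPT, ALPHA, MEAN_BELIEF = -1.0, 2.0, 0.4

# A model that ignores interactions fits the data at the mean belief only,
# so its intercept absorbs ALPHA * MEAN_BELIEF.
ABSORBED_INTERCEPT = INTERCEPT + ALPHA * MEAN_BELIEF


def compare(belief_under_policy):
    with_interaction = predicted_demand(INTERCEPT, ALPHA, belief_under_policy)
    ignoring_interaction = predicted_demand(ABSORBED_INTERCEPT, 0.0, belief_under_policy)
    return with_interaction, ignoring_interaction


narrow = compare(0.2)   # restrictive eligibility: belief below the sample mean
lenient = compare(0.7)  # lenient eligibility: belief above the sample mean

print("narrow eligibility  (with, ignoring):", narrow)
print("lenient eligibility (with, ignoring):", lenient)
```

When the counterfactual belief lies below the sample mean, the interaction model predicts lower demand than the model that ignores interactions; when it lies above, the ordering reverses.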

An implication of these results for applied work is that welfare analysis under spillover effects requires knowledge of the different channels of spillover separately, possibly via conducting a ‘belief-elicitation’ survey; knowledge of only the choice probability functions, inclusive of a social interaction term, is insufficient.

Plan of the Paper: The rest of the paper is organized as follows. Section 2 describes the set-up, and establishes the formal connection between econometric analysis of large games and Brock-Durlauf type social interaction models for discrete choice, first under I.I.D. and then under spatially correlated unobservables. This section contains the key results on convergence of conditional (on unobservables) beliefs in the spatial case to non-stochastic ones under increasing-domain asymptotics. Section 3 shows consistency of our preferred, computationally simple estimator even under spatial dependence. Section 4 develops the tools for empirical welfare analysis of a price intervention, such as a means-tested subsidy, in such models, along with the associated deadweight loss calculations. In Section 5, we lay out the context of our empirical application, and in Section 6 we describe the empirical results obtained by applying the theory to the data. Finally, Section 7 summarizes and concludes the paper. Technical derivations, formal proofs and additional results are collected in an Appendix.

2 Set-up and Assumptions

Consider a population of villages indexed by $v\in\left\{1,\dots,\bar{v}\right\}$ and resident households in village $v$ indexed by $\left(v,h\right)$, with $h\in\left\{1,\dots,N_{v}\right\}$. For the purpose of inference discussed later, we will think of these households as a random sample drawn from an infinite superpopulation. The total number of households we observe is $N=\sum_{v=1}^{\bar{v}}N_{v}$. Each household faces a binary choice between buying one unit of an indivisible good (alternative 1) or not buying it (alternative 0). Its utilities from the two choices are given by $U_{1}(Y_{vh}-P_{vh},\Pi_{vh},\boldsymbol{\eta}_{vh})$ and $U_{0}(Y_{vh},\Pi_{vh},\boldsymbol{\eta}_{vh})$, where the variables $Y_{vh}$, $P_{vh}$, and $\boldsymbol{\eta}_{vh}$ denote respectively the income, price, and heterogeneity of household $(v,h)$, and $\Pi_{vh}$ is household $\left(v,h\right)$’s subjective belief of what fraction of households in her village would choose alternative $1$. The variable $\boldsymbol{\eta}_{vh}$ is privately observed by household $\left(v,h\right)$ but is unobserved by the econometrician and other households. The dependence of utilities on $\Pi_{vh}$ captures social interactions. Below, we will specify how $\Pi_{vh}$ is formed. Household $(v,h)$’s choice is described by

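$$A_{vh}=1\left\{U_{1}(Y_{vh}-P_{vh},\Pi_{vh},\boldsymbol{\eta}_{vh})\geq U_{0}(Y_{vh},\Pi_{vh},\boldsymbol{\eta}_{vh})\right\},\tag{1}$$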

where $1\left\{\cdot\right\}$ denotes the indicator function. In the mosquito-net example of our application, one can interpret $U_{1}$ and $U_{0}$ as expected utilities resulting from differential probabilities of contracting malaria from using and not using the net, respectively.

The utilities, $U_{1}$ and $U_{0}$ , may also depend on other covariates of $(v,h)$ . For notational simplicity, we let $W_{vh}=(Y_{vh},P_{vh})^{\prime}$ , and suppress other covariates for now; covariates are considered in our empirical implementation in Section 6.

For later use, we also introduce a set of location variables $\left\{L_{vh}\right\}$, where $L_{vh}\in\mathbb{R}^{2}$ denotes $(v,h)$’s (GPS) location.

**Incomplete-Information Setting:** In each village $v$, each of the $N_{v}$ households is provided the opportunity to buy the product at a researcher-specified price $P_{vh}$ that is randomly varied across households. These households will be termed players from now on. Players have incomplete information in that each player $(v,h)$ knows her own variables $(A_{vh},W_{vh},L_{vh},\boldsymbol{\eta}_{vh})$. We assume, in line with our application context, that a player does not know the identities of all the players who have been selected in the experiment, and thus does not know their variables $(W_{\tilde{v}k},L_{\tilde{v}k},\boldsymbol{\eta}_{\tilde{v}k})$ and choice $A_{\tilde{v}k}$ (for any $\tilde{v}\in\left\{1,\dots,\bar{v}\right\}$ and $k\neq h$). Accordingly, we model interactions of households as an incomplete-information Bayesian game, whose probabilistic structure is as follows.

We consider two sources of randomness: one stemming from random drawing of households from a superpopulation, and the other associated with the realization of players’ unobserved heterogeneity $\left\{\boldsymbol{\eta}_{vh}\right\}$ . This will be further elaborated below.

We assume players have ‘rational expectations’ in accordance with the standard Bayes-Nash setting, i.e., each $(v,h)$ ’s belief is formed as

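$$\Pi_{vh}=\frac{1}{N_{v}-1}\sum_{1\leq k\leq N_{v};\,k\neq h}E\left[A_{vk}\mid\mathcal{I}_{vh}\right],\tag{2}$$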

where $E\left[\cdot\text{ }|\mathcal{I}_{vh}\right]$ is the conditional expectation computed through the probability law that governs all the relevant variables given $(v,h)$ ’s information set $\mathcal{I}_{vh}$ that includes $(W_{vh},\boldsymbol{\eta}_{vh})$ . Here, ‘rational expectation’ simply means that subjective and physical laws of all relevant variables coincide. The explicit form of (2) in equilibrium is investigated in the next subsection after we have specified the probabilistic structure for all the variables.
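As a rough illustration of how a belief of the form (2) can be computed when beliefs are common across players, the sketch below solves for a single belief $\pi$ satisfying $\pi=\frac{1}{N}\sum_{k}F(\delta-\beta P_{k}+\alpha\pi)$ by fixed-point iteration. It rests on several assumptions that are not taken from this section: a logistic CDF $F$, a linear index with hypothetical parameters `delta`, `beta`, and `alpha`, simulated prices, and the full-sample average in place of the leave-one-out average (a close approximation when $N$ is large). The guard uses the contraction-type condition $|\alpha|\sup_{e}f_{\varepsilon}(e)<1$, where the logistic density is bounded by $1/4$.

```python
import math
import random


def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))


def best_response_share(belief, prices, delta, beta, alpha):
    # Average purchase probability when every household holds the same belief.
    total = sum(logistic_cdf(delta - beta * p + alpha * belief) for p in prices)
    return total / len(prices)


def solve_belief(prices, delta, beta, alpha, start=0.5, tol=1e-12, max_iter=10_000):
    # The logistic density is at most 1/4, so |alpha| / 4 < 1 makes the map a contraction.
    if abs(alpha) * 0.25 >= 1.0:
        raise ValueError("contraction condition |alpha| * sup f < 1 fails")
    belief = start
    for _ in range(max_iter):
        updated = best_response_share(belief, prices, delta, beta, alpha)
        if abs(updated - belief) < tol:
            return updated
        belief = updated
    raise RuntimeError("fixed-point iteration did not converge")


rng = random.Random(0)
# Hypothetical randomized prices for 500 households in one village.
prices = [rng.choice([0.5, 1.0, 1.5, 2.0]) for _ in range(500)]

pi_from_zero = solve_belief(prices, delta=0.0, beta=1.0, alpha=2.0, start=0.0)
pi_from_one = solve_belief(prices, delta=0.0, beta=1.0, alpha=2.0, start=1.0)
print("equilibrium belief:", pi_from_zero, pi_from_one)
```

Starting the iteration from both ends of $[0,1]$ and reaching the same value is a simple numerical check of uniqueness under the contraction condition.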

Each player $(v,h)$ is solely concerned with behavior of other players in the same village. In this sense, the econometrician observes $\bar{v}$ games ( $\bar{v}$ is eleven in our empirical study), each with ‘many’ players. To formalize our model as a Bayesian game in each village, given the form of (2), $U_{1}$ and $U_{0}$ would be interpreted as expected utilities. This is possible when the underlying vNM utility indices $u_{1}$ and $u_{0}$ satisfy

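$$U_{1}(Y_{vh}-P_{vh},\Pi_{vh},\boldsymbol{\eta}_{vh})=E\left[u_{1}\left(Y_{vh}-P_{vh},\frac{1}{N_{v}-1}\sum_{1\leq k\leq N_{v};\,k\neq h}A_{vk},\boldsymbol{\eta}_{vh}\right)\,\middle|\,\mathcal{I}_{vh}\right],$$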

i.e., $u_{1}$ is linear in the second argument; $U_{0}$ and $u_{0}$ satisfy an analogous relationship. This will hold in particular when utilities have a linear index structure, as in Manski (1993) and Brock and Durlauf (2001a, 2007).
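A short worked example may help to see why linearity is what is needed. Suppose, as an illustrative special case, that $u_{1}(y-p,a,\boldsymbol{\eta})=\delta_{1}+\beta_{1}(y-p)+\alpha_{1}a+\eta^{1}$, where $\eta^{1}$ denotes the component of $\boldsymbol{\eta}$ entering alternative 1. Since $Y_{vh}$, $P_{vh}$, and $\boldsymbol{\eta}_{vh}$ are known to household $(v,h)$, only the peer average is random given $\mathcal{I}_{vh}$, and the conditional expectation passes through the linear term:

$$
\begin{aligned}
E\left[u_{1}\left(Y_{vh}-P_{vh},\tfrac{1}{N_{v}-1}\sum_{k\neq h}A_{vk},\boldsymbol{\eta}_{vh}\right)\,\middle|\,\mathcal{I}_{vh}\right]
&=\delta_{1}+\beta_{1}(Y_{vh}-P_{vh})+\alpha_{1}\,\frac{1}{N_{v}-1}\sum_{k\neq h}E\left[A_{vk}\mid\mathcal{I}_{vh}\right]+\eta_{vh}^{1}\\
&=\delta_{1}+\beta_{1}(Y_{vh}-P_{vh})+\alpha_{1}\Pi_{vh}+\eta_{vh}^{1}\\
&=U_{1}(Y_{vh}-P_{vh},\Pi_{vh},\boldsymbol{\eta}_{vh}),
\end{aligned}
$$

where the second equality uses the belief definition (2). If $u_{1}$ were nonlinear in its second argument, the expected utility would in general depend on more than the mean belief $\Pi_{vh}$.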

**Dependence Structure of Unobserved Heterogeneity:** We assume that unobserved heterogeneity $\{\boldsymbol{\eta}_{vh}\}_{h=1}^{N_{v}}$ ($v=1,\dots,\bar{v}$) takes the following form:

$$\boldsymbol{\eta}_{vh}=\boldsymbol{\xi}_{v}+\boldsymbol{u}_{vh},\tag{3}$$

where $\boldsymbol{\xi}_{v}$ stands for a village-specific factor that is common to all members of the $v$th village and $\boldsymbol{u}_{vh}$ represents an individual-specific variable. Below we will consider two different specifications for the sequence $\{\boldsymbol{u}_{vh}\}_{h=1}^{N_{v}}$ for each $v$, given $\boldsymbol{\xi}_{v}$, viz., (1) $\boldsymbol{u}_{vh}$ are conditionally independent and identically distributed, and (2) $\boldsymbol{u}_{vh}$ is spatially dependent. (The “fixed-effect” type specification (3) is similar to Brock and Durlauf (2007). However, the additively separable structure of (3) is assumed here for expositional simplicity; we can allow for $\boldsymbol{\eta}_{vh}=\boldsymbol{\bar{\eta}}(\boldsymbol{\xi}_{v},\boldsymbol{u}_{vh})$ for some possibly nonlinear function $\boldsymbol{\bar{\eta}}\left(\cdot,\cdot\right)$, and this general form does not change anything substantive in what follows.) We assume that the value of $\boldsymbol{\xi}_{v}$ is commonly known to all members of village $v$ but $\boldsymbol{u}_{vh}$ is a purely private variable known only to individual $(v,h)$. Neither $\left\{\boldsymbol{\xi}_{v}\right\}$ nor $\left\{\boldsymbol{u}_{vh}\right\}$ is observable to the econometrician. We also assume that this information structure, as well as the probabilistic structure of variables imposed below (cf. conditions C1, C2, and C3-IID or C3-SD below), is known to all the players in the game. Given our settings so far, we can specify the form of player $(v,h)$’s information set as

$\mathcal{I}_{vh}=(W_{vh},L_{vh},\boldsymbol{u}_{vh},\boldsymbol{\xi}_{v}).$

(4)

In our empirical set-up, the group level unobservables $\{\boldsymbol{\xi}_{v}\}$ will be identified using the fact that there are many households per village.

Having described the set-up through equations (1), (2), (3), and (4), we now close our model by providing the following conditions on the probabilistic law for the key variables:

C1

$\{(W_{vh},L_{vh},\boldsymbol{\xi}_{v},\boldsymbol{u}_{vh})\}_{h=1}^{N_{v}}$ , $v=1,\dots,\bar{v}$ , are independent across $v$ .

Assumption C1 says that variables in village $v$ are independent of those in village $\tilde{v}(\neq v)$ .

C2

For each $v\in\left\{1,\dots,\bar{v}\right\}$ , given $\boldsymbol{\xi}_{v}$ , $\{(W_{vh},L_{vh})\}_{h=1}^{N_{v}}$ is I.I.D. with $(W_{vh},L_{vh})$ $\sim$ $F_{WL}^{v}(w,l|\boldsymbol{\xi}_{v})$ , the conditional CDF for village $v$ .

This conditional I.I.D.-ness of C2 for observables represents randomness associated with sampling of households in our field experiment. Additionally, the household $(v,h)$ is assumed to know the distribution $F_{WL}^{v}(w,l|\boldsymbol{\xi}_{v})$ .

For the distribution of unobservable heterogeneity, we consider two alternative scenarios:

C3-IID

(i) For each $v$ , given $\boldsymbol{\xi}_{v}$ , the sequence $\left\{\boldsymbol{u}_{vh}\right\}_{h=1}^{N_{v}}$ is conditionally I.I.D., with $\boldsymbol{u}_{vh}|\boldsymbol{\xi}_{v}$ $\sim$ $F_{\boldsymbol{u}}^{v}(\cdot|\boldsymbol{\xi}_{v})$ ; (ii) $\left\{\boldsymbol{u}_{vh}\right\}_{h=1}^{N_{v}}$ is independent of $\left\{W_{vh},L_{vh}\right\}_{h=1}^{N_{v}}$ conditionally on $\boldsymbol{\xi}_{v}$ .

C3-SD

For each $v$ , the sequence $\left\{\boldsymbol{u}_{vh}\right\}$ is defined as

$\boldsymbol{u}_{vh}=\boldsymbol{u}_{v}(L_{vh})$

for a stochastic process $\left\{\boldsymbol{u}_{v}\left(l\right)\right\}_{l\in\mathcal{R}_{v}}$ , indexed by location $l\in\mathcal{R}_{v}\subset\mathbb{R}^{2}$ , where $\left\{\boldsymbol{u}_{v}\left(l\right)\right\}_{l\in\mathcal{R}_{v}}$ are independent of $\left\{\boldsymbol{u}_{v^{\prime}}\left(l\right)\right\}_{l\in\mathcal{R}_{v^{\prime}}}$ for $v\neq v^{\prime}$ , and satisfy the following properties: (i) for each $v$ , $\left\{\boldsymbol{u}_{v}\left(l\right)\right\}_{l\in\mathcal{R}_{v}}$ is an alpha-mixing stochastic process conditionally on $\boldsymbol{\xi}_{v}$ , where the definition of an alpha-mixing process is provided in Appendix A.2; (ii) $\left\{\boldsymbol{u}_{v}\left(l\right)\right\}_{l\in\mathcal{R}_{v}}$ is independent of $\{(W_{vh},L_{vh})\}_{h=1}^{N_{v}}$ conditionally on $\boldsymbol{\xi}_{v}$ .

The conditional I.I.D.-ness imposed in C3-IID (i) leads to equi-dependence within each village, i.e., $\mathrm{Cov}\left[\boldsymbol{\eta}_{vh},\boldsymbol{\eta}_{vk}\right]=\mathrm{Cov}\left[\boldsymbol{\eta}_{v\tilde{h}},\boldsymbol{\eta}_{v\tilde{k}}\right](\neq 0)$ for any $h\neq k$ and $\tilde{h}\neq\tilde{k}$ . In contrast, C3-SD (i) allows for non-uniform dependence that may vary depending on the relative locations of the two players, i.e., if two households $(v,h)$ and $\left(v,k\right)$ selected in the experiment with locations $L_{vh}$ and $L_{vk}$ , respectively, live close to each other (i.e., $||L_{vh}-L_{vk}||$ is small), $\boldsymbol{u}_{vh}$ and $\boldsymbol{u}_{vk}$ (and thus $\boldsymbol{\eta}_{vh}$ and $\boldsymbol{\eta}_{vk}$ ) are more correlated. For example, in our application on mosquito-net adoption, this can correspond to positive spatial correlation in density of mosquitoes, unobserved by the researcher. Assumption C3-SD is consistent with the “increasing domain” type asymptotic framework used for spatial data, formally set out in Appendix A.2 of this paper (briefly, the area of $\mathcal{R}_{v}=\mathcal{R}_{v}^{N}$ tends to $\infty$ as $N\rightarrow\infty$ ; c.f. Lahiri, 2003, Lahiri and Zhu, 2006).

For the purpose of inference, C3-SD may be seen as a generalization of C3-IID, but in our Bayes-Nash framework with many players, they will, in general, imply substantively different forms for beliefs and equilibria. In particular, under C3-IID, each player $(v,h)$ ’s unobservable $\boldsymbol{u}_{vh}$ is not useful for predicting another player $(v,k)$ ’s variables and behavior, and therefore her belief $\Pi_{vh}$ (defined in (2) as the average of the conditional expectations about all the others’ $A_{vk}$ ) reduces to the average of the *unconditional* expectations (as formally shown in Proposition 1 below). On the other hand, under the spatial dependence scheme C3-SD, since $\boldsymbol{u}_{vh}$ and $\boldsymbol{u}_{vk}$ are correlated, knowing one’s own realized value of $\boldsymbol{u}_{vh}$ can help predict others’ $\boldsymbol{u}_{vk}$ ; in other words, $(v,h)$ ’s own information $\mathcal{I}_{vh}=(W_{vh},L_{vh},\boldsymbol{u}_{vh},\boldsymbol{\xi}_{v})$ is useful for forming beliefs about others.

Condition (ii) in C3 (with I.I.D. or SD) is the exogeneity condition. Since $\left\{\boldsymbol{u}_{v}\left(l\right)\right\}$ is independent of $(W_{vh},L_{vh})$ conditionally on $\boldsymbol{\xi}_{v}$ , we have $W_{vh}\perp\boldsymbol{u}_{v}(L_{vh})|L_{vh},\boldsymbol{\xi}_{v}$ . This allows for identification and consistent estimation of model parameters. In the context of the field experiment in our empirical exercise, this exogeneity condition can be interpreted as saying that the realization of unobserved heterogeneity is independent of how researchers have selected the sample. Note that the exogeneity condition is conditional on $L_{vh}$ (and $\boldsymbol{\xi}_{v}$ ), and it does not exclude correlation of $\boldsymbol{u}_{vh}$ and $W_{vh}\equiv\left(P_{vh},Y_{vh}\right)$ in the unconditional sense. For example, if $Y_{vh}$ is well predicted by location $L_{vh}$ (say, there are high-income districts and low-income ones, and no restriction is imposed on the joint distribution of $\left(W_{vh},L_{vh}\right)$ ), we can still capture situations where $\boldsymbol{u}_{vh}$ tends to be higher when $(v,h)$ ’s income $Y_{vh}$ is higher, since $\boldsymbol{u}_{vh}=\boldsymbol{u}_{v}(L_{vh})$ . [Footnote 3: In our application, prices $P_{vh}$ are randomly assigned to individuals by researchers and thus $P_{vh}$ and $\boldsymbol{u}_{vh}$ are independent both unconditionally and conditionally on $L_{vh}$ .]

**Two Sources of Randomness:** The above probabilistic framework with two sources of randomness has parallels in Andrews (2005, Section 7) and Lahiri and Zhu (2006). It is also related to Menzel’s (2016) framework with exchangeable variables (below we provide further comparison of our framework with Menzel’s). As stated, C2 represents randomness induced by the researchers’ experimental process. In contrast, the specification in C3 represents randomness of unobserved heterogeneity conditionally on $\left\{L_{vh}\right\}_{h=1}^{N_{v}}$ , the (locations of) households selected in the experiment.

Conditions C2 and C3-IID imply that $\{(W_{vh},L_{vh},\boldsymbol{u}_{vh})\}_{h=1}^{N_{v}}$ are I.I.D. conditionally on $\boldsymbol{\xi}_{v}$ , and thus our framework can be interpreted as the standard one with a single source of randomness. For the spatial case C3-SD, the beliefs depend on $\mathcal{I}_{vh}$ , and in particular, on the unobservable (to the econometrician) $\boldsymbol{u}_{vh}$ , which complicates identification and inference. We get around this complication by showing that under an “increasing domain” type of asymptotics for spatial data, reasonable in our application, the model and estimates of its parameters under C3-SD converge essentially to the simpler model C3-IID, and this justifies the use of Brock-Durlauf type analysis even under spatial dependence.

2.1 Equilibrium Beliefs

In this subsection, we investigate the forms of players’ beliefs defined in (2) first in the I.I.D. and then in the spatially dependent case. We first consider the case of C3-IID. This case corresponds to Brock and Durlauf’s (2001a) binary choice model with social interactions where, additionally, unobserved heterogeneity was modelled through the logistic distribution. Brock and Durlauf (2001a) made an intuitive, but somewhat ad hoc, assumption that beliefs, corresponding to our $\Pi_{vh}$ , are constant and symmetric across all players in the same village. We first show that under C3-IID, this assumption can be justified in our incomplete-information game setting via the specification of a Bayes-Nash equilibrium. We next consider the spatially dependent case with C3-SD. As briefly discussed above, beliefs under spatial dependence have to be computed through conditional expectations. However, under an “increasing domain” asymptotic framework for spatial data, conditional-expectation based beliefs converge to the beliefs in the I.I.D. case. The mathematical derivation of this result is somewhat involved; so in the main text we outline the key points, and provide the formal derivation in the Appendix.

2.1.1 Constant and Symmetric Beliefs under the (Conditional) I.I.D. Setting

We investigate the forms of beliefs under C3-IID through the following two propositions:

Proposition 1

Suppose that Conditions C1, C2, and C3-IID are common knowledge in the Bayesian game described in the previous section. Then, for any $k\neq h$ in village $v$ with $\boldsymbol{\xi}_{v}$ ,

$E\left[A_{vk}|\mathcal{I}_{vh}\right]=E\left[A_{vk}|\boldsymbol{\xi}_{v}\right],$

where $\mathcal{I}_{vh}=(W_{vh},L_{vh},\boldsymbol{u}_{vh},\boldsymbol{\xi}_{v})$ is defined in (4).

The proof of Proposition 1 is provided in Appendix A.1. Note that this proposition does not utilize any equilibrium condition. It simply confirms, formally, the intuitive statement that $(v,h)$ ’s own variables are not useful to predict other $(v,k)$ ’s behavior $A_{vk}$ . Given this result, we can write the belief $\Pi_{vh}$ (defined in (2)) as

[TABLE]

where

[TABLE]

and $\bar{\Pi}_{vh}$ is a function of $\boldsymbol{\xi}_{v}$ and independent of $(v,h)$ -specific variables, $(W_{vh},L_{vh},\boldsymbol{u}_{vh})$ , while the functional form of $\bar{\Pi}_{vh}$ may depend on the index $\left(v,h\right)$ in a deterministic way; for notational simplicity, we suppress the dependence of $\bar{\Pi}_{vh}$ on $\boldsymbol{\xi}_{v}$ below.

Beliefs in equilibrium solve the system of $N_{v}$ equations:

[TABLE]

where $E_{\boldsymbol{\xi}_{v}}\left[\cdot\right]$ denotes the conditional expectation operator given $\boldsymbol{\xi}_{v}$ (i.e., $E\left[\cdot|\boldsymbol{\xi}_{v}\right]$ ). Brock and Durlauf (2001a) focus on equilibria with constant and symmetric beliefs. [Footnote 4: The constancy of beliefs means that each player’s belief is independent of any realization of her own, player-specific variables as in (77). Using our notation above, we say that (constant) beliefs are symmetric when $\bar{\Pi}_{vh}=\bar{\Pi}_{vk}$ for any $h,k\in\{1,\dots,N_{v}\}$ (for each $v$ ).] When Brock and Durlauf’s framework is interpreted as a Bayesian game, one can formally justify their focus on constant and symmetric beliefs under conditions laid out in Proposition 2 below.

To establish this proposition, define for each $v$ , given $\boldsymbol{\xi}_{v}$ , a function $m^{v}:\left[0,1\right]\rightarrow[0,1]$ as

[TABLE]

for notational economy, we will often suppress the dependence of $m^{v}\left(r\right)$ on $\boldsymbol{\xi}_{v}$ ; but note that $m^{v}\left(r\right)$ is independent of individual index $h$ under the conditional I.I.D. assumption given $\boldsymbol{\xi}_{v}$ . Now we are ready to provide the following characterization of beliefs:

Proposition 2

Suppose that the same conditions hold as in Proposition 1 and the function $m_{\boldsymbol{\xi}_{v}}^{v}\left(\cdot\right)$ defined in (8) is a contraction, i.e., for some $\rho\in\left(0,1\right)$ ,

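$\left|m_{\boldsymbol{\xi}_{v}}^{v}\left(r\right)-m_{\boldsymbol{\xi}_{v}}^{v}\left(\tilde{r}\right)\right|\leq\rho\left|r-\tilde{r}\right|\quad\text{for any }r,\tilde{r}\in\left[0,1\right].$

(9)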

Then, a solution $(\bar{\Pi}_{v1},\dots,\bar{\Pi}_{vN_{v}})$ of the system of $N_{v}$ equations in (7) uniquely exists and is given by symmetric beliefs, i.e.,

[TABLE]

The proof is given in the Appendix. Propositions 1-2 show that, given the (conditional) I.I.D. and contraction conditions, the equilibrium is characterized through

[TABLE]

for some constant $\bar{\pi}_{v}:=\bar{\pi}_{v}(\boldsymbol{\xi}_{v})\in\left[0,1\right]$ within each village (given $\boldsymbol{\xi}_{v}$ ). This implies that the beliefs can be consistently estimated by the sample average of $A_{vk}$ over village $v$ , which is exploited in our empirical study.
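The link between the equilibrium belief and the village average of choices can be illustrated with a minimal simulation sketch. It assumes a hypothetical probit linear-index model in the large-village limit, with $\boldsymbol{\xi}_{v}$ held fixed and absorbed into an intercept. The parameter values `c`, `beta`, `alpha` and the three-point price grid are illustrative assumptions and are not taken from the paper.

```python
import math
import numpy as np


def norm_cdf(x):
    """Standard normal CDF for a scalar argument."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def m(r, c, beta, alpha, prices):
    """Hypothetical m^v(r): mean adoption probability when all hold belief r.

    Prices are assumed to be equally likely on the grid `prices`.
    """
    return sum(norm_cdf(c + beta * p + alpha * r) for p in prices) / len(prices)


def solve_belief(c, beta, alpha, prices, tol=1e-12, max_iter=10_000):
    """Iterate r <- m(r) to its fixed point (valid when m is a contraction)."""
    r = 0.5
    for _ in range(max_iter):
        r_next = m(r, c, beta, alpha, prices)
        if abs(r_next - r) < tol:
            return r_next
        r = r_next
    raise RuntimeError("fixed-point iteration did not converge")


def simulate_village(n, c=0.2, beta=-1.0, alpha=1.0,
                     prices=(0.0, 0.5, 1.0), seed=0):
    """Draw n households acting on the equilibrium belief.

    Returns the pair (equilibrium belief, village average of actions).
    """
    pi_bar = solve_belief(c, beta, alpha, prices)
    rng = np.random.default_rng(seed)
    price = rng.choice(np.asarray(prices), size=n)  # randomly assigned prices
    eps = rng.standard_normal(n)                    # I.I.D. private shocks
    action = c + beta * price + alpha * pi_bar + eps >= 0.0
    return pi_bar, float(action.mean())


pi_bar, village_mean = simulate_village(n=20_000)
print(f"equilibrium belief: {pi_bar:.4f}, village average: {village_mean:.4f}")
```

With many households the two printed numbers should be close, which is the sense in which the village average can stand in for the belief in this simplified setting.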

The contraction condition (9) can be verified on a case by case basis. In particular, for the linear index model used below, the condition is

$\left|\alpha\right|\sup_{e\in\mathbb{R}}f_{\varepsilon}\left(e\right)<1,$

where $\alpha$ denotes the coefficient on beliefs, i.e. the social interaction term, and $f_{\varepsilon}\left(\cdot\right)$ denotes the density of $\varepsilon$ , the unobservable determinant of choosing option $1$ (defined below through $\boldsymbol{\eta}_{vk}$ or $\boldsymbol{u}_{vh}$ ). In a probit specification in which $\varepsilon$ is standard normal, $\sup_{e\in\mathbb{R}}f_{\varepsilon}\left(e\right)=1/\sqrt{2\pi}$ and thus we require $\left|\alpha\right|<\sqrt{2\pi}(\simeq 2.506)$ ; for the logit specification, $\sup_{e\in\mathbb{R}}f_{\varepsilon}\left(e\right)=1/4$ , and thus we require $\left|\alpha\right|<4$ . We verify that these conditions are satisfied in our application.
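The probit and logit bounds on $\alpha$ can be checked numerically with a short sketch. The helper names (`sup_density`, `is_contraction`) and the evaluation grid are hypothetical choices made for illustration; the check is only the sufficient condition that $\left|\alpha\right|$ times the supremum of the density is below one.

```python
import math


def probit_pdf(e):
    """Standard normal density."""
    return math.exp(-0.5 * e * e) / math.sqrt(2.0 * math.pi)


def logit_pdf(e):
    """Standard logistic density, written as s * (1 - s)."""
    s = 1.0 / (1.0 + math.exp(-e))
    return s * (1.0 - s)


def sup_density(pdf, lo=-10.0, hi=10.0, steps=20_001):
    """Grid approximation of sup_e f(e); the grid contains e = 0."""
    grid = (lo + (hi - lo) * i / (steps - 1) for i in range(steps))
    return max(pdf(e) for e in grid)


def is_contraction(alpha, pdf):
    """Sufficient condition |alpha| * sup_e f(e) < 1 for the linear index."""
    return abs(alpha) * sup_density(pdf) < 1.0


print(sup_density(probit_pdf), 1.0 / math.sqrt(2.0 * math.pi))
print(sup_density(logit_pdf), 0.25)
print(is_contraction(2.5, probit_pdf), is_contraction(2.51, probit_pdf))
print(is_contraction(3.99, logit_pdf), is_contraction(4.01, logit_pdf))
```

Values of $\alpha$ just inside each bound pass the check and values just outside fail it, in line with the thresholds $\sqrt{2\pi}$ and $4$ stated above.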

Note, however, from the proof of Proposition 2, that the contraction condition (9) is not necessarily required for uniqueness. That is, if a solution $(\bar{\Pi}_{v1},\dots,\bar{\Pi}_{vN_{v}})$ to the system of equations (7) is unique and $m^{v}\left(\cdot\right)$ defined in (8) has a unique fixed point (i.e., a solution to $r=m^{v}\left(r\right)$ is unique), then the same conclusion still holds. We have imposed (9) since it is a convenient sufficient condition that guarantees uniqueness both in (7) and $r=m^{v}\left(r\right)$ ; it also appears to be a mild condition, and easy to verify in applications.

2.1.2 Convergence of Beliefs under Spatial Dependence

In this subsection, we provide a formal characterization of beliefs in equilibrium under the spatial case C3-SD. When the unobserved heterogeneity terms $\{\boldsymbol{u}_{vh}\}$ are dependent, beliefs in equilibrium may not reduce to a constant within each village, unlike in Proposition 1. With correlated $\boldsymbol{u}_{vk}$ and $\boldsymbol{u}_{vh}$ , the conditional expectation $E[A_{vk}|\mathcal{I}_{vh}]$ is in general a function of the privately observed $\boldsymbol{u}_{vh}$ , because knowing $\boldsymbol{u}_{vh}$ is useful for predicting $\boldsymbol{u}_{vk}$ and thus $A_{vk}$ (the latter is a function of $\boldsymbol{u}_{vk}$ ). While $(v,h)$ ’s beliefs are given by a constant under C3-IID, they will in general be a function of $(v,h)$ ’s variables unobserved by the researcher, when spatial dependence is allowed, thereby complicating the analysis. In this subsection, we investigate formal conditions under which this feature of beliefs disappears “in the limit”. [Footnote 5: Yang and Lee (2017) discuss estimation of a social interaction model with heterogeneous beliefs, but the heterogeneity is solely a function of observed player-specific variables (c.f. Eqn 2.1 in Yang and Lee, 2017), while unobserved private variables are I.I.D., and not spatially correlated as in our case.]

**Asymptotic Framework for Spatial Data:** Under spatial dependence, the first key condition enabling consistent estimation of our model parameters is the spatial analog of weak dependence. This amounts to specifying that $\boldsymbol{u}_{vk}$ and $\boldsymbol{u}_{vh}$ are less dependent when the distance between $\left(v,k\right)$ and $(v,h)$ , $||L_{vk}-L_{vh}||_{1}$ , is large. The notion of asymptotics we use is the so-called “increasing domain” type (c.f. Lahiri, 1996), where the area from which $\left\{L_{vk}\right\}_{k=1}^{N_{v}}$ is sampled expands to infinity as $N_{v}\rightarrow\infty$ . In particular, for each player $h$ , the number of other players who are almost uncorrelated with $h$ expands to $\infty$ , and the ratio of such players (relative to all $N_{v}$ players) tends to $1$ . Given this, and assuming that any bounded region in the support of $L_{vk}$ does not contain too many observations (even when $N_{v}$ tends to $\infty$ ), we can (i) ignore the effect of spatial dependence on equilibrium beliefs “in the limit”, and (ii) derive limit results for spatial data (e.g., the laws of large numbers and central limit theorems as in Lahiri, 1996, 2003), and use these to develop an asymptotic inference procedure.

In our empirical set-up, the average distance between households within every village is more than 1 kilometer, and is close to 2 kilometers in most villages. This corresponds well with the increasing domain framework above.
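The increasing-domain idea, that the share of players far from any given player tends to one, can be sketched numerically. The uniform sampling design, the `density` of households per unit area, and the cutoff `radius` beyond which two households are treated as nearly uncorrelated are all hypothetical assumptions made for this illustration.

```python
import numpy as np


def far_pair_fraction(n, density=1.0, radius=2.0, seed=0):
    """Share of household pairs whose L1 distance exceeds `radius`.

    Households are drawn uniformly on a square of area n / density, so the
    region grows with n while households per unit area stay bounded.
    """
    rng = np.random.default_rng(seed)
    side = np.sqrt(n / density)
    loc = rng.uniform(0.0, side, size=(n, 2))
    # Pairwise L1 distances via broadcasting: result has shape (n, n).
    dist = np.abs(loc[:, None, :] - loc[None, :, :]).sum(axis=2)
    off_diag = ~np.eye(n, dtype=bool)  # exclude each household's self-pair
    return float((dist[off_diag] > radius).mean())


for n in (50, 200, 1000):
    print(n, round(far_pair_fraction(n), 4))
```

Because the sampling region expands with the number of households, the printed share of distant pairs rises toward one as `n` grows, which is the mechanism that makes spatial dependence negligible for beliefs in the limit.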

**Convergence of Equilibrium Belief:** We now characterize the game’s equilibrium under the asymptotic scheme outlined above. The formal details of the analysis are laid out in Appendix A.2; here we outline the main substantive features and their implications for the belief structure.

To characterize beliefs in equilibrium, write

[TABLE]

given each $\boldsymbol{\xi}_{v}$ , where $\psi_{vh}(\cdot)$ may depend on the index $(v,h)$ in a deterministic way. Note that the expression (10) follows from the specification of $\Pi_{vh}$ in (2), defined as the average of the conditional expectations. Then, in equilibrium, for each village $v$ , beliefs are given by the set of functions $\psi_{vh}(\cdot)$ , $h=1,\dots,N_{v}$ , that solve the following system of $N_{v}$ equations:

[TABLE]

for $h=1,\dots,N_{v}$ (almost surely).

Note that the solution $\left\{\psi_{vh}(\cdot)\right\}$ to (13) depends on $N_{v}$ , the number of households. We now discuss the limit of the solutions when $N_{v}\rightarrow\infty$ . To this end, for expositional ease, consider a symmetric equilibrium such that $\psi_{vh}(\cdot)=\bar{\psi}_{v}(\cdot)$ for any $h=1,\dots,N_{v}$ ; symmetry is imposed here solely for easy exposition, and *a formal proof without symmetry is provided in Appendix A.2*. Under symmetry, the functional equation in (13) is reduced to

[TABLE]

where $\Gamma_{v,N_{v}}$ is a functional operator (mapping) from a $\left[0,1\right]$ -valued function $g$ (of random variables, $\mathcal{I}_{vk}=(W_{vk},L_{vk},\boldsymbol{u}_{vk},\boldsymbol{\xi}_{v})$ ) to another $\left[0,1\right]$ -valued function $\Gamma_{v,N_{v}}\left[g\right]$ (evaluated at $\mathcal{I}_{vh}$ ):

[TABLE]

where $\boldsymbol{u}_{vk}=\boldsymbol{u}_{v}(L_{vk})$ as formulated in C3-SD. Under C3-IID in (7), we have considered the system of equations that can be eventually defined through the unconditional expectations $E_{\boldsymbol{\xi}_{v}}\left[\cdot\right]$ . In contrast, here we have to consider conditional expectations of the form $E\left[\left.\cdot\right|\mathcal{I}_{vh}\right]=E\left[\left.\cdot\right|W_{vh},L_{vh},\boldsymbol{u}_{vh},\boldsymbol{\xi}_{v}\right]$ , as in (13) and (17). Given the correlation in $\left\{\boldsymbol{u}_{vh}\right\}$ , they do not reduce to the unconditional ones since $\boldsymbol{u}_{vh}$ is useful for predicting others’ $\boldsymbol{u}_{vk}$ . However, under the increasing domain asymptotics and a weak dependence condition (i.e., $\boldsymbol{u}_{v}\left(L_{vk}\right)$ and $\boldsymbol{u}_{v}\left(L_{vh}\right)$ are less correlated when $||L_{vk}-L_{vh}||_{1}$ is large), both of which are standard asymptotic assumptions for inference with spatial data, the number of players in the game whose unobservables are almost uncorrelated with those of any given player $(v,h)$ becomes large as $N_{v}\rightarrow\infty$ , and further the ratio of such players (among all $N_{v}$ players) tends to $1$ . As a result, the operator $\Gamma_{v,N_{v}}\left[g\right]$ converges to the average of the unconditional expectations:

[TABLE]

for any $g$ , where we call each summand $E_{\boldsymbol{\xi}_{v}}[\cdot]$ an ‘unconditional’ expectation in that it is independent of $(W_{vh},L_{vh},\boldsymbol{u}_{vh})$ , and we also suppress the dependence of $\Gamma_{v,\infty}$ on $\boldsymbol{\xi}_{v}$ for notational simplicity. [Footnote 6: We write $E\left[B|\boldsymbol{\xi}_{v}\right]=E_{\boldsymbol{\xi}_{v}}\left[B\right]$ and $E\left[B|C,\boldsymbol{\xi}_{v}\right]=E_{\boldsymbol{\xi}_{v}}\left[B|C\right]$ for any random objects $B$ and $C$ .] The precise meaning of this convergence, together with required conditions, is formally stated in the Appendix (see (93) in the proof of Theorem 5, for the general case without symmetry).

The convergence of the operator $\Gamma_{v,N_{v}}$ to $\Gamma_{v,\infty}$ carries over to that of a fixed point of $\Gamma_{v,N_{v}}$ (i.e. the solution of $\bar{\psi}_{v}=\Gamma_{v,N_{v}}\left[\bar{\psi}_{v}\right]$ ) when the limit operator $\Gamma_{v,\infty}$ is a contraction. The above discussion can be summarized as:

Theorem 1

Suppose that C2 and C3-SD hold with Assumption 4 (introduced in Appendix A.2), and the functional map $\Gamma_{v,\infty}\left[g\right]$ defined in (20) is a contraction with respect to the metric induced by the norm $||g||_{L^{1}}:=E[|g(W_{vh},L_{vh},\boldsymbol{u}_{v}(L_{vh}))|]$ ( $g$ is a $\left[0,1\right]$ -valued function on the support of $(W_{vh},L_{vh},\boldsymbol{u}_{v}(L_{vh}))$ ) [Footnote 7: Note that $E[|g(W_{vh},L_{vh},\boldsymbol{u}_{v}(L_{vh}))|]$ is independent of $h$ , given C2 and C3-SD, and it can be used as a norm.], i.e.,

[TABLE]

Let $\bar{\pi}_{v}\in\left[0,1\right]$ be a solution to the functional equation $g=\Gamma_{v,\infty}[g]$ (which is unique under the contraction property). Then, for each $v$ , it holds that for any solution $\bar{\psi}_{v}$ to $g=\Gamma_{v,N_{v}}\left[g\right]$ , which may not be unique,

[TABLE]

Note that the limit of $\bar{\psi}_{v}$ , a fixed point of $\Gamma_{v,\infty}$ , corresponds to the equilibrium (constant and symmetric) beliefs for the C3-IID case (a fixed point of $m^{v}\left(\cdot\right)$ in (8); recall that $\Pi_{vh}=\bar{\pi}_{v}$ by Propositions 1 - 2).

This theorem is restated as Theorem 5 in Appendix A.2, where its proof is also provided. Theorem 5 derives the convergence of the equilibrium beliefs (without the symmetry assumption $\psi_{vh}\left(\cdot\right)=\bar{\psi}_{v}\left(\cdot\right)$ ), viz. that the limit of the solution to (13) is given precisely by the solution of (7). The theorem also derives the rate of the convergence in (21): the rate is faster if (1) the area of each village expands more quickly as $N_{v}\rightarrow\infty$ under the increasing-domain assumption; and if (2) the degree of spatial dependence of $\left\{\boldsymbol{u}_{vh}\right\}$ is weaker. Note that the contraction condition of the limit (unconditional) operator implies existence and uniqueness of the solution, but we do not need to impose it on the operator defined via the conditional operator; multiplicity of solutions ( $\bar{\psi}_{v}=\Gamma_{v,N_{v}}\left[\bar{\psi}_{v}\right]$ ) is allowed for, and any of the solutions would then converge to $\bar{\pi}_{v}$ , where the existence of a solution can be relatively easily checked using another, less restrictive fixed point theorem.

In sum, this convergence result justifies the use of Brock and Durlauf (2001a) type specification of constant and symmetric beliefs, even when unobserved heterogeneity exhibits spatial dependence. This enables us to overcome complications in identification and inference posed by the dependence of beliefs on unobservables. In the next section, we present two estimators – one based on the Brock and Durlauf type specification and another that takes into account the conditional expectation feature of the beliefs as in (10). Then, we (a) show that the difference between the two estimators is asymptotically negligible, and (b) justify using observable group average outcome as a regressor in an econometric specification of individual level binary choice as in Brock and Durlauf’s estimation procedure.

**Further Discussions and Comparison with Menzel (2016):** In our discussion of the spatial case, the sequence $\left\{\boldsymbol{u}_{vh}\right\}=\{\boldsymbol{u}_{v}(L_{vh})\}$ , defined through two independent components, is called subordinated to the stochastic process $\{\boldsymbol{u}_{v}(l)\}$ via the index variables $\left\{L_{vh}\right\}$ . Subordination has been used previously in econometrics and statistics for modelling spatially dependent processes, c.f. Andrews (2005, Section 7) and Lahiri and Zhu (2006). One implication of subordination is the so-called exchangeability property (see, e.g., Andrews, 2005), and if a sequence of random variables is exchangeable, it can be I.I.D. conditionally on some sigma algebra (often denoted by $\mathfrak{F}_{\infty}$ , the tail sigma algebra), which is known as de Finetti’s theorem (see, e.g., Ch. 7 of Hall and Heyde, 1980). In our setting, this corresponds to the conditional I.I.D.-ness of $\{(W_{vh},L_{vh},\boldsymbol{u}_{v}(L_{vh}))\}$ , given a realization of the stochastic process $\boldsymbol{u}_{v}\left(\cdot\right)$ (as well as that of $\boldsymbol{\xi}_{v}$ ), where $\mathfrak{F}_{\infty}$ is set as the sigma algebra generated by the random function $\boldsymbol{u}_{v}\left(\cdot\right)$ .

Menzel (2016) has proposed a conditional inference method for games with many players under the exchangeability assumption. Indeed, Menzel (2016) and the present paper are similar in that both consider estimation of a game with the I.I.D. condition relaxed and under many-player asymptotics. However, there are some substantive differences between Menzel’s (2016) framework and ours. Firstly, in his conditional inference scheme, the probability law recognized by players in a game is different from that used by researchers for inference purposes (i.e., the former is the unconditional law and the latter is the conditional law given $\mathfrak{F}_{\infty}$ ), but they are identical in our setting. This feature of non-identical laws causes difficulty in constructing a valid, interpretable moment restriction that guarantees consistent estimation. In the context of estimating structural economic models (including game theoretic models), such a restriction is usually presented as some exogeneity or exclusion condition that is derived by taking into account players’ optimization behavior, i.e., the restriction is constructed based on the players’ perspective. This sort of construction may not give a valid moment restriction under the conditional inference scheme where validity has to be judged from the researcher’s perspective with the conditional law. To see this point, consider a simple binary choice example: $Y_{i}=1\left\{X_{i}^{\prime}\beta+\varepsilon_{i}\geq 0\right\}$ , where $\varepsilon_{i}|X_{i}\sim N\left(0,1\right)$ and $X_{i}$ is a covariate. In the standard case, the parameter $\beta$ can be estimated through $E\left[w\left(X_{i}\right)\{Y_{i}-\Phi\left(X_{i}^{\prime}\beta\right)\}\right]=0$ , where $w\left(\cdot\right)$ is a weighting function, and $\Phi$ is the distribution function of $N\left(0,1\right)$ . 
In contrast, under an inference scheme that exploits exchangeability or conditional I.I.D.-ness of $\left\{(Y_{i},X_{i})\right\}_{i=1}^{\infty}$ , consistent estimation would require $E\left[w\left(X_{i}\right)\{Y_{i}-\Phi\left(X_{i}^{\prime}\beta\right)\}|\mathfrak{F}_{\infty}\right]=0$ , where $\mathfrak{F}_{\infty}$ is the tail sigma algebra of $\left\{(Y_{i},X_{i})\right\}_{i=1}^{\infty}$ . The $\mathfrak{F}_{\infty}$ -conditional moment is in general hard to interpret, is not implied by the unconditional one, and it is not always obvious whether it holds. Indeed, Andrews (2005) discusses failure of consistency in a simple least squares regression case when the conditional law is used.
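The standard (unconditional) probit moment restriction in the binary choice example can be checked by simulation. The sketch below assumes I.I.D. data, the weight $w(X_{i})=X_{i}$ , and illustrative parameter values; it does not attempt to represent the $\mathfrak{F}_{\infty}$ -conditional moment.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def norm_cdf(x):
    """Standard normal CDF applied elementwise to an array."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def sample_moment(beta, X, Y):
    """Sample analog of E[w(X){Y - Phi(X'beta)}] with weight w(X) = X."""
    resid = Y - norm_cdf(X @ beta)
    return X.T @ resid / len(Y)


rng = np.random.default_rng(0)
n = 100_000
beta_true = np.array([0.3, -0.7])  # illustrative values, not from the paper
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
Y = (X @ beta_true + rng.standard_normal(n) >= 0.0).astype(float)

print("moment at true beta: ", sample_moment(beta_true, X, Y))
print("moment at wrong beta:", sample_moment(np.array([0.3, 0.0]), X, Y))
```

At the true parameter the sample moment is close to zero up to sampling noise, while at a misspecified slope it is clearly away from zero, which is what makes the restriction usable for estimation in the standard case.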

Another feature of Menzel (2016) that is distinct from ours is his focus on *aggregate games*. In his setting, players’ utilities depend on the ‘aggregate state’, that is computed through the conditional expectation of others’ actions ( $G_{mn}(s;\boldsymbol{\sigma}_{m})$ defined in Eq. (2.1) on p. 311, Menzel, 2016). This object is the counterpart of $\Pi_{vh}$ in our setting in that players’ interactions take place only through the aggregate state $\boldsymbol{\sigma}_{m}$ ( $\Pi_{vh}$ in our notation). Our $\Pi_{vh}$ for the spatially dependent case is defined in (10) and (13) through conditional expectations ( $E[A_{vk}|\mathcal{I}_{vh}]$ ) given all information $\mathcal{I}_{vh}$ available to player $(v,h)$ , i.e., both the individual variables $(W_{vh},L_{vh},\boldsymbol{u}_{v}(L_{vh}))$ and common variable $\boldsymbol{\xi}_{v}$ . On the other hand, a counterpart of Menzel’s aggregate state in our context is

[TABLE]

where the conditional expectation is computed given only the common $\boldsymbol{\xi}_{v}$ (called a public signal on p. 310 in Menzel, 2016, denoted by $w_{m}$ ). The formulation (22) means that each player does not utilize all the available information for predicting others’ behavior even when $\boldsymbol{u}_{vh}$ is useful for $(v,h)$ to predict $\boldsymbol{u}_{vk}$ (and thus $A_{vk}$ ) due to correlation between $\boldsymbol{u}_{vk}$ and $\boldsymbol{u}_{vh}$ . This contradicts the intuitively natural structure of belief formation in Bayesian games via rational expectations in our setting. Note, however, that Menzel (2016, Section 3) also discusses convergence of finite-player games and the associated equilibria. His convergence result is based on the assumption that players’ predictions about other players are based on $E[\cdot|\boldsymbol{\xi}_{v}]$ both in finite games and in their limit, while our result establishes convergence of the belief process, where $E[\cdot|\mathcal{I}_{vh}]$ is used in a finite-player game but reduces to $E[\cdot|\boldsymbol{\xi}_{v}]$ in the limit. In this sense, our belief convergence result may be interpreted as providing an asymptotic justification of Menzel’s (2016) ‘aggregate game’ framework.

3 Econometric Specification and Estimators

In this section, we lay out the econometric specification of our model, and describe estimation of preference parameters (denoted by $\theta_{1}^{\ast}$ ), assuming that the observed sample is generated via the game introduced in the previous section and satisfying assumptions C1, C2, and C3-SD (the C3-IID case is simpler, and is nested within the C3-SD case; see more on this below). In particular, we define the true parameter via a conditional moment restriction that is derived from specification of utility functions and the structure of the game in each of $\bar{v}$ villages. As discussed above, the beliefs in the finite-player game possess a conditional expectation feature, so the conditional expectation used to define $\theta_{1}^{\ast}$ has a complicated form, and consequently the estimator based on it, denoted by $\hat{\theta}_{1}^{\mathrm{SD}}$ below, is difficult to implement.

Therefore, we construct another, computationally simpler estimator $\hat{\theta}_{1}$ based on a conditional expectation restriction derived from the limit model with the limit belief $\bar{\pi}_{v}$ (derived in Theorem 1), and use it in our empirical application. We call $\hat{\theta}_{1}$ Brock-Durlauf type as it resembles the estimator used in Brock and Durlauf (2001a, 2007). Since the limit model is not the actual data generating process (DGP), our preferred estimator $\hat{\theta}_{1}$ is based on a mis-specified conditional moment restriction. However, we show that the estimator for the finite-player game with spatial dependence, $\hat{\theta}_{1}^{\mathrm{SD}}$ , which takes into account the conditional-expectation feature of the beliefs (as in (10)) shares the same limit as $\hat{\theta}_{1}$ that is based on the limit model, as $N\rightarrow\infty$ , under the asymptotic scheme for spatial data as introduced in the previous section and in Appendix A.2.1. In this sense, the two estimators, $\hat{\theta}_{1}^{\mathrm{SD}}$ and $\hat{\theta}_{1}$ , are asymptotically equivalent, and this result justifies the use of the simpler, Brock-Durlauf type estimation procedure. This result is formally proved in Theorem 2 below. The key challenge in this proof is showing uniform convergence of the fixed point solutions (beliefs) over the parameter space.
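The flavor of a Brock-Durlauf type procedure, in which the village average of choices enters an individual-level binary choice regression, can be conveyed with a minimal two-step sketch. Everything specific here is an assumption made for illustration rather than the paper's estimator: a logit link, $\boldsymbol{\xi}_{v}$ set to zero, a binary price whose share differs across villages to generate variation in beliefs, data drawn from the limit model, and hypothetical parameter values and function names.

```python
import numpy as np


def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))


def equilibrium_belief(share_high, c, beta, alpha, tol=1e-12, max_iter=10_000):
    """Fixed point of r = E[Lambda(c + beta*P + alpha*r)], P in {0, 1}."""
    r = 0.5
    for _ in range(max_iter):
        r_next = ((1.0 - share_high) * logistic(c + alpha * r)
                  + share_high * logistic(c + beta + alpha * r))
        if abs(r_next - r) < tol:
            return r_next
        r = r_next
    raise RuntimeError("fixed-point iteration did not converge")


def fit_logit(X, y, n_iter=50):
    """Logit maximum likelihood by Newton-Raphson, started from zero."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = logistic(X @ theta)
        grad = X.T @ (y - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        theta = theta + np.linalg.solve(hess, grad)
    return theta


def simulate_and_estimate(n_villages=50, n_households=1000,
                          c=0.5, beta=-2.0, alpha=1.5, seed=0):
    """Step 1: village averages of choices. Step 2: logit on (1, price, average)."""
    rng = np.random.default_rng(seed)
    blocks, outcomes = [], []
    for share in np.linspace(0.1, 0.9, n_villages):
        pi_bar = equilibrium_belief(share, c, beta, alpha)
        price = (rng.uniform(size=n_households) < share).astype(float)
        prob = logistic(c + beta * price + alpha * pi_bar)
        action = (rng.uniform(size=n_households) < prob).astype(float)
        village_avg = np.full(n_households, action.mean())  # estimated belief
        blocks.append(np.column_stack([np.ones(n_households), price, village_avg]))
        outcomes.append(action)
    return fit_logit(np.vstack(blocks), np.concatenate(outcomes))


estimate = simulate_and_estimate()
print("estimated (c, beta, alpha):", np.round(estimate, 3))
```

In this simplified design the coefficient on the village average plays the role of the social interaction term; with spatially dependent unobservables the data generating process would differ in finite samples, which is the gap the asymptotic equivalence result is meant to close.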

**Forms of Beliefs under Spatial Dependence:** To develop our estimators, we assume that the players’ beliefs in (10) are symmetric: $\Pi_{vh}=\bar{\psi}_{v}(W_{vh},L_{vh},\boldsymbol{u}_{vh},\boldsymbol{\xi}_{v})$ , i.e., the functional form of $\bar{\psi}_{v}\left(\cdot\right)$ is common for all the players in the same village $v$ .

Footnote 8: This can be justified under C1, C2, and C3-SD when the mapping from a $\left[0,1\right]$ -valued function $g\left(\cdot\right)$ to another $\left[0,1\right]$ -valued function:

$E\left[\left.1\left\{\begin{array}[c]{c}U_{1}(Y_{vk}-P_{vk},g(W_{vk},L_{vk},\boldsymbol{u}_{v}(L_{vk}),\boldsymbol{\xi}_{v}),\boldsymbol{\eta}_{vk})\\ \geq U_{0}(Y_{vk},g(W_{vk},L_{vk},\boldsymbol{u}_{v}(L_{vk}),\boldsymbol{\xi}_{v}),\boldsymbol{\eta}_{vk})\end{array}\right\}\right|\mathcal{I}_{vh}\right]$

(23)

is a contraction, where $\mathcal{I}_{vh}=(W_{vh},L_{vh},\boldsymbol{u}_{vh},\boldsymbol{\xi}_{v})$ . This contraction condition for the functional mapping is analogous to that for the function $m^{v}\left(r\right)$ (defined in (8)) in Proposition 2. The proof of symmetric equilibrium beliefs $\bar{\psi}_{v}\left(\cdot\right)$ is analogous to the proof of Proposition 2, and is omitted for brevity. We provide and discuss a sufficient condition for (23) to be a contraction in Appendix A.3.

We note that given the (conditional) independence assumptions in C2 and C3-SD, the forms of the beliefs can be slightly simplified. That is, the beliefs are a fixed point of the conditional expectation operator (17) with $(W_{vh},L_{vh},\boldsymbol{u}_{vh},\boldsymbol{\xi}_{v})$ being conditioning variables; however, we can show that $(v,h)$ ’s variable $W_{vh}$ is irrelevant in predicting other $\left(v,k\right)$ ’s variables in that

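$\left(W_{vk},L_{vk},\boldsymbol{u}_{v}\left(L_{vk}\right)\right)\perp W_{vh}|\left(L_{vh},\boldsymbol{u}_{v}\left(L_{vh}\right),\boldsymbol{\xi}_{v}\right)\text{,}$ (24)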

and accordingly, the fixed point solution is a function of $(L_{vh},\boldsymbol{u}_{vh},\boldsymbol{\xi}_{v})$ without $W_{vh}$ .$^{9}$

[Footnote 9: We can prove (24) as follows: The sequence $\left\{\left(W_{vh},L_{vh}\right)\right\}_{h=1}^{N_{v}}$ is conditionally I.I.D. given $\boldsymbol{\xi}_{v}$ (by C2) and thus it is also conditionally independent of the stochastic process $\left\{\boldsymbol{u}_{v}\left(l\right)\right\}$ given $\boldsymbol{\xi}_{v}$ (by C3-SD (ii)). Therefore, $\left\{\left(W_{vh},L_{vh}\right)\right\}_{h=1}^{N_{v}}$ is conditionally I.I.D. given $(\left\{\boldsymbol{u}_{v}\left(l\right)\right\},\boldsymbol{\xi}_{v})$ , implying that

$\left(W_{vh},L_{vh}\right)\perp\left(W_{vk},L_{vk}\right)|(\left\{\boldsymbol{u}_{v}\left(l\right)\right\},\boldsymbol{\xi}_{v})\text{.}$

Since it also holds that $\left(W_{vh},L_{vh}\right)\perp\left\{\boldsymbol{u}_{v}\left(l\right)\right\}|\boldsymbol{\xi}_{v}$ , we apply the conditional independence relation (75) with $Q=\left(W_{vh},L_{vh}\right)$ , $R=\left(W_{vk},L_{vk}\right)$ , and $S=\left\{\boldsymbol{u}_{v}\left(l\right)\right\}$ , to obtain

$\left(W_{vk},L_{vk},\left\{\boldsymbol{u}_{v}\left(l\right)\right\}\right)\perp\left(W_{vh},L_{vh}\right)|\boldsymbol{\xi}_{v}$

$\Rightarrow$ $\left(W_{vk},L_{vk},\left\{\boldsymbol{u}_{v}\left(l\right)\right\}\right)\perp W_{vh}|\left(L_{vh},\boldsymbol{\xi}_{v}\right)$

$\Rightarrow$ $\left(W_{vk},L_{vk},\boldsymbol{u}_{v}\left(L_{vk}\right),\boldsymbol{u}_{v}\left(L_{vh}\right)\right)\perp W_{vh}|\left(L_{vh},\boldsymbol{\xi}_{v}\right)$

$\Rightarrow$ $\left(W_{vk},L_{vk},\boldsymbol{u}_{v}\left(L_{vk}\right)\right)\perp W_{vh}|\left(L_{vh},\boldsymbol{u}_{v}\left(L_{vh}\right),\boldsymbol{\xi}_{v}\right)$

where the derivations of the second and fourth lines have used the following conditional independence relation: for random objects $T$ , $U$ , $V$ , and $C$ , if $T\perp\left(U,V\right)|C$ , then $T\perp U|\left(V,C\right)$ ; for the second line, we set $T=\left(W_{vk},L_{vk},\left\{\boldsymbol{u}_{v}\left(l\right)\right\}\right)$ , $U=W_{vh}$ , and $V=L_{vh}$ , with $C=\boldsymbol{\xi}_{v}$ ; and for the fourth line, $T=W_{vh}$ , $U=(W_{vk},L_{vk},\boldsymbol{u}_{v}\left(L_{vk}\right))$ , and $V=\boldsymbol{u}_{v}\left(L_{vh}\right)$ with $C=\left(L_{vh},\boldsymbol{\xi}_{v}\right)$ .]

Thus, with slight abuse of notation, we write

[TABLE]

Linear Index Structure: We now specify the forms of the utility functions. With few large peer-groups (e.g. there are eleven large villages in our application dataset), one cannot consistently estimate the impact of the belief $\Pi_{vh}$ on the choice probability function nonparametrically holding other regressors constant.$^{10}$

[Footnote 10: This is because $\Pi_{vh}$ is constant within a village in the (conditionally) I.I.D. case, and this constancy also holds for the limit model in the spatial case. In particular, the fixed point constraint does not help because of dimensionality problems. Indeed, in the fixed point condition $\pi=\int q_{1}\left(p,y,\pi\right)dF_{P,Y}\left(p,y\right)$ , where $F_{P,Y}\left(p,y\right)$ , the joint CDF of $\left(P,Y\right)$ , is identified, the unknown function $q_{1}\left(p,y,\pi\right)$ has higher dimension than the observable $F_{P,Y}\left(p,y\right)$ .]

Accordingly, following Manski (1993) and Brock and Durlauf (2001a, 2007), we assume a linear index structure with $\boldsymbol{\eta}=(\eta^{0},\eta^{1})^{\prime}$ , viz. that utilities are given by

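$U_{0}\left(y,\pi,\boldsymbol{\eta}\right)=\delta_{0}+\beta_{0}y+\alpha_{0}\pi+\eta^{0}\text{,}\qquad U_{1}\left(y-p,\pi,\boldsymbol{\eta}\right)=\delta_{1}+\beta_{1}\left(y-p\right)+\alpha_{1}\pi+\eta^{1}\text{,}$ (26)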

where corresponding to Assumptions 1 - 2, we assume that $\beta_{0}>0$ , $\beta_{1}>0$ , i.e., non-satiation in numeraire, $\beta_{1}$ need not equal $\beta_{0}$ , i.e. income effects can be present, and that $\alpha_{1}\geq 0\geq\alpha_{0}$ , i.e., compliance yields higher utility. These utilities can be viewed as expected utilities corresponding to Bayes-Nash equilibrium play in a game of incomplete information with many players, as outlined in Section 2 above. Below in Section 4, we will provide more details on interpretation of the individual coefficients in (26) when discussing welfare calculations. These details do not play any role in the rest of this section.

Using (26) and the structure of $\boldsymbol{\eta}_{vh}=\boldsymbol{\xi}_{v}+\boldsymbol{u}_{vh}$ (see (3)) with $\boldsymbol{\xi}_{v}:=(\xi_{v}^{0},\xi_{v}^{1})^{\prime}$ and $\boldsymbol{u}_{vh}:=(u_{vh}^{0},u_{vh}^{1})^{\prime}$ , it follows that

[TABLE]

where we have defined $\bar{\xi}_{v}:=c_{0}+\left(\xi_{v}^{1}-\xi_{v}^{0}\right)$ .

Recall that the probabilistic conditions in C2 and C3-SD are stated conditional on the (realized values of) village-fixed unobserved heterogeneity $\bar{\xi}_{v}$ , as in the econometric literature on fixed-effects panel data models. In this sense, we can treat $\left\{\bar{\xi}_{v}\right\}$ as non-stochastic. Indeed, given many observations per village, the (realized) values of $\left\{\bar{\xi}_{v}\right\}$ can be estimated and are included in the set of parameters to be estimated. We discuss this point further in Section 4.4 below.

Econometric Specifications: We now present the alternative estimators. To do this, we need some more notation. Let $\theta_{1}=(\boldsymbol{c}^{\prime},\alpha)^{\prime}$ denote a (preference) parameter vector, where $\boldsymbol{c}=(c_{1},c_{2})^{\prime}$ is the coefficient vector corresponding to $W_{vh}=(P_{vh},Y_{vh})^{\prime}$ . In the rest of this Section 3, we assume, for notational simplicity, that the village-fixed parameters $\bar{\xi}_{1},\dots,\bar{\xi}_{\bar{v}}$ are known; this assumption does not change any substantive arguments on the convergence of the estimators. We discuss identification/estimation schemes of these parameters below and provide a complete proof for the case when $\bar{\xi}_{1},\dots,\bar{\xi}_{\bar{v}}$ are estimated using one of the identification schemes (e.g. the homogeneity assumption) in Appendix A.4. Given (25) and (27), we can write

[TABLE]

In order to incorporate the fixed-point feature of $\bar{\psi}_{v}$ in estimation, where we write $\bar{\psi}_{v}(L_{vh},\boldsymbol{u}_{vh})=\bar{\psi}_{v}(L_{vh},\boldsymbol{u}_{vh},\boldsymbol{\xi}_{v})$ for notational simplicity, we can assume a parametric model of spatial dependence for the stochastic process $\{\varepsilon_{vh}\}$ , which is required to compute the functional equations defining $\bar{\psi}_{v}$ . Corresponding to the definition of $\boldsymbol{u}_{vh}=\boldsymbol{u}_{v}\left(L_{vh}\right)$ with $\boldsymbol{u}_{v}\left(l\right)=(u_{v}^{0}(l),u_{v}^{1}(l))$ , we let $\varepsilon_{vh}=\varepsilon_{v}\left(L_{vh}\right)$ , where $\left\{\varepsilon_{v}\left(l\right)\right\}$ is a stochastic process defined as $\varepsilon_{v}(l)=u_{v}^{1}(l)-u_{v}^{0}(l)$ . We let $\boldsymbol{H}(\tilde{e}|e,|\tilde{l}-l|_{1};\theta_{2}^{\ast})$ be the conditional distribution of $\varepsilon_{v}(\tilde{l})=u_{v}^{1}(\tilde{l})-u_{v}^{0}(\tilde{l})$ given $\varepsilon_{v}(l)=e$ , parametrized by a finite dimensional parameter $\theta_{2}\in\Theta_{2}$ , whose (pseudo) true value is denoted by $\theta_{2}^{\ast}$ . We also denote the marginal CDF of $\varepsilon_{v}(\tilde{l})$ by $\boldsymbol{H}\left(e\right)$ and its probability density by $\boldsymbol{h}\left(e\right)$ . In the sequel, we also write the marginal CDF of $-\varepsilon_{v}(\tilde{l})$ as $F_{\varepsilon}\left(e\right)$ , and thus $\boldsymbol{H}\left(e\right)=1-F_{\varepsilon}\left(-e\right)$ . The joint distribution function of $(\varepsilon_{v}(\tilde{l}),\varepsilon_{v}(l))$ is $\int_{s\leq e}\boldsymbol{H}(\tilde{e}|s,|\tilde{l}-l|_{1};\theta_{2}^{\ast})\boldsymbol{h}\left(s\right)ds$ , given the location indices $\tilde{l}$ and $l$ .$^{11}$

[Footnote 11: This specification implies pairwise stationarity of $\{\varepsilon_{v}\left(l\right)\}$ , i.e. the joint distribution of $\varepsilon_{v}(\tilde{l})$ and $\varepsilon_{v}(l)$ depends only on the distance $|\tilde{l}-l|_{1}$ . Stationarity is not strictly necessary for our purpose but is maintained for simplicity. We could also specify the full joint distribution of the whole $\varepsilon_{v}\left(l\right)$ (for any $l\in\mathcal{L}_{v}$ , or for any $l_{1},l_{2},\dots,l_{q}\in\mathcal{L}_{v}$ with $q$ being any finite integer; say, a Gaussian process), which would not affect our estimation method.]
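As one concrete illustration of such a parametric spatial-dependence model, the sketch below implements a conditional distribution of the form $\boldsymbol{H}(\tilde{e}|e,d;\theta_{2})$ for a unit-variance Gaussian process, where $d$ is the distance between the two locations. The exponential correlation function `corr`, the unit-variance normalization, and all function names are assumptions made purely for illustration; the text only mentions a Gaussian process as one possible specification.

```python
import math


def std_normal_cdf(x):
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def corr(distance, theta2):
    # Hypothetical correlation function (an assumption): rho(d) = exp(-theta2 * d).
    return math.exp(-theta2 * distance)


def H_cond(e_tilde, e, distance, theta2):
    """P(eps(l~) <= e_tilde | eps(l) = e) for a standard bivariate normal pair.

    For a bivariate normal with unit variances and correlation rho, the
    conditional law of eps(l~) given eps(l) = e is N(rho * e, 1 - rho^2).
    """
    rho = corr(distance, theta2)
    if rho >= 1.0:  # zero distance: the two draws coincide
        return 1.0 if e_tilde >= e else 0.0
    return std_normal_cdf((e_tilde - rho * e) / math.sqrt(1.0 - rho * rho))
```

Because the correlation depends on the two locations only through their distance, this specification is pairwise stationary in the sense of footnote 11; at large distances the conditional distribution collapses to the marginal one.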

To develop estimators that incorporate the fixed point restriction, define the following functional operator based on $\boldsymbol{H}$ :

[TABLE]

for $v=1,\dots,\bar{v}$ , where $\mathcal{F}_{v,N_{v}}^{\star}$ is a functional operator from a $\left[0,1\right]$ -valued function $g=g\left(l,e;\theta_{1},\theta_{2}\right)$ to another function $\mathcal{F}_{v,N_{v}}^{\star}\left[g\right]$ , and $F_{WL}^{v}(w,l)$ is the joint CDF of $(W_{vh},L_{vh})$ . We provide sufficient conditions for this $\mathcal{F}_{v,N_{v}}^{\star}$ to be a contraction in Appendix A.3.

Given the above set-up, define the model to be estimated as:

[TABLE]

where $\theta_{1}^{\ast}(=(\boldsymbol{c}^{\ast\prime},\alpha^{\ast})^{\prime})$ and $\theta_{2}^{\ast}$ denote the true parameters and $\psi_{v}^{\star}(L_{vh},\varepsilon_{vh};\theta_{1},\theta_{2})$ is a solution to the functional equation defined through the operator (29) (for each $\left(\theta_{1},\theta_{2}\right)$ given):

[TABLE]

and C1, C2, C3-SD, and some regularity conditions (provided below) are satisfied. Henceforth, the model (30) will be assumed to be the DGP of observable variables $\left\{(A_{vh},W_{vh},L_{vh})\right\}_{h=1}^{N_{v}}$ ( $v=1,\dots,\bar{v}$ ).

3.1 Econometric Estimators

Definition of the Estimand: Suppose for now that the true parameter $\theta_{2}^{\ast}$ for the spatial dependence is given. Then, based on (28), we define the true preference parameter $\theta_{1}^{\ast}$ (i.e., our estimand) as the solution to the conditional moment restriction:

[TABLE]

where $C_{v}$ is the conditional choice probability function:$^{12}$

[TABLE]

[Footnote 12: Note that all the (conditional) expectations, $E\left[\cdot\right]$ and $E\left[\cdot|\cdot\right]$ , in this Section 3 are taken with respect to the law of $A_{vh}$ , $W_{vh}$ , $L_{vh}$ , and $\varepsilon_{vh}$ $(=\varepsilon_{v}(L_{vh}))$ (or $\boldsymbol{u}_{vh}=\boldsymbol{u}_{v}\left(L_{vh}\right)$ ), conditional on the unobserved heterogeneities $\bar{\xi}_{v}$ (or $\boldsymbol{\xi}_{v}$ ).]

Practical Estimator Based on the Limit Model: Given our parametric set-up, we can in principle compute an empirical analogue of (33) by solving an empirical version of the fixed point equation (31). This estimator, denoted below by $\hat{\theta}_{1}^{\mathrm{SD}}$ , is difficult to compute in practice. Therefore, we consider an alternative estimator based on the simpler conditional moment condition:

[TABLE]

This is derived from the limit model with the limit beliefs $\bar{\pi}_{v}$ , which do not depend on the unobserved heterogeneity and other $(v,h)$ specific variables. Indeed, the limit model is not the true DGP, and thus (34) is mis-specified under C3-SD (it is correctly specified under C3-IID). Nonetheless, we show that the estimator based on (34), which we eventually use in our empirical application, can be justified in an asymptotic sense. This simpler estimator is given by:

[TABLE]

where

[TABLE]

where $\theta_{1}=(\boldsymbol{c}^{\prime},\alpha)^{\prime}$ , $\Theta_{1}$ is the parameter space, which is compact in $\mathbb{R}^{d_{1}}$ with $d_{1}-1$ being the dimension of $W_{vh}$ , $N=\sum_{v=1}^{\bar{v}}N_{v}$ , and the constant beliefs $\bar{\pi}_{v}$ (that appear in the limit model) are estimated by $\hat{\pi}_{v}=\frac{1}{N_{v}}\sum_{h=1}^{N_{v}}A_{vh}$ . We use the label ‘BR’ for this estimator, as it is based on the Brock and Durlauf (2001a) type formulation. This estimator $\hat{\theta}_{1}$ is easy to compute: because its belief formulation is based on the limit model with constant beliefs $\bar{\pi}_{v}$ , its objective function $\hat{L}^{\mathrm{BR}}\left(\cdot\right)$ requires neither solving fixed point problems nor any numerical integration. Below, we show that the complicated estimator $\hat{\theta}_{1}^{\mathrm{SD}}$ (based on (30)) and the simpler one $\hat{\theta}_{1}$ have the same limit.
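To make this computational simplicity concrete, the following sketch evaluates a Brock-Durlauf type objective. It assumes that $\hat{L}^{\mathrm{BR}}$ is an average binary-choice log-likelihood, that $F_{\varepsilon}$ is logistic, and that the village effects $\bar{\xi}_{v}$ are known; the function names and the list-of-villages data layout are hypothetical and not taken from the paper.

```python
import math


def logistic_cdf(x):
    # Assumed choice for F_eps; written in a numerically stable form.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def village_takeup(A_by_village):
    """pi_hat_v = (1/N_v) * sum_h A_vh, one value per village."""
    return [sum(A_v) / len(A_v) for A_v in A_by_village]


def br_objective(c, alpha, A_by_village, W_by_village, xi_bar):
    """Average log-likelihood with beliefs fixed at observed take-up rates.

    No fixed point is solved and nothing is integrated numerically:
    the belief entering the index is simply pi_hat_v.
    """
    pi_hat = village_takeup(A_by_village)
    total, n = 0.0, 0
    for v, (A_v, W_v) in enumerate(zip(A_by_village, W_by_village)):
        for a, w in zip(A_v, W_v):
            index = sum(wj * cj for wj, cj in zip(w, c)) + xi_bar[v] + alpha * pi_hat[v]
            # Clip to keep the logarithms finite.
            p = min(max(logistic_cdf(index), 1e-12), 1.0 - 1e-12)
            total += a * math.log(p) + (1 - a) * math.log(1.0 - p)
            n += 1
    return total / n
```

Maximizing `br_objective` over `(c, alpha)` with any standard numerical optimizer would give an estimate of this type; each evaluation costs a single pass over the data.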

Potential Estimator for the Finite-Player Game: We now formally introduce the computationally difficult potential estimator $\hat{\theta}_{1}^{\mathrm{SD}}$ based on (32). It is defined through the following objective function:

[TABLE]

where $\hat{C}$ is an estimate of the conditional choice probability that explicitly incorporates conditional-belief and fixed-point features:

[TABLE]

and $\hat{\psi}_{v}^{\star}\left(L_{vh},e;\theta_{1},\theta_{2}\right)$ is an estimator of the belief and is defined as a solution to the following functional equation for each $(\theta_{1},\theta_{2})$ :

[TABLE]

$\mathcal{\hat{F}}_{v,N_{v}}^{\star}$ is an empirical version of $\mathcal{F}_{v,N_{v}}^{\star}$ (defined in (29)) in which the true $F_{W,L}^{v}$ is replaced by $\hat{F}_{W,L}^{v}$ :

[TABLE]

This $\hat{\psi}_{v}^{\star}$ is an empirical version of a solution to (29). A notable feature of this is that it is a function of the unobserved heterogeneity (represented by the variable $e$ ). Due to this dependence on $e$ , computation of $\hat{C}$ in (36) and $\mathcal{\hat{F}}_{v,N_{v}}^{\star}$ in (38) is difficult, and requires numerical integration of the indicator functions; furthermore, finding the fixed point $\hat{\psi}_{v}^{\star}$ in the functional equation (37) will also require some numerical procedure.

Here, we do not pursue how to identify and estimate the spatial dependence parameter $\theta_{2}^{\ast}$ (since our empirical application is in any case not based on $\hat{L}^{\mathrm{SD}}(\theta_{1},\theta_{2})$ ), but suppose the availability of some reasonable preliminary estimator $\hat{\theta}_{2}$ with $\hat{\theta}_{2}\overset{p}{\rightarrow}\theta_{2}^{\ast}$ , and define our estimator as

[TABLE]

Note that given this form of $\hat{\theta}_{1}^{\mathrm{SD}}$ , we can again interpret this estimator as a moment estimator that solves

[TABLE]

with some appropriate choice of the weight $\boldsymbol{\omega}\left(W_{vh},\theta_{1},\hat{\theta}_{2}\right)$ . This may be viewed as a sample moment condition based on the population one in (32). The corresponding estimation procedure would be similar to the nested fixed-point algorithm, as in Rust (1987).

3.2 Convergence of the Estimators

We now show that $||\hat{\theta}_{1}^{\mathrm{SD}}-\hat{\theta}_{1}||\overset{p}{\rightarrow}0$ , i.e., $\hat{\theta}_{1}^{\mathrm{SD}}$ based on the correct conditional moment restriction (32) and $\hat{\theta}_{1}$ based on the mis-specified one (34) are asymptotically equivalent. That is, if $\hat{\theta}_{1}$ is consistent, so is $\hat{\theta}_{1}^{\mathrm{SD}}$ and vice versa; in the proof, we show that both estimators are consistent for $\theta_{1}^{\ast}$ that satisfies (111). This is formally stated in the following theorem:

Theorem 2

Suppose that C1, C2, C3-SD, Assumptions 4, 5, 6, 7, and 8 hold. Then

[TABLE]

The formal proof is provided in Appendix A.4; the outline is as follows. We start by introducing another, intermediate estimator that is based on constant beliefs but solves the Fixed Point problem of the Limit model, $\hat{\theta}_{1}^{\mathrm{FPL}}=\operatorname*{argmax}\limits_{\theta_{1}\in\Theta_{1}}\hat{L}^{\mathrm{FPL}}(\theta_{1})$ , where

[TABLE]

where $\pi=\hat{\pi}_{v}^{\star}(\theta_{1})\in\left[0,1\right]$ is a solution to the fixed point equation for each $\theta_{1}$ (fixed):

[TABLE]

Note that $\hat{\pi}_{v}^{\star}(\theta_{1})\in\left[0,1\right]$ is a sample version of $\pi_{v}^{\star}\left(\theta_{1}\right)$ that solves

[TABLE]

which is the population version of (39) with $\hat{F}_{W}^{v}$ replaced by the true CDF $F_{W}^{v}$ of $W_{vh}$ . This $\hat{\theta}_{1}^{\mathrm{FPL}}$ is constructed based on the limit model (with constant beliefs), but it explicitly solves the fixed point restriction (39) (unlike $\hat{\theta}_{1}$ derived from the Brock-Durlauf type moment restriction (34)). $\hat{\theta}_{1}^{\mathrm{FPL}}$ may be interpreted as a moment estimator that is derived from the conditional moment restriction:$^{13}$

[TABLE]

Note that this restriction is also a mis-specified one.

[Footnote 13: Note that $\hat{\theta}_{1}^{\mathrm{FPL}}$ can also be defined as solving $\hat{M}^{\mathrm{FPL}}\left(\theta_{1}\right)=0$ , where, given an appropriate choice of the weight $\boldsymbol{\omega}\left(W_{vh},\theta_{1}\right)$ ,

$\hat{M}^{\mathrm{FPL}}\left(\theta_{1}\right):=\frac{1}{N}\sum_{v=1}^{\bar{v}}\sum_{h=1}^{N_{v}}\boldsymbol{\omega}\left(W_{vh},\theta_{1}\right)\left\{A_{vh}-F_{\varepsilon}\left(W_{vh}^{\prime}\boldsymbol{c}+\bar{\xi}_{v}+\alpha\hat{\pi}_{v}^{\star}(\theta_{1})\right)\right\}.$]
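The inner step of the FPL estimator can be sketched as a simple fixed-point iteration. The code below assumes that the sample fixed-point equation takes the form $\pi=\frac{1}{N_{v}}\sum_{h}F_{\varepsilon}(W_{vh}^{\prime}\boldsymbol{c}+\bar{\xi}_{v}+\alpha\pi)$, which is the form suggested by the index appearing in $\hat{M}^{\mathrm{FPL}}$, and that $F_{\varepsilon}$ is logistic. The function name `solve_belief`, the starting value, and the tolerance are hypothetical choices.

```python
import math


def logistic_cdf(x):
    # Assumed choice for F_eps; written in a numerically stable form.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def solve_belief(W_v, c, xi_bar_v, alpha, tol=1e-12, max_iter=10_000):
    """Iterate pi <- (1/N_v) * sum_h F(W_vh'c + xi_bar_v + alpha * pi).

    With a logistic F the density is at most 1/4, so the map is a
    contraction whenever |alpha| < 4 and the iteration converges to the
    unique fixed point in [0, 1].
    """
    pi = 0.5
    for _ in range(max_iter):
        new_pi = sum(
            logistic_cdf(sum(wj * cj for wj, cj in zip(w, c)) + xi_bar_v + alpha * pi)
            for w in W_v
        ) / len(W_v)
        if abs(new_pi - pi) < tol:
            return new_pi
        pi = new_pi
    raise RuntimeError("fixed-point iteration did not converge")
```

An estimator of this kind would call `solve_belief` once per village at every trial value of $\theta_{1}$, which is the extra cost that the Brock-Durlauf type estimator avoids by plugging in $\hat{\pi}_{v}$ directly.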

We show the convergence of $||\hat{\theta}_{1}^{\mathrm{SD}}-\hat{\theta}_{1}||$ in two steps. In the first step, we show that $\hat{\theta}_{1}^{\mathrm{FPL}}$ and $\hat{\theta}_{1}$ have the same limit, which is the solution to a different conditional moment restriction (See (111) in Appendix A.4). In the second step, we show that $\hat{L}^{\mathrm{SD}}(\theta_{1},\hat{\theta}_{2})$ is asymptotically well approximated by $\hat{L}^{\mathrm{FPL}}\left(\theta_{1}\right)$ uniformly over $\theta_{1}\in\Theta_{1}$ for any sequence of $\hat{\theta}_{2}$ (as $N\rightarrow\infty$ ).

4 Welfare Analysis

We now move on to the second part of the paper, which concerns welfare analysis of policy interventions under spillovers. Since we assume spillovers are restricted to the village where households reside, any welfare effect of a policy intervention can be analyzed village by village. So for economy of notation, we drop the $\left(v,h\right)$ subscripts except when we account explicitly for village-fixed effects during estimation. Also, we use the same notation $\pi$ to denote both individual beliefs entering individual utilities, and the unique, equilibrium belief about village take-up rate entering the average demand function. The assumption of a constant (within village) $\pi$ is justified by Proposition 1, Proposition 2, and Theorem 1.

In the welfare results derived below, all probabilities and expectations – e.g. mean welfare loss – in Sections 4.1-4.3 are calculated with respect to the marginal distribution of aggregate unobservables, denoted by $\boldsymbol{\eta}=\boldsymbol{\eta}_{vh}$ above and below. In this sense, they are analogous to ‘average structural functions’ (ASF), introduced by Blundell and Powell (2004). Later, when discussing estimation of the ASF, together with the implied pre- and post-intervention aggregate choice probabilities and average welfare in Section 4.4, we will allude to village-fixed effects explicitly, and show how they are estimated and incorporated in demand and welfare predictions.

In order to conduct welfare analysis, we impose two restrictions on the utilities.

Assumption 1

$U_{1}\left(\cdot,\pi,\boldsymbol{\eta}\right)$ and $U_{0}\left(\cdot,\pi,\boldsymbol{\eta}\right)$ (introduced in (1) in Section 2) are continuous and strictly increasing for each fixed value of $\pi$ and $\boldsymbol{\eta}$ , i.e., all else equal, utilities are non-satiated in the numeraire.

Assumption 2

For each $y$ and $\boldsymbol{\eta}$ , $U_{1}\left(y,\cdot,\boldsymbol{\eta}\right)$ is continuous and strictly increasing, and $U_{0}\left(y,\cdot,\boldsymbol{\eta}\right)$ is continuous and weakly decreasing, i.e. conforming yields higher utility than not conforming for each individual.

Define $q_{1}\left(p,y,\pi\right)$ to be the structural probability (i.e. Average Structural Function or ASF) of a household choosing $1$ when it faces a price of $p$ , and has income $y$ and belief $\pi$ :

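$q_{1}\left(p,y,\pi\right)=\int 1\left\{U_{1}\left(y-p,\pi,\boldsymbol{\eta}\right)\geq U_{0}\left(y,\pi,\boldsymbol{\eta}\right)\right\}dF_{\boldsymbol{\eta}}\left(\boldsymbol{\eta}\right)\text{,}$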

and let $q_{0}\left(p,y,\pi\right)=1-q_{1}\left(p,y,\pi\right)$ , where $F_{\boldsymbol{\eta}}$ is the CDF of $\boldsymbol{\eta}_{vh}$ .

Policy Intervention: Start with a situation where the price of alternative $1$ is $p_{0}$ and the value of $\pi$ is $\pi_{0}$ . Then suppose a price subsidy is introduced such that individuals with income less than an income threshold $\tau$ become eligible to buy the product at price $p_{1}<p_{0}$ . This policy will alter the equilibrium adoption rate; suppose the new equilibrium adoption rate is $\pi_{1}$ . How the counterfactual $\pi_{1}$ and $\pi_{0}$ are calculated will be described below. For given values of $\pi_{0}$ and $\pi_{1}$ , we now derive expressions for welfare resulting from the intervention. By “welfare” we mean the compensating variation (CV), viz. what hypothetical income compensation would restore the post-change indirect utility for an individual to its pre-change level. For a subsidy-eligible individual, for any potential value of $\pi_{1}$ corresponding to the new equilibrium, the individual compensating variation is the solution $S$ to the equation

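$\max\left\{U_{1}\left(y+S-p_{1},\pi_{1},\boldsymbol{\eta}\right),U_{0}\left(y+S,\pi_{1},\boldsymbol{\eta}\right)\right\}=\max\left\{U_{1}\left(y-p_{0},\pi_{0},\boldsymbol{\eta}\right),U_{0}\left(y,\pi_{0},\boldsymbol{\eta}\right)\right\}\text{,}$ (42)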

whereas for a subsidy-ineligible individual, it is the solution $S$ to

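$\max\left\{U_{1}\left(y+S-p_{0},\pi_{1},\boldsymbol{\eta}\right),U_{0}\left(y+S,\pi_{1},\boldsymbol{\eta}\right)\right\}=\max\left\{U_{1}\left(y-p_{0},\pi_{0},\boldsymbol{\eta}\right),U_{0}\left(y,\pi_{0},\boldsymbol{\eta}\right)\right\}\text{.}$ (43)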

Note that we do not take into account peer-effects again in defining the CV because the income compensation underlying the definition of CV is hypothetical. So the impact of actual income compensation on neighboring households is irrelevant. Since the CV depends on the unobservable $\boldsymbol{\eta}$ , the same price change will produce a distribution of welfare effects across individuals; we are interested in calculating that distribution and its functionals such as mean welfare.

Existence of $S$ : Under the following condition, there exists an $S$ that solves (42) and (43):

Condition

For any fixed $\boldsymbol{\eta}$ and $(p_{0},p_{1},y)$ , it holds that (i) $\lim_{S\searrow-\infty}U_{1}\left(y+S-p_{1},1,\boldsymbol{\eta}\right)<U_{1}\left(y-p_{0},0,\boldsymbol{\eta}\right)$ , and (ii) $\lim_{S\nearrow\infty}U_{0}\left(y+S,1,\boldsymbol{\eta}\right)>U_{0}\left(y,0,\boldsymbol{\eta}\right)$ .

Intuitively, this condition strengthens Assumption 1 by requiring that utilities can be increased and decreased sufficiently by varying the quantity of numeraire. Existence follows via the intermediate value theorem. Under an index structure, existence is explicitly shown below. Finally, uniqueness of the solution to (42) and (43) follows by strict monotonicity in numeraire. Since the maximum of two strictly increasing functions is strictly increasing, the LHS of (42) and (43) are strictly increasing in $S$ , implying a unique solution.

Welfare with Index Structure: In accordance with the literature on social interactions (see Section 3 above), from now on we maintain the single-index structure introduced in (26):

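$U_{0}\left(y,\pi,\boldsymbol{\eta}\right)=\delta_{0}+\beta_{0}y+\alpha_{0}\pi+\eta^{0}\text{,}\qquad U_{1}\left(y-p,\pi,\boldsymbol{\eta}\right)=\delta_{1}+\beta_{1}\left(y-p\right)+\alpha_{1}\pi+\eta^{1}\text{,}$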

with $\beta_{0}>0$ , $\beta_{1}>0$ , and $\alpha_{1}\geq 0\geq\alpha_{0}$ .$^{14}$

[Footnote 14: We can also allow for concave income effects by specifying, say,

$\displaystyle U_{0}\left(y,\pi,\eta\right)=\delta_{0}+\beta_{0}\ln y+\alpha_{0}\pi+\eta^{0}\text{,}$

$\displaystyle U_{1}\left(y-p,\pi,\eta\right)=\delta_{1}+\beta_{1}\ln\left(y-p\right)+\alpha_{1}\pi+\eta^{1}\text{,}$

but we wish to keep the utility formulation as simple as possible to highlight the complications in welfare calculations even in the simplest linear utility specification.]

In our empirical setting of anti-malarial bednet adoption, there are multiple potential sources of interactions (i.e. $\alpha_{1},\alpha_{0}\neq 0$ ). The first is a pure preference for conforming; the second is increased awareness of the benefits of a bednet when more villagers use it; the third is a perceived negative health externality. The medical literature suggests that the technological health externality is positive, i.e. the more people are protected, the lower the malaria burden, but the perceived health externality is likely to be negative if households correctly believe that other households’ bednet use deflects mosquitoes to unprotected households, but ignore the fact that those deflected mosquitoes are less likely to carry the parasite. Indeed, the implications for adoption are different: under the positive health externality, one would expect free-riding, hence a negative effect of others’ adoption on own adoption; under the negative health externality, the correlation would be positive.

In particular, let $\gamma_{p}>0$ denote the conforming plus learning effect, and $\gamma_{H}$ denote the health externality. Then it is reasonable to assume that $\alpha_{1}\equiv\gamma_{p}\geq 0$ and $\alpha_{0}=\gamma_{H}-\gamma_{p}\leq 0$ . In other words, the compliance motive and learning effect together are equal in magnitude but opposite in sign between buying and not buying. Further, if a household uses an insecticide-treated net (ITN), then there is no health externality from the neighborhood adoption rate (since the household is protected anyway), but if it does not adopt, then there is a net health externality effect $\gamma_{H}$ from neighborhood use, which makes the overall effect $\alpha_{0}=\gamma_{H}-\gamma_{p}$ and $\alpha_{1}\neq-\alpha_{0}$ in general.$^{15}$

[Footnote 15: An analogous asymmetry is also likely in the school voucher example mentioned in the introduction if the voucher-led ‘brain-drain’ leads to utility gains and losses of different amounts, e.g. if better teaching resources in the high-achieving school substitute for – or complement – peer-effects in a way that is not possible in the resource-poor local school.]

In the context of ITNs, the technological effects are unlikely to be large enough and/or the villagers are unlikely to be sophisticated enough to understand the potential deterrent effects of ITNs. Therefore, we assume from now on that the perceived health externality is non-positive, and thus $\alpha_{1}\geq 0\geq\alpha_{0}$ .

Given the linear index specification, the structural choice probability for alternative $1$ at $\left(p,y,\pi\right)$ is given by

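$q_{1}\left(p,y,\pi\right)=F\left(c_{0}+c_{1}p+c_{2}y+\alpha\pi\right)\text{,}\quad c_{0}:=\delta_{1}-\delta_{0}\text{, }c_{1}:=-\beta_{1}\text{, }c_{2}:=\beta_{1}-\beta_{0}\text{, }\alpha:=\alpha_{1}-\alpha_{0}\text{,}$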

where $F\left(\cdot\right)$ denotes the marginal distribution function of $-(\eta^{1}-\eta^{0})$ . It is known from Brock and Durlauf (2007) that the structural choice probabilities $F\left(c_{0}+c_{1}p+c_{2}y+\alpha\pi\right)$ identify $c_{0},c_{1},c_{2}$ and $\alpha$ , i.e. $\left(\delta_{1}-\delta_{0}\right)$ , $\beta_{0}$ , $\beta_{1}$ and $\left(\alpha_{1}-\alpha_{0}\right)=2\gamma_{p}-\gamma_{H}$ , up to scale even without knowledge of the probability distribution of $\varepsilon=-(\eta^{1}-\eta^{0})$ . In the application, we will consider various ways to estimate the structural choice probabilities, including standard Logit and Klein and Spady’s distribution-free MLE. One can also use other semiparametric methods, e.g. Bhattacharya (2008) or Han (1987) that require neither specification of error distributions nor subjective bandwidth choice.

The condition $\alpha_{1}\geq 0\geq\alpha_{0}$ makes the model different from standard demand models for binary choice. In the standard case, for the so-called “outside option”, i.e. not buying, the utility is normalized to zero. In a social spillover setting, this cannot be done because that utility depends on the aggregate purchase rate $\pi$ . As we will see below, in welfare evaluations of a subsidy, $\alpha_{1}$ and $\alpha_{0}$ appear separately in the expressions for welfare-distributions, but cannot be separately identified from demand data, which can only identify $\alpha\equiv\alpha_{1}-\alpha_{0}$ . As a result, point-identification of welfare will in general not be possible. Below, we will consider three untestable special cases, under which one obtains point-identification, viz. (i) $\alpha_{1}=\alpha/2=-\alpha_{0}$ (i.e. $\gamma_{H}=0$ : no health externality and symmetric spillover), (ii) $\alpha_{1}=\alpha,$ $\alpha_{0}=0$ (i.e. $\gamma_{H}=\gamma_{p}$ : technological health externality dominates deflection channel and net health externality exactly offsets conforming effect) and (iii) $\alpha_{1}=0$ , $\alpha_{0}=-\alpha$ ( $\gamma_{p}=0$ and $\gamma_{H}=-\alpha$ : no conforming effect and deflection channel dominates). Cases (ii) and (iii) will yield respectively the upper and lower bounds on welfare gain in the general case.

Toward obtaining the welfare results, consider a hypothetical price intervention moving from a situation where everyone faces a price of $p_{0}$ to one where people with income less than an eligibility-threshold $\tau$ are given the option to buy at the subsidized price $p_{1}<p_{0}$ . This policy will alter the equilibrium take-up rate. Assume that the equilibrium take up rate changes from $\pi_{0}$ to $\pi_{1}$ . We will describe calculation of $\pi_{0}$ and $\pi_{1}$ later. For given values of $\pi_{0}$ and $\pi_{1}$ , the welfare effect of the policy change can be calculated as described below. We first lay out the results in detail for the case where $\pi_{1}>\pi_{0}$ , which corresponds to our application. In the appendix we present results for a hypothetical case where $\pi_{1}<\pi_{0}$ (which may happen if there are multiple equilibria before and after the intervention). For the rest of this section, we assume that $\pi_{1}>\pi_{0}$ .

4.1 Welfare for Eligibles

The compensating variation for a subsidy-eligible household is given by the solution $S$ to

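$\max\left\{\delta_{1}+\beta_{1}\left(y+S-p_{1}\right)+\alpha_{1}\pi_{1}+\eta^{1},\ \delta_{0}+\beta_{0}\left(y+S\right)+\alpha_{0}\pi_{1}+\eta^{0}\right\}=\max\left\{\delta_{1}+\beta_{1}\left(y-p_{0}\right)+\alpha_{1}\pi_{0}+\eta^{1},\ \delta_{0}+\beta_{0}y+\alpha_{0}\pi_{0}+\eta^{0}\right\}\text{.}$ (45)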

Since LHS is strictly increasing in $S$ , the condition $S\leq a$ is equivalent to

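$\max\left\{\delta_{1}+\beta_{1}\left(y+a-p_{1}\right)+\alpha_{1}\pi_{1}+\eta^{1},\ \delta_{0}+\beta_{0}\left(y+a\right)+\alpha_{0}\pi_{1}+\eta^{0}\right\}\geq\max\left\{\delta_{1}+\beta_{1}\left(y-p_{0}\right)+\alpha_{1}\pi_{0}+\eta^{1},\ \delta_{0}+\beta_{0}y+\alpha_{0}\pi_{0}+\eta^{0}\right\}\text{.}$ (46)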

If $a<p_{1}-p_{0}-\frac{\alpha_{1}}{\beta_{1}}\left(\pi_{1}-\pi_{0}\right)<0$ , then each term on the LHS of (46) is smaller than the corresponding term on the RHS. If $a\geq\frac{\alpha_{0}}{\beta_{0}}\left(\pi_{0}-\pi_{1}\right)>0$ , then each term on the LHS is larger than the corresponding term on the RHS. This gives us the support of $S$ :

[TABLE]

Remark 1

Note that the above reasoning also helps establish existence of a solution to (45). We know from above that for $S<p_{1}-p_{0}-\frac{\alpha_{1}}{\beta_{1}}\left(\pi_{1}-\pi_{0}\right)$ , the LHS of (45) is strictly smaller than the RHS, and for $S\geq\frac{\alpha_{0}}{\beta_{0}}\left(\pi_{0}-\pi_{1}\right)$ , the LHS of (45) is strictly larger than the RHS. By continuity, and the intermediate value theorem, it follows that there must be at least one $S$ where (45) holds with equality.
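The existence argument in Remark 1 also suggests a direct way to compute an individual compensating variation numerically. The sketch below applies bisection on the bracket given by the support of $S$, assuming that (45) is the linear-index version of (42), i.e. a maximum of the two linear utilities on each side. The function name `cv_eligible` and the short argument names (`d1`, `d0`, `b1`, `b0`, `a1`, `a0` for $\delta_{1},\delta_{0},\beta_{1},\beta_{0},\alpha_{1},\alpha_{0}$) are hypothetical, and the numbers in the usage note are made up for illustration.

```python
def cv_eligible(y, eta1, eta0, p0, p1, pi0, pi1,
                d1, d0, b1, b0, a1, a0, tol=1e-12):
    """Solve for the compensating variation S of an eligible household.

    The left-hand side is strictly increasing in S, lies weakly below the
    right-hand side at the lower end of the support and weakly above it at
    the upper end, so bisection on that bracket finds the unique root.
    """
    def lhs(S):
        return max(d1 + b1 * (y + S - p1) + a1 * pi1 + eta1,
                   d0 + b0 * (y + S) + a0 * pi1 + eta0)

    rhs = max(d1 + b1 * (y - p0) + a1 * pi0 + eta1,
              d0 + b0 * y + a0 * pi0 + eta0)

    lo = p1 - p0 - (a1 / b1) * (pi1 - pi0)  # lower end of the support
    hi = (a0 / b0) * (pi0 - pi1)            # upper end of the support
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lhs(mid) < rhs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, with $p_{0}=10$, $p_{1}=6$, $\pi_{0}=0.3$, $\pi_{1}=0.5$, $\beta_{0}=\beta_{1}=1$, $\alpha_{1}=1$ and $\alpha_{0}=-1$, a household that buys at both prices has $S=-4.2$ (the lower end of the support), while one that never buys has $S=0.2$ (the upper end).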

Back to calculating the CDF, now consider the intermediate case where

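$p_{1}-p_{0}-\frac{\alpha_{1}}{\beta_{1}}\left(\pi_{1}-\pi_{0}\right)\leq a<\frac{\alpha_{0}}{\beta_{0}}\left(\pi_{0}-\pi_{1}\right)\text{.}$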

In this case, the first term on the LHS of (46) is larger than the first term on the RHS for all $\eta^{1}$ , and the second term on the LHS of (46) is smaller than the second term on the RHS for all $\eta^{0}$ , and thus (46) is equivalent to

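$\delta_{1}+\beta_{1}\left(y+a-p_{1}\right)+\alpha_{1}\pi_{1}+\eta^{1}\geq\delta_{0}+\beta_{0}y+\alpha_{0}\pi_{0}+\eta^{0}\text{.}$ (47)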

For any given $\alpha_{1}$ , we have that the probability of (47) reduces to

[TABLE]

The intercept $c_{0}$ , the slopes $c_{1},c_{2}$ and $\alpha$ are all identified from conditional choice probabilities; but $\alpha_{1}$ is not identified, and therefore (48) is not point-identified from the structural choice probabilities. However, since $\alpha_{1}\in\left[0,\alpha\right]$ , for each feasible value of $\alpha_{1}\in\left[0,\alpha\right]$ , we can compute a feasible value of (48), giving us bounds on the welfare distribution.

Note also that the thresholds of $a$ at which the CDF expression changes are also not point-identified for the same reason. However, since $\pi_{1}-\pi_{0}>0$ and $\beta_{0}>0$ , $\beta_{1}>0$ , the interval

[TABLE]

will translate to the left as $\alpha_{1}$ varies from $0$ to $\alpha$ .

Putting all of this together, we get the following result:

Theorem 3

If Assumptions 1, 2, and the linear index structure hold and $\pi_{1}>\pi_{0}$ , then given $\alpha_{1}\in\left[0,\alpha\right]$ , the distribution of the compensating variation for eligibles is given by

[TABLE]

Remark 2

Note that the above theorem continues to hold even if the subsidy is universal; we have not used the means-tested nature of the subsidy to derive the result.

Mean welfare: From (52), mean welfare loss is given by

[TABLE]
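As a minimal sketch of how the bounds can be traced out numerically, the code below evaluates the CV distribution for eligibles and its mean over a grid of feasible $\alpha_{1}\in\left[0,\alpha\right]$ . It assumes a logit $F$ and the linear index with $c_{1}=-\beta_{1}$ , $c_{2}=\beta_{1}-\beta_{0}$ and $\alpha_{0}=\alpha_{1}-\alpha$ ; the intermediate-range probability is written as $F\left(c_{0}+c_{1}\left(p_{1}-a\right)+c_{2}y+\alpha\pi_{0}+\alpha_{1}\left(\pi_{1}-\pi_{0}\right)\right)$ , which is our reading of the derivation above rather than a transcription of the displayed expressions. All function names and numerical values are hypothetical.

```python
import numpy as np

def logistic_cdf(t):
    return 1.0 / (1.0 + np.exp(-t))

def support(alpha1, c1, c2, alpha, p0, p1, pi0, pi1):
    """Lower and upper ends of the support of S for one feasible alpha1."""
    beta1 = -c1              # assumed: c1 = -beta1 (price coefficient)
    beta0 = -(c1 + c2)       # assumed: c2 = beta1 - beta0 (income coefficient)
    alpha0 = alpha1 - alpha  # alpha = alpha1 - alpha0
    lower = p1 - p0 - alpha1 / beta1 * (pi1 - pi0)
    upper = alpha0 / beta0 * (pi0 - pi1)
    return lower, upper

def cv_cdf(a, y, alpha1, c0, c1, c2, alpha, p0, p1, pi0, pi1):
    """Pr(S <= a) for an eligible household with income y."""
    lower, upper = support(alpha1, c1, c2, alpha, p0, p1, pi0, pi1)
    index = c0 + c1 * (p1 - a) + c2 * y + alpha * pi0 + alpha1 * (pi1 - pi0)
    return np.where(a < lower, 0.0, np.where(a >= upper, 1.0, logistic_cdf(index)))

def mean_cv(y, alpha1, c0, c1, c2, alpha, p0, p1, pi0, pi1, n=4001):
    """E[S] = upper - integral of the CDF over the support (trapezoid rule)."""
    lower, upper = support(alpha1, c1, c2, alpha, p0, p1, pi0, pi1)
    a = np.linspace(lower, upper, n)
    index = c0 + c1 * (p1 - a) + c2 * y + alpha * pi0 + alpha1 * (pi1 - pi0)
    cdf = logistic_cdf(index)
    integral = np.sum(0.5 * (cdf[1:] + cdf[:-1]) * np.diff(a))
    return upper - integral

# Hypothetical coefficients and equilibrium take-up rates, for illustration only.
par = dict(c0=1.0, c1=-0.01, c2=0.002, alpha=2.4,
           p0=250.0, p1=50.0, pi0=0.15, pi1=0.30)
alpha1_grid = np.linspace(0.0, par["alpha"], 5)
means = [mean_cv(5.0, a1, **par) for a1 in alpha1_grid]
bounds = (min(means), max(means))  # bounds on mean CV at income y = 5.0
```

Under this sign convention a negative mean CV is a welfare gain, so the two ends of `bounds` are attained at $\alpha_{1}=\alpha$ and $\alpha_{1}=0$ , respectively. A finer grid over $\alpha_{1}$ only fills in the interior of the identified set.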

Discussion: The width of the bounds on (52) and (53), obtained by varying $\alpha_{1}$ over $\left[0,\alpha\right]$ , depends on the extent to which $q_{1}\left(\cdot,\cdot,\pi\right)$ is affected by $\pi$ , i.e. the extent of social spillover, and also the difference in the realized values $\pi_{1}$ and $\pi_{0}$ . For our single-index model, the fixed point restrictions imply that these counterfactual $\pi_{1}$ and $\pi_{0}$ depend on $\alpha_{1}$ and $\alpha_{0}$ only via $\alpha=\alpha_{1}-\alpha_{0}$ (c.f. (68) and (69) below) which is point-identified, so every potential value of counterfactual demand is point-identified. But given any feasible value of $\pi_{1}$ and $\pi_{0}$ , the welfare (53) is not point-identified in general since $\alpha_{1}$ is unknown.

Given $\alpha$ , the welfare gain in expression (53) is increasing in $\alpha_{1}$ ; i.e., the welfare gain is largest in absolute value when $\alpha_{1}=\alpha$ and $\alpha_{0}=0$ , and the smallest when $\alpha_{1}=0$ and $\alpha_{0}=-\alpha$ . Conversely for welfare loss. Intuitively, if there is no negative externality from increased $\pi$ on non-purchasers, then they do not suffer any welfare loss, but purchasers have a welfare gain from both lower price and higher $\pi$ . Conversely, if all the spillover is negative, then purchasers still get a welfare gain via price reduction, but non-purchasers suffer welfare loss due to increased $\pi$ . Also, note that under quasilinear utilities, where income effects are absent, the $y$ drops out of the above expressions, but the same identification problem remains, since $\alpha_{1}$ does not disappear. Changing variables $p=p_{1}-a$ , one may rewrite (53) as

[TABLE]

Note that if $\alpha_{1}=0$ , then the first term is the usual consumer surplus capturing the effect of price reduction on consumer welfare; for a positive $\alpha_{1}$ , the term $\frac{\alpha_{1}}{\beta_{1}}\left(\pi_{1}-\pi_{0}\right)$ yields the additional effect arising via the conforming channel. Also, if $\alpha_{1}=0$ , then the second term, i.e. the welfare loss from not buying, is the largest (given $\alpha$ ): this corresponds to the case where all of $\alpha$ is due to the negative externality.

The second term in (54), which represents welfare change caused solely via spillover and no price change, is still expressed as an integral with respect to price. This is a consequence of the index structure which enables us to express this welfare loss in terms of foregone utility from an equivalent price change. To see this, recall eq. (45)

[TABLE]

which is equivalent to

[TABLE]

which is of the form

[TABLE]

i.e.

[TABLE]

From Bhattacharya, 2015, this is exactly the form for the compensating variation $S^{\prime}$ in a binary choice model without spillover when income is $y^{\prime}$ and price changes from $p_{0}^{\prime}$ to $p_{1}^{\prime}$ . [Footnote 16] Analogously, the choice probabilities have the form

$q_{1}\left(p,y,\pi\right)=F\left(c_{0}+c_{1}p+c_{2}y+\alpha\pi\right)=F\left(c_{0}+c_{1}\left(p+\frac{\alpha}{c_{1}}\pi\right)+c_{2}y\right)\equiv\bar{q}_{1}\left(p+\frac{\alpha}{c_{1}}\pi,y\right)\text{,}$

i.e. the choice probabilities under spillover at price $p$ , income $y$ and aggregate use $\pi$ can be expressed as choice probabilities in a binary choice model with no spillover at an adjusted price and the same income.

Corollary 1

In the special case of symmetric interactions, i.e. where $\alpha_{1}=-\alpha_{0}$ in (26) (e.g. if $\gamma_{H}=0$ , i.e. there is no health externality in the health-good example), we get that $\frac{\alpha_{1}}{\alpha}=\frac{-\alpha_{0}}{-2\alpha_{0}}=\frac{1}{2}$ , and from (54) mean welfare equals:

[TABLE]

If $\alpha_{0}=0$ , and $\alpha=\alpha_{1}$ , i.e. all spillover is via conforming, average welfare is given by

[TABLE]

if on the other hand, all spillover is due to perceived health risk, i.e. $\alpha=-\alpha_{0}$ and $\alpha_{1}=0$ , then average welfare is given by

[TABLE]

Equations (56) and (57) correspond to the upper and lower bounds, respectively, of the overall welfare gain for eligibles. [Footnote 17] In independent work, Gautam (2018) obtained apparently point-identified estimates of welfare in parametric discrete choice models with social interactions, using Dagsvik and Karlstrom (2005)’s expressions for the setting without spillover. Even with strong restrictions, under which welfare is point-identified, our welfare expressions (c.f. eqn (55), (56), (57)) are different from Gautam’s.

4.2 Welfare for Ineligibles

Welfare for ineligibles is defined as the solution $S$ to the equation

[TABLE]

Using the index-structure, $S\leq a$ is therefore equivalent to

[TABLE]

If $a<\frac{\alpha_{1}}{\beta_{1}}\left(\pi_{0}-\pi_{1}\right)<0$ , then each term on the LHS is smaller than the corresponding term on the RHS for each realization of the $\eta$ s. So the probability is 0. Similarly, for $a\geq\frac{\alpha_{0}}{\beta_{0}}\left(\pi_{0}-\pi_{1}\right)>0$ , each term on the LHS is larger, and thus the probability is 1. In the intermediate range, $a\in[\frac{\alpha_{1}}{\beta_{1}}\left(\pi_{0}-\pi_{1}\right),\frac{\alpha_{0}}{\beta_{0}}\left(\pi_{0}-\pi_{1}\right))$ , we have that the first term on the LHS exceeds the first term on the RHS for each $\eta_{1}$ , and the second term on the LHS is smaller than the second term on the RHS for each $\eta_{0}$ . Therefore, (58) is equivalent to

[TABLE]

The probability of this event is not point-identified if the values of $\alpha_{1}$ , $\alpha_{0}$ are not known. But for each choice of $\alpha_{1}\in\left[0,\alpha\right]$ , we can compute the probability of this event as

[TABLE]

Putting all of this together, we have the following result:

Theorem 4

If Assumptions 1, 2, and the linear index structure hold and $\pi_{1}>\pi_{0}$ , then for each $\alpha_{1}\in\left[0,\alpha\right]$ ,

[TABLE]

For ineligibles, all of the welfare effects come from spillovers, since they experience no price change. In particular, for ineligibles who buy, there is a welfare gain from positive spillover due to a higher $\pi$ . For ineligibles who do not buy, there is, however, a potential welfare loss due to increased $\pi$ . This is why the CV distribution has a support that includes both positive and negative values. From (62), mean compensating variation is given by

[TABLE]

Using the change of variables, $p=p_{0}-a$ , the above expression becomes

[TABLE]

The first term in (64) captures the welfare gain resulting from a positive $\alpha_{1}$ and higher $\pi$ ; this term would be zero if $\alpha_{1}=0$ . The second term in (64) captures the welfare loss also resulting from higher $\pi$ ; this loss would be zero if there are no negative impacts, i.e. $\alpha_{0}=0$ . Of course, both would be zero if $\alpha=0=\alpha_{1}=\alpha_{0}$ , reflecting the fact that welfare effect on ineligibles would be zero if there is no spillover.

Corollary 2

In the three special cases where we have point-identification, viz. (i) $\alpha_{1}=-\alpha_{0}=\frac{\alpha}{2}$ ; (ii) $\alpha=\alpha_{1}$ , $\alpha_{0}=0$ ; and (iii) $\alpha=-\alpha_{0}$ , $\alpha_{1}=0$ , mean CV (64) reduces respectively to:

[TABLE]

Equations (66) and (67) correspond to the upper and lower bounds, respectively, of the overall welfare gain for ineligibles, and therefore, the overall bounds generically contain both positive and negative values, since $\alpha\neq 0$ .
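The sign pattern of these bounds can be checked with a short numerical sketch. It assumes a logit $F$ , the linear index with $c_{1}=-\beta_{1}$ and $c_{2}=\beta_{1}-\beta_{0}$ , and the intermediate-range probability $F\left(c_{0}+c_{1}\left(p_{0}-a\right)+c_{2}y+\alpha\pi_{0}+\alpha_{1}\left(\pi_{1}-\pi_{0}\right)\right)$ implied by the derivation above; the function name and all numbers are hypothetical.

```python
import numpy as np

def mean_cv_ineligible(y, alpha1, c0, c1, c2, alpha, p0, pi0, pi1, n=4001):
    """Mean CV for an ineligible household at income y, for one feasible alpha1."""
    beta1, beta0 = -c1, -(c1 + c2)          # assumed mapping from index coefficients
    alpha0 = alpha1 - alpha                 # alpha = alpha1 - alpha0
    lower = alpha1 / beta1 * (pi0 - pi1)    # non-positive end of the support
    upper = alpha0 / beta0 * (pi0 - pi1)    # non-negative end of the support
    a = np.linspace(lower, upper, n)
    index = c0 + c1 * (p0 - a) + c2 * y + alpha * pi0 + alpha1 * (pi1 - pi0)
    cdf = 1.0 / (1.0 + np.exp(-index))
    integral = np.sum(0.5 * (cdf[1:] + cdf[:-1]) * np.diff(a))
    return upper - integral                 # E[S] = upper - integral of the CDF

# Hypothetical inputs, for illustration only.
par = dict(c0=1.0, c1=-0.01, c2=0.002, alpha=2.4, p0=250.0, pi0=0.15, pi1=0.30)
all_conforming = mean_cv_ineligible(5.0, par["alpha"], **par)  # alpha0 = 0
all_negative = mean_cv_ineligible(5.0, 0.0, **par)             # alpha1 = 0
```

With a positive CV read as the compensation needed to offset a loss, `all_conforming` is negative (a gain for buyers) and `all_negative` is positive (a loss for non-buyers), so the interval between them straddles zero.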

4.3 Deadweight Loss

The average deadweight loss (DWL) can be calculated as the expected subsidy spending less the net welfare gain. In particular, if $\alpha_{0}=0$ and $\alpha=\alpha_{1}$ , i.e. there is no negative spillover, then from (54) and (63), the DWL equals

[TABLE]

So if $\frac{\alpha}{\beta_{1}}\left(\pi_{1}-\pi_{0}\right)$ is large enough, then it is possible for the deadweight loss to be negative, i.e. for the subsidy to increase economic efficiency under positive spillover, as in the standard textbook case. This can happen because there is no subsidy expenditure on ineligibles, and yet those that buy enjoy a subsidy-induced welfare gain due to positive spillover. Similarly, eligibles also receive an additional welfare gain via positive spillover, over and above the welfare-gain due to reduced price, and it is only the latter that is financed by the subsidy expenditure. In general, the deadweight loss will be lower (more negative) when (i) the positive spillover $\left(\alpha_{1}\right)$ is larger, (ii) the change in equilibrium adoption $\left(\pi_{1}-\pi_{0}\right)$ due to the subsidy is greater, and (iii) the price elasticity of demand $\left(-\beta_{1}\right)$ is lower – the last effect lowers deadweight loss simply by reducing the substitution effect, even in absence of spillover.

4.4 Calculation of Predicted Demand and Welfare

In order to calculate our welfare-related quantities, we need to estimate the structural choice probabilities $q_{1}\left(p,y,\pi\right)$ and the equilibrium values of the aggregate choice probabilities, $\pi_{0}$ and $\pi_{1}$ in the pre- and post-intervention situations. To do this we will consider two alternative scenarios. The first is where we assume that the unobservables $\boldsymbol{\eta}=\boldsymbol{\eta}_{vh}$ are independent of realized values of price and income (conditional on other covariates) in the available, experimental data. The second is where we assume that exogeneity holds, conditional on unobserved village-fixed effects. Note that prices in our data are randomly assigned, so the endogeneity concern is solely regarding income. Under income endogeneity, Bhattacharya (2018) discussed the interpretation of welfare distributions as conditional on income. See Appendix A.6 of the present paper for a review of that discussion. Regardless, calculation of the equilibrium $\pi$ s requires us to either assume exogeneity of observables or to estimate village-fixed effects, conditional on which exogeneity holds, as in our assumptions above.

No Village-Fixed Effects: Under the index-restriction (26) and no village-fixed effects, estimation of $q_{1}\left(p,y,\pi\right)$ can be done via standard binary regression, using the variation in price and income across and within villages and of observed $\pi$ across villages to estimate the coefficients constituting the linear index. This implicitly assumes, as is standard in the literature, that even if the game can potentially have multiple equilibrium $\pi$ ’s, only a single equilibrium is played in each village, and thus one can use the observed $\pi$ from each village as a regressor to infer the preference parameters. Note that given the index structure, we do not need to impose a specific distribution for the $\eta$ s to calculate the index coefficients. Any existing semiparametric estimation method for index models can be used for calculations, e.g. Klein and Spady (1993), which requires bandwidth choice and Bhattacharya (2008), which does not.

Finally, the equilibrium values of $\pi_{0}$ and $\pi_{1}$ can be calculated in each village by solving the fixed point problems

[TABLE]

where $F_{Y}\left(\cdot\right)$ denotes the distribution of income in the village. For fixed $p_{0},p_{1}$ , the RHS of the above equations, viewed as functions of $\pi_{0}$ and $\pi_{1}$ respectively, are each a map from $\left[0,1\right]$ to $\left[0,1\right]$ . If $q_{1}\left(p_{0},y,\cdot\right)$ and $q_{1}\left(p_{1},y,\cdot\right)$ are continuous, then by Brouwer’s fixed point theorem, there is at least one solution in $\pi_{0}$ and $\pi_{1}$ , respectively, implying “coherence”. However, there may be multiple solutions, and then our welfare expressions would have to be applied separately for each feasible pair of values $\left(\pi_{0},\pi_{1}\right)$ . Note that even if the solutions to (68) and (69) are unique, our expressions in Theorems 3 and 4 above imply that welfare distributions are still not point-identified.
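One way to locate every solution of such a fixed point problem, rather than only the one an iterative scheme happens to reach, is to scan $\left[0,1\right]$ for sign changes of the gap between the average predicted take-up and $\pi$ , and refine each bracket by bisection. The sketch below assumes a logit $F$ and replaces the integral over $F_{Y}$ with a sample average; the function names, simulated wealth draws and coefficient values are hypothetical.

```python
import numpy as np

def logistic_cdf(t):
    return 1.0 / (1.0 + np.exp(-t))

def equilibria(prices, incomes, c0, c1, c2, alpha, n_grid=2001, tol=1e-10):
    """All solutions pi in [0, 1] of pi = mean_h F(c0 + c1*p_h + c2*y_h + alpha*pi)."""
    def gap(pi):
        return logistic_cdf(c0 + c1 * prices + c2 * incomes + alpha * pi).mean() - pi

    grid = np.linspace(0.0, 1.0, n_grid)
    vals = np.array([gap(pi) for pi in grid])
    roots = []
    for k in range(n_grid - 1):
        lo, hi, g_lo, g_hi = grid[k], grid[k + 1], vals[k], vals[k + 1]
        if g_lo == 0.0:                      # grid point is itself a solution
            roots.append(float(lo))
        elif g_lo * g_hi < 0.0:              # sign change: refine by bisection
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                g_mid = gap(mid)
                if g_lo * g_mid <= 0.0:
                    hi = mid
                else:
                    lo, g_lo = mid, g_mid
            roots.append(float(0.5 * (lo + hi)))
    return roots

# Hypothetical village: 180 households, wealth in thousands of KSh.
rng = np.random.default_rng(0)
incomes = rng.uniform(0.0, 20.0, size=180)
coef = dict(c0=-1.0, c1=-0.01, c2=0.02, alpha=2.4)       # hypothetical coefficients
tau = 8.0                                                # hypothetical threshold
prices_before = np.full(incomes.shape, 250.0)            # everyone pays p0
prices_after = np.where(incomes < tau, 50.0, 250.0)      # eligibles pay p1
pi0_roots = equilibria(prices_before, incomes, **coef)
pi1_roots = equilibria(prices_after, incomes, **coef)
```

If either list contains more than one value, the welfare expressions would be evaluated once for each feasible pair drawn from `pi0_roots` and `pi1_roots`. A scan of this kind can miss tangential solutions where the gap touches zero without changing sign.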

Once we obtain the predicted values of $\pi_{0}$ and $\pi_{1}$ , we can calculate (52) and (62) directly, using previously obtained estimates of the index coefficients.

With Village-Fixed Effects: Our data for the application come from eleven different villages with approximately 180 households per village. It is plausible that utilities from using and from not using a bednet are affected by village-specific unobservable characteristics, such as the chance of contracting malaria when not using a bednet. Such effects were termed “contextual” by Manski (1993). Brock and Durlauf (2007) discussed some difficulties with estimating social spillover effects in presence of group-specific unobservables. To capture this situation explicitly, recall the linear utility structure from Section 2, given by

[TABLE]

where $\xi^{0}$ and $\xi^{1}$ denote unobservable village specific characteristics. Therefore,

[TABLE]

Since $\xi$ is village specific and we have many observations per village, we can use a dummy $\gamma_{v}$ for each village, and estimate the regression of take-up on price, income and other characteristics that vary across households $h$ within village $v$ , together with village dummies, i.e.

[TABLE]

where $F_{\varepsilon}\left(\cdot\right)$ refers to the distribution of $\varepsilon=\varepsilon_{vh}$ (which may potentially depend on the realized value $\xi_{v}$ for village $v$ ). The consistency of these estimates results from exogeneity conditional on village-fixed effects (see assumptions C3-IID (ii) and C3-SD (ii) above). The identified coefficients $\gamma_{v}$ of the village dummies therefore satisfy $\gamma_{v}=\alpha\pi_{v}+c_{0}+\xi_{v}$ . We will need to identify the sum $\bar{\xi}_{v}\equiv c_{0}+\xi_{v}$ . However, in the equations $\gamma_{v}=\alpha\pi_{v}+\bar{\xi}_{v}$ there are as many $\bar{\xi}_{v}$ as there are $\gamma_{v}$ , so we have $\bar{v}$ equations in $\bar{v}+1$ unknowns ( $\bar{\xi}_{v}$ s and $\alpha$ ). In our empirical application, we address this issue in two separate ways. The first is a homogeneity assumption for observationally similar villages, and the second is Chamberlain’s correlated random effects approach.

Homogeneity Assumption: If two villages are very similar in terms of observables, then it is reasonable to assume that they have similar values of $\bar{\xi}_{v}$ , which leads to a dimension reduction, and enables point-identification simply by solving the linear system $\gamma_{v}=\alpha\pi_{v}+\bar{\xi}_{v}$ as there are as many $\bar{\xi}_{v}$ s as the number of $\gamma_{v}$ less $1$ (for $\alpha$ ). Indeed, in our application, there are two villages out of eleven in our dataset that are very similar in terms of observables, and hence are amenable to this approach.
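The linear system under the homogeneity assumption has a simple closed-form solution, sketched below: differencing the equations of the two villages assumed to share $\bar{\xi}_{v}$ gives $\alpha$ , and the remaining $\bar{\xi}_{v}$ follow by substitution. The function name and the example inputs are hypothetical, and the sketch requires the two paired villages to have different take-up rates.

```python
import numpy as np

def solve_homogeneity(gamma, pi, pair):
    """Solve gamma_v = alpha * pi_v + xi_bar_v when xi_bar is equal for the two villages in `pair`."""
    i, j = pair
    if pi[i] == pi[j]:
        raise ValueError("paired villages need different take-up rates")
    alpha = (gamma[i] - gamma[j]) / (pi[i] - pi[j])   # difference out the common xi_bar
    xi_bar = gamma - alpha * pi                       # back out every village effect
    return alpha, xi_bar

# Hypothetical dummy coefficients and take-up rates for eleven villages.
true_alpha = 2.4
pi = np.linspace(0.10, 0.40, 11)
xi_true = np.array([0.3, -0.1, 0.2, 0.0, 0.5, -0.4, 0.1, 0.6, -0.2, 0.4, 0.3])
gamma = true_alpha * pi + xi_true                     # first and last villages share xi_bar
alpha_hat, xi_hat = solve_homogeneity(gamma, pi, pair=(0, 10))
```

In practice the $\gamma_{v}$ are estimates, so sampling error in the two paired coefficients is scaled by the inverse of the take-up difference between those villages.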

Correlated Random Effects Assumption: A different way to address the unobserved group-effect issue is to use Chamberlain’s correlated random effects approach (c.f. Section 15.8.2 of Wooldridge, 2010). In this approach, one models the unobserved $\bar{\xi}_{v}=\bar{Z}_{v}^{\prime}\bar{\delta}+e_{v}$ where $\bar{Z}_{v}$ denotes the village-averages of observables, and the error term $e_{v}$ is assumed to satisfy $e_{v}\perp\varepsilon_{vh}|(W_{vh},\bar{Z}_{v})$ ( $\varepsilon_{vh}=u_{vh}^{1}-u_{vh}^{0}$ ). The coefficients $\bar{\delta}$ are estimated in an initial probit regression of purchase on individual and village characteristics.

In the absence of the above assumptions, $\alpha$ can be point-identified using an instrumental variable type strategy if there are many villages, e.g. estimate the ‘regression’ $\gamma_{v}=\alpha\pi_{v}+\bar{\xi}_{v}$ using, say the aggregate fraction of individuals with subsidies or the average value of subsidy as the IV for $\pi_{v}$ . But since we have only eleven villages in our data, we do not consider this avenue.

Welfare Calculation with Village-Fixed Effects: Once we have a plausible way to estimate the structural choice probabilities, we can proceed with welfare calculation in presence of social spillover and unobserved group-effects, as follows. Consider an initial situation where everyone faces the unsubsidized price $p_{0}$ , so that the predicted take-up rate $\pi_{0}=\pi_{0v}$ in village $v$ solves

[TABLE]

where $F_{Y}^{v}\left(y\right)$ is the distribution of income $Y_{vh}$ in village $v$ , and $c_{1}$ , $c_{2}$ , $\alpha$ , and $\bar{\xi}_{v}$ are estimated as above. Now consider a policy induced price regime $p_{0}$ for ineligibles (wealth larger than $a$ ) and $p_{1}$ for eligibles (wealth less than $a$ ). Then the resulting usage $\pi_{1}=\pi_{1v}$ in village $v$ is obtained via solving the fixed point $\pi_{1v}$ in the equation

[TABLE]

Finally, average welfare effect of this policy change in village $v$ can be calculated using

[TABLE]

where $\mathcal{W}_{v}^{\mathrm{Elig}}\left(y\right)$ and $\mathcal{W}_{v}^{\mathrm{Inelig}}(y)$ are average welfare at income $y$ in village $v$ , calculated from (52) for eligibles and (62) for ineligibles, respectively, using $\pi_{0v}$ and $\pi_{1v}$ as the predicted take-up probability in village $v$ (analogous to $\pi_{0}$ and $\pi_{1}$ in (52) and (62)), $\alpha_{1}\in\left[0,\alpha\right]$ as above.

5 Empirical Context and Data

Our empirical application concerns the provision of anti-malarial bednets. Malaria is a life-threatening parasitic disease transmitted from human to human through mosquitoes. In 2016, an estimated 216 million cases of malaria occurred worldwide, with 90% of the cases in sub-Saharan Africa (WHO, 2017). The main tool for malaria control in sub-Saharan Africa is the use of insecticide treated bednets. Regular use of a bednet reduces overall child mortality by around 18 percent and reduces morbidity for the entire population (Lengeler, 2004). However, at $6 or more a piece, bednets are unaffordable for many households, and to palliate the very low coverage levels observed in the mid-2000s, public subsidy schemes were introduced in numerous countries in the last 10 years. Our empirical exercise is designed to evaluate such subsidy schemes not just in respect of their effectiveness in promoting bednet adoption, but also their impact on individual welfare and deadweight loss, in line with classic economic theory of public finance and taxation. Based on our discussion in Section 4, we focus on two main sources of spillover, viz. (a) a preference for conformity, and (b) a concern that mosquitoes will be deflected to oneself when neighbors protect themselves. Both will generate a positive effect of the aggregate adoption rate on one’s own adoption decision, but they have different implications for the welfare impact of a price subsidy policy.

**Experimental Design:** We exploit data from a 2007 randomized bednet subsidy experiment conducted in eleven villages of Western Kenya, where malaria is transmitted year-round. In each village, a list of $150$ to $200$ households was compiled from school registers, and households on the list were randomly assigned to a subsidy level. After the random assignment had been performed in office, trained enumerators visited each sampled household to administer a baseline survey. At the end of the interview, the household was given a voucher for a bednet at the randomly assigned subsidy level. The subsidy level varied from $40$ % to $100$ % in two villages, and from $40$ % to $90$ % in the remaining $9$ villages; there were $22$ corresponding final prices faced by households, ranging from [math] to $300$ KSh (US $5.50). Vouchers could be redeemed within three months at participating local retailers.

**Data:** We use data on bednet adoption as observed from coupon redemption and verified through a follow-up survey. We also use data on baseline household characteristics measured during the baseline survey. The three main baseline characteristics we consider are wealth (the combined value of all durable and animal assets owned by the household); the number of children under 10 years old; and the education level of the female head of household. [Footnote 18] Not all households in a village participated in the game. However, at the time of the experiment, non-selected households did not have the opportunity to buy an ITN, and the outcome variables for such households are always zero. So even if we allow for interactions among all households (including non-selected ones), it is easy to make the necessary adjustments in the empirics. See Appendix A.7 for more on this.

6 Empirical Specification and Results

We work with the linear index structure (26), where $y=Y_{vh}$ is taken to be the household wealth, $p=P_{vh}$ is the experimentally set price faced by the household, $\pi=\Pi_{vh}$ is the average adoption in the village. The health externality from bednet use is implicitly accounted for via the dependence of utilities from adoption and non-adoption on the average adoption rate $\pi$ (c.f. eq. (26)). [Footnote 19] There are some households who live in the village but were not part of the formal experiment. Since the ITN was not available from any source other than via the experiment, this only impacts the game via the computed fraction $\Pi_{vh}$ . We clarify this point in Appendix A.7.

For the empirical analysis, we also use additional controls, denoted by $Z_{vh}$ below, that can potentially affect preferences ( $U_{1}\left(\cdot\right)$ and $U_{0}\left(\cdot\right)$ ) and therefore the take-up of bednet, i.e. $q_{1}\left(\cdot\right)$ . In particular, we include presence of children under the age of ten and years of education of the oldest female member of the household. A village-specific variable that could affect adoption is the extent of malaria exposure risk in the village. We measure this in our data from the response to the question: “Did anyone in your household have malaria in the past month?”. Summary statistics for all relevant variables are reported in Table 1, and their village averages are shown in Table 2, for each of the eleven villages in the data.

Our first set of results corresponds to taking $F\left(\cdot\right)$ to be the standard logit CDF of $\eta_{vh}=-(\eta_{vh}^{1}-\eta_{vh}^{0})$ (as in (44), i.e. with no fixed effects), and including average take-up $\pi=\hat{\pi}_{v}(=\frac{1}{N_{v}}\sum_{h=1}^{N_{v}}A_{vh})$ in the village as a regressor. [Footnote 20] While estimating the logit parameters we do not impose the fixed point constraint. Imposing it would have improved efficiency, but the additional computational burden would be quite onerous. As shown in Theorem 2 above, even if unobservables are spatially correlated, our increasing domain asymptotic approximation will lead to consistent estimates of preference parameters. This approximation is reasonable in our empirical setting where the average distance between households within a village typically exceeds 1.5 kilometers. The marginal effects at mean are presented in Table 3. It is evident that demand is highly price elastic, and that average bednet adoption in the village has a significant positive association with private adoption, conditional on price and other household characteristics, i.e. $\alpha>0$ in our notation above. The social interaction coefficient $\alpha$ is $2.4$ , which is less than $4$ , as required for the fixed point map to be a contraction (see discussion following Proposition 2) in the logit case. The effect of children is negative, likely reflecting that households with children had already invested in other anti-malarial steps, e.g. had bought a less effective traditional bednet prior to the experiment. We also computed analogous estimates where we ignore the spillover, i.e., we drop average take-up in the village from the list of regressors. The corresponding marginal effects for the retained regressors are not very different in magnitude from those obtained when including the average village take-up, and so we do not report those here. Instead, we use the two sets of coefficients to calculate and contrast the predicted bednet adoption rate corresponding to different eligibility thresholds. These predicted effects are quite different depending on whether or not we allow for spillover, and so we investigated these further, as follows.

In particular, we consider a hypothetical subsidy rule, where those with wealth less than $\tau$ are eligible to get the bednet for $50$ KSh ( $90$ % subsidy), whereas those with wealth larger than $\tau$ get it for the price of $250$ KSh ( $50$ % subsidy). Based on our logit coefficients, we plot the predicted aggregate take-up of bednets corresponding to different income thresholds $\tau$ . In Figure 1, for each threshold $\tau$ , we plot the fraction of households eligible for subsidy on the horizontal axis, and the predicted fraction choosing the bednet on the vertical axis, based on coefficients obtained by including (solid) and excluding (small dash) the spillover effect. The 45 degree line (large dash) showing the fraction eligible for the subsidy is also plotted in the same figure for comparison.

It is evident from Figure 1 that ignoring spillovers leads to over-estimation of adoption at lower thresholds and underestimation at higher thresholds of eligibility. To get some intuition behind this finding, consider a much simpler set-up where an outcome $Y$ is related to a scalar covariate $X$ via the classical linear regression model $Y=\beta_{0}+\beta_{1}X+\epsilon$ where $\epsilon$ is zero-mean, independent of $X$ and $\beta_{1}>0$ . OLS estimation of this model yields estimators $\hat{\beta}_{1}$ , $\hat{\beta}_{0}$ with probability limits (and also expected values) $\beta_{1}=\mathrm{Cov}\left[X,Y\right]/\mathrm{Var}\left[X\right]$ and $\beta_{0}=E\left[Y\right]-\beta_{1}E\left[X\right]$ , respectively. Corresponding to a value $x$ of $X$ , the predicted outcome has a probability limit of $y^{\ast}:=\beta_{0}+\beta_{1}x=E\left[Y\right]+\beta_{1}\left\{x-E\left[X\right]\right\}$ . Now consider what happens if one ignores the covariate $X$ . Then the prediction is simply the sample mean of $Y$ which has the probability limit of $y^{\mathrm{miss}}:=E\left[Y\right]$ . Therefore, $y^{\ast}<y^{\mathrm{miss}}$ if $x<E\left[X\right]$ . Thus, although the ignored covariate $X$ has a positive effect on the outcome (since $\beta_{1}>0$ ), ignoring it in prediction leads to an overestimation of the outcome if the point $x$ where the prediction is made is smaller than the population average of the ignored covariate. On the other hand, if $x>E\left[X\right]$ , then there will be under-estimation.
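The direction of the prediction error in this simple regression example can be verified with a few lines of arithmetic. The sketch below works with population quantities directly (no estimation), and the function name and numbers are hypothetical.

```python
def prediction_limits(beta0, beta1, mean_x, x):
    """Probability limits of the prediction at x, with and without the covariate X."""
    mean_y = beta0 + beta1 * mean_x   # E[Y], since the error has mean zero
    y_star = beta0 + beta1 * x        # prediction that uses X
    y_miss = mean_y                   # prediction that ignores X
    return y_star, y_miss

# Hypothetical values: beta0 = 1, beta1 = 2 > 0, E[X] = 3.
below_mean = prediction_limits(1.0, 2.0, 3.0, x=2.0)   # x < E[X]
above_mean = prediction_limits(1.0, 2.0, 3.0, x=4.0)   # x > E[X]
```

At `x=2.0` the prediction ignoring $X$ exceeds the one that uses it, and at `x=4.0` the ordering reverses, matching the over- and under-estimation pattern described above.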

Having obtained these (uncompensated) effects, we now turn to calculating the average demand and the mean compensating variation for a hypothetical subsidy scheme. We consider an initial situation where everyone faces a price of $250$ KSh for the bednet, and a final situation where a bednet is offered for $50$ KSh to households with wealth less than $\tau=8000$ KSh (about the $27$ th percentile of the wealth distribution), and for the price of $250$ KSh to those with wealth above that. The demand results are reported in Table 4, and the welfare results in Table 5. We perform these calculations village-by-village, and then aggregate across villages. To calculate these numbers, we first predict the bednet adoption when everyone is facing a price of $250$ KSh, and then when eligibles face a price of $50$ KSh and the rest stay at $250$ KSh, giving us the equilibrium values of $\pi_{0}$ and $\pi_{1}$ , respectively, in our notation above. In all such calculations with our data, we always detected a single solution to the fixed point $\pi$ (i.e. a unique equilibrium) as can be seen from Figure 2, where we plot the squared difference between the RHS and the LHS of eqn. (69), i.e.

[TABLE]

on the vertical axis, and $\pi_{1}$ on the horizontal axis, separately for each of the eleven villages, where $\hat{q}_{1}\left(p,y,z,\pi\right)$ is the predicted demand (choice probability) function at $\left(p,y,z,\pi\right)$ . The globally convex nature of each objective function is evident from Figure 2. The minima are relatively close to each other around $0.15$ , except for villages 7 and 10, where they are larger. A similar set of globally convex graphs is obtained for $\pi_{0}$ , which minimizes $\left[\pi_{0}-\int\hat{q}_{1}\left(p_{0},y,z,\pi_{0}\right)d\hat{F}_{Y,Z}\left(y,z\right)\right]^{2}$ . These predicted values of $\pi_{0}$ and $\pi_{1}$ are used as inputs into the prediction of demand as per eqn. (41) and welfare as per Theorems 3 and 4.
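The single solution found in each village is consistent with the contraction condition for the logit case noted earlier. As a brief sketch, write $T$ (our notation, not used elsewhere in the paper) for the fixed point map and $f$ for the logistic density; additional covariates enter the index without changing the bound.

```latex
\begin{aligned}
T(\pi) &= \int F\left(c_{0}+c_{1}p+c_{2}y+\alpha\pi\right)dF_{Y}(y),
&\quad
T'(\pi) &= \alpha\int f\left(c_{0}+c_{1}p+c_{2}y+\alpha\pi\right)dF_{Y}(y), \\
f(t) &= F(t)\left(1-F(t)\right)\leq \tfrac{1}{4}
&\quad\Longrightarrow\quad
\left|T'(\pi)\right| &\leq \tfrac{\left|\alpha\right|}{4}.
\end{aligned}
```

With the estimate $\alpha=2.4$ reported above, the bound is $0.6<1$ , so $T$ is a contraction on $\left[0,1\right]$ and has exactly one fixed point, whatever the price schedule entering the index.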

The first row of Table 4 shows the pre-subsidy predicted demand (using a logit CDF $F$ ) by subsidy-eligibility. In the second row, we calculate the predicted effect of the subsidy on demand, and break that up by the own price effect (Row 2) and the spillover effect (row 3). The own effect is obtained by changing the price in accordance with the subsidy but keeping the average village demand equal to the pre-subsidy value; the spillover effect is the difference between the overall effect and the own effect. It is clear that spillover effects on both eligibles and ineligibles are large in magnitude. In particular, the spillover effect raises demand for ineligibles by nearly 33% of its pre-subsidy level.
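The decomposition into own-price and spillover effects described above can be sketched as follows, assuming a logit index for predicted demand. The function names, wealth values, coefficients and take-up rates are hypothetical; for ineligibles the post-subsidy price equals the pre-subsidy price, so the own effect is zero by construction.

```python
import numpy as np

def takeup(price, income, pi, c0, c1, c2, alpha):
    """Predicted purchase probability under a logit index (hypothetical coefficients)."""
    return 1.0 / (1.0 + np.exp(-(c0 + c1 * price + c2 * income + alpha * pi)))

def decompose(income, p_before, p_after, pi0, pi1, coef):
    """Split the change in average demand into own-price and spillover parts."""
    base = takeup(p_before, income, pi0, **coef).mean()      # pre-subsidy demand
    own_only = takeup(p_after, income, pi0, **coef).mean()   # new price, old village take-up
    full = takeup(p_after, income, pi1, **coef).mean()       # new price, new village take-up
    return {"baseline": base, "own": own_only - base,
            "spillover": full - own_only, "total": full - base}

coef = dict(c0=-1.0, c1=-0.01, c2=0.02, alpha=2.4)           # hypothetical
wealth_elig = np.array([2.0, 4.0, 6.0])                      # thousands of KSh, hypothetical
wealth_inelig = np.array([10.0, 15.0, 20.0])
eligibles = decompose(wealth_elig, 250.0, 50.0, 0.15, 0.30, coef)
ineligibles = decompose(wealth_inelig, 250.0, 250.0, 0.15, 0.30, coef)
```

Dividing `ineligibles["spillover"]` by `ineligibles["baseline"]` gives the proportional increase of the kind quoted in the text for ineligibles.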

In Table 5, we report welfare calculations. First, in the row titled “Logit”, we report the average CV of the subsidy rule for eligibles, corresponding to assuming no spillover. In this case, we simply use the results of Bhattacharya (2015) to calculate the (point-identified) average CV for eligibles as the price changes from $250$ KSh to $50$ KSh. This yields the value of welfare gain to be $51.9$ KSh. As there is no spillover, the welfare change of ineligibles is zero by definition, and therefore the net welfare gain, denoted by net CV is simply the fraction eligible ( $0.27$ ) times the average CV for eligibles. This is reported in the second column of Table 5.

We next turn to the case with spillover. Using the predicted adoption rates $\pi_{0}$ and $\pi_{1}$ , we compute the lower and upper bounds of the overall average CV using (54), (56) and (57) for eligibles, and using (64), (66) and (67) for ineligibles. These are reported in Columns 3-6 of Table 5. The most conspicuous finding from these numbers is that ineligibles can suffer a large welfare loss on average due to the subsidy. This is because the subsidy facilitates usage for solely the eligibles, raising the equilibrium usage $\pi$ in the village, but the ineligibles keep facing the high price, and thus a lower utility from not buying because $\pi$ is now higher (in the index specification, $\alpha_{0}\leq 0$ ). However, the few ineligibles who buy, despite the high price, get some welfare increase from a rise in the average adoption rate, which explains the small upper bound corresponding to the case $\alpha_{0}=0$ . As for eligibles, the lower and upper bounds on average welfare gain do not contain the estimate that ignores spillovers, suggesting over-estimation of welfare gains in the latter case. This is also consistent with Figure 1, where we see that at $27$ % eligibility and lower, demand is overestimated when spillovers are ignored. The overall welfare gain across eligibles and ineligibles, reported in the column with heading “net CV”, includes the negative welfare effects on ineligibles, thereby lowering the average effect relative to ignoring spillovers and incorrectly concluding no welfare change for ineligibles.

Deadweight Loss: To compute the average deadweight loss, we subtract the net welfare from the predicted subsidy expenditure. The latter equals the amount of subsidy ( $200$ KSh) times the average demand at the subsidized price 50 KSh of the eligibles. Thus the expression for DWL is given by

[TABLE]

where $y$ denotes wealth, $z$ denotes other covariates, $q_{1}\left(50,y,z,\pi_{1}\right)$ denotes predicted demand at price $50$ KSh including the effect of spillover, and $\mu^{\mathrm{Elig}}$ and $\mu^{\mathrm{Inelig}}$ refer to average welfare gain for eligibles and ineligibles, respectively. Ignoring spillovers leads to the point-identified deadweight loss

[TABLE]
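The deadweight-loss arithmetic described above can be sketched as follows. This is a minimal illustration, not the paper's exact expression: it assumes that all quantities are per-household averages over the whole population, so that subsidy expenditure is weighted by the share eligible. The function names and all numerical inputs are hypothetical and are not taken from Table 5.

```python
def net_cv(share_eligible, mu_elig, mu_inelig):
    """Population-average welfare change: eligibles and ineligibles, weighted by their shares."""
    return share_eligible * mu_elig + (1.0 - share_eligible) * mu_inelig


def deadweight_loss(subsidy, share_eligible, avg_demand_eligible, net_welfare):
    """Average DWL = expected subsidy outlay per household minus net welfare gain.

    Assumption: avg_demand_eligible is the mean predicted purchase probability
    of eligibles at the subsidized price (including any spillover effect).
    """
    expenditure = subsidy * share_eligible * avg_demand_eligible
    return expenditure - net_welfare


def deadweight_loss_bounds(subsidy, share_eligible, avg_demand_eligible,
                           net_cv_lower, net_cv_upper):
    """With spillovers, net CV is only bounded, so DWL is bounded as well.

    The largest DWL corresponds to the smallest net welfare gain, and vice versa.
    """
    lo = deadweight_loss(subsidy, share_eligible, avg_demand_eligible, net_cv_upper)
    hi = deadweight_loss(subsidy, share_eligible, avg_demand_eligible, net_cv_lower)
    return lo, hi
```

Under no spillover the welfare change of ineligibles is zero, so `net_cv(share, mu_elig, 0.0)` reduces to the share eligible times the average CV of eligibles, as in the text.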

Group-Effects: It is evident from Table 2 that villages 1 and 11 are highly similar in terms of the average values of key regressors, except that the (randomly assigned) average price in village 1 is much higher than in village 11, which explains the much lower average adoption in village 11. Given this, we assume that villages 1 and 11 are likely to be similar in terms of their unobservables, and as such, we estimate a single $\bar{\xi}_{v}$ for them. Specifically, we first estimate

[TABLE]

where $Z_{vh}$ is a vector containing presence of children and female education, the $\gamma_{v}$ s are village-specific intercepts (estimated using dummies for the villages), and $P_{vh}$ and $Y_{vh}$ are price faced by the household in the experiment and its wealth, respectively. In the second step, we solve the linear system $\gamma_{v}=\alpha\pi_{v}+c_{0}+\xi_{v}=\alpha\pi_{v}+\bar{\xi}_{v}$ , for $\alpha$ and $\bar{\xi}_{v}$ , for $v=1,...,11$ , where $\gamma_{v}$ is obtained in the previous step, and the $\pi_{v}$ s are the average adoption rates in individual villages in the experiment. In solving this system, we set $\bar{\xi}_{1}=\bar{\xi}_{11}$ , which incorporates the homogeneity assumption discussed above. We can do all of this in one step by adding nine dummies for villages 2-10 and one for villages 1 and 11, and then running a regression of individual use on the regressors $p,y$ and $x$ , the average use in each village, as well as the village dummies. In the second row in Table 5, we report the average welfare effects of the same hypothetical policy change as described above, using expression (72).
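The second step above is an exactly identified linear system: eleven equations $\gamma_{v}=\alpha\pi_{v}+\bar{\xi}_{v}$ in eleven unknowns once $\bar{\xi}_{1}=\bar{\xi}_{11}$ is imposed. A minimal sketch of how it could be solved is given below; the function name and the zero-based indexing of villages are illustrative assumptions, and the first-step intercepts $\gamma_{v}$ are taken as given.

```python
import numpy as np


def solve_second_step(gamma, pi, tied=(0, 10)):
    """Solve gamma_v = alpha * pi_v + xi_bar_v subject to xi_bar[i] == xi_bar[j].

    gamma : first-step village intercepts (length 11 here)
    pi    : village-level average adoption rates
    tied  : zero-based indices of the two villages sharing one xi_bar
            (villages 1 and 11 correspond to indices 0 and 10)
    """
    gamma = np.asarray(gamma, dtype=float)
    pi = np.asarray(pi, dtype=float)
    i, j = tied
    if np.isclose(pi[i], pi[j]):
        # alpha is identified only from the difference in adoption rates
        raise ValueError("tied villages must have different adoption rates")
    # Differencing the two tied equations eliminates the common xi_bar
    alpha = (gamma[i] - gamma[j]) / (pi[i] - pi[j])
    # The remaining intercepts follow by back-substitution
    xi_bar = gamma - alpha * pi
    return alpha, xi_bar
```

The sketch makes the source of identification explicit: $\alpha$ is pinned down solely by the contrast between the two villages assumed to share unobservables.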

Next, we use the correlated random effect approach described above, where village averages of observable regressors (price, wealth, female education, number of children) are added as additional controls in a probit (instead of logit) regression. The corresponding welfare results are reported in the third row of Table 5.

Semiparametric Estimates: Finally, in the fourth row of Table 5, we report welfare results from a semiparametric index estimation of the conditional choice-probabilities, i.e. retaining the index structure but dropping the logit assumption. This is achieved by using the “sml” routine (de Luca, 2008) in Stata which implements Klein and Spady’s (1993) estimator for single index models, using (i) a default bandwidth of $h_{n}=n^{-1/6.5}$ to estimate the index, and then (ii) a local cubic polynomial for regressing the binary outcome on the estimated index to produce the predicted probabilities, using a bandwidth of $h_{n}=cn^{-1/5}$ where $c$ is chosen via leave-one-out cross-validation.

The welfare numbers do vary somewhat across specifications. But all of these results support the overall conclusion that accounting for spillovers can lead to much lower estimates of net welfare gain from the subsidy program and higher deadweight loss. Some of this difference arises from potential welfare loss suffered by ineligibles that is missed upon assuming no spillover, and some from the impact of including spillover terms on the prediction of counterfactual purchase-rates (cf. Figure 1).

In Table 6, we report standard errors for the simple logit case. In principle, one can also derive formulae for standard errors adjusted for spatial correlation, but given that the paper is already quite long, and such standard errors contribute nothing substantive, we do not attempt that here. Table 6 also reports the welfare calculations corresponding to the special case where $\alpha_{1}=-\alpha_{0}=\alpha/2$ . This would be reasonable when there is no negative externality due to deflection, i.e. $\gamma_{H}=0$ above, whence average welfare becomes point-identified. Note that this case is different from the results obtained assuming no spillover whatsoever, i.e. the first row, third column of Table 5. We still obtain a negative average effect of the subsidy due to the larger aggregate welfare loss of ineligibles compared with the gains of eligibles.

Comparative Statics: In Table 7, we show how the welfare effects change as we vary the generosity of the subsidy scheme; the wealth threshold for qualification is varied so that either 20%, 40% or 60% of the population is eligible. It is apparent from Table 7 that the upper bound on welfare loss for ineligibles increases as more people become eligible (since equilibrium take-up is higher), and the deadweight loss grows larger still, owing both to a larger subsidy-induced distortion and to the higher welfare loss of ineligibles. The lower bound on the welfare gain for eligibles decreases as the share eligible increases; in fact, it becomes negative when 40% are eligible. This is because those among the eligible who are too poor to buy the bednet even at the $50$ KSh price now experience a welfare loss, since equilibrium take-up is higher. The overall effect is an unambiguous increase in the deadweight loss.

Endogeneity: Price variation is exogenous in our application, since price was varied randomly by the experimenter. It is still possible that wealth $Y$ is correlated with $\eta$ , the unobserved determinants of bednet purchase. However, experimental variation in price $P$ also implies that $P$ is independent of $\eta$ , given $Y$ . Consequently, one can invoke the argument presented in Bhattacharya (2018, Sec. 3.1; reproduced in Appendix A.6 below for ease of reference), interpret the estimated choice-probabilities and the corresponding welfare numbers as conditional on $y$ , and then integrate with respect to the marginal distribution of $y$ . This overcomes the problem posed by potentially endogenous income.

7 Summary and Conclusion

In this paper, we develop tools for economic demand and welfare analysis in binary choice models with social interactions. To do this, we first show the connection between Brock-Durlauf type social interaction models and empirical games of incomplete information with many players. We analyze these models under both I.I.D. and spatially correlated unobservables. The latter makes individual beliefs conditional on privately observed variables, complicating identification and inference. We show when and how these complications can be overcome via the use of a limit model to which the finite game model converges under increasing domain spatial asymptotics, in turn yielding computationally simple estimators of preference parameters. These lead to consistent point-estimates of potential values of counterfactual demand resulting from a policy-intervention, which are unique under unique equilibria.

However, with interactions, welfare distributions resulting from policy changes such as a price subsidy are generically not point-identified for given values of counterfactual aggregate demand, unlike the case without spillovers. This is true even for fully parametric specifications, and when equilibria are unique. Non-identification results from the inability of standard choice data to distinguish between different underlying latent mechanisms, e.g. conforming motives, consumer learning, negative externalities etc., which produce the same aggregate social interaction coefficient, but have different welfare implications depending on which mechanism dominates. This feature is endemic to many practical settings that economists study, including the health-product adoption case examined here. Another prominent example is school-choice, where merit-based vouchers to attend a fee-paying selective school can create negative externalities by lowering the academic quality of the free local school via increased departure of high-achieving students. The resulting welfare implications cannot be calculated based solely on a Brock-Durlauf style empirical model of individual school-choice inclusive of a social interaction term. This is in contrast to models without social interaction, where choice probability functions have been shown to contain all the information required for welfare-analysis. Nonetheless, we show that under standard semiparametric linear index restrictions, welfare distributions can be bounded. Under some special and untestable cases e.g. exactly symmetric spillover effects or absence of negative externalities, these bounds shrink to point-identified values.

We apply our methods to an empirical setting of adoption of anti-malarial bednets, using data from an experiment by Dupas (2014) in rural Kenya. We find that accounting for spillovers provides different predictions for demand and welfare resulting from hypothetical, means-tested subsidy rules. In particular, with positive interaction effects, predicted demand when including spillover is lower for less generous eligibility criteria, compared to demand predicted by ignoring spillovers. At more generous eligibility thresholds, the conclusion reverses. As for welfare, if negative health externalities are present, then subsidy-ineligibles can suffer welfare loss due to increased use by subsidized buyers in the neighborhood; if solely conforming effects are present and there is no health-related externality, then welfare can improve. Specifically, our welfare bounds applied to the bednet data show that a $200$ KSh subsidy with eligibility threshold equal to the 75th percentile of wealth has an average (across eligibles and ineligibles combined) cash equivalent of between $-14$ and $+10$ KSh when including spillovers, equals $-1.48$ KSh under symmetric spillover, and is about $13$ KSh when all spillovers are ignored. The potential welfare loss of ineligibles and non-buyers translates into larger estimates of potential deadweight loss from price intervention. We perform robustness checks allowing for village-level unobservables and a semiparametric specification.

The implication of these results for applied work is that under social interactions, welfare analysis of potential interventions requires more information regarding individual channels of spillover than knowledge of solely the choice probability functions (inclusive of a social interaction term). Belief-eliciting surveys provide a potential solution.

We conclude by noting that we have used the basic and most popular specification of interactions, viz. that physical neighbors constitute an individual’s peer group. This also seems reasonable in the context of our application, which concerns adoption of a health product in physically separated Kenyan villages. It would be interesting to extend our analysis to other network structures, e.g. those based on ethnicity, caste, socioeconomic distance, etc. We leave that to future work.

Appendix A

This Appendix has seven sections labelled A.1 - A.7. They deal respectively with the proof of constancy and symmetry of the beliefs with I.I.D. unobservables, belief convergence with spatially correlated unobservables, sufficient conditions for contraction, convergence of the estimators (the proof of Theorem 2), welfare analysis under $\pi_{1}<\pi_{0}$ , income endogeneity, and nonparticipating households.

A.1 Proofs for the (Conditionally) I.I.D. Case

Proof of Proposition 1. By the definition in (2) (with $h$ replaced by $k$ ), $\Pi_{vk}=\tfrac{1}{N_{v}-1}{\textstyle\sum\nolimits_{1\leq j\leq N_{v}\text{; }j\neq k}}E[A_{vj}|\mathcal{I}_{vk}]$ . Since this is the average of the conditional expectations given $\mathcal{I}_{vk}=(W_{vk},L_{vk},\boldsymbol{u}_{vk},\boldsymbol{\xi}_{v})$ , we can write $(v,k)$ ’s belief as

[TABLE]

using some function $g_{vk}(\cdot)$ which may depend on each index $\left(v,k\right)$ but is deterministic (non-random). Thus, plugging this expression of $\Pi_{vk}$ into $A_{vk}=1\{U_{1}(Y_{vk}-P_{vk},\Pi_{vk},\boldsymbol{\eta}_{vk})\geq U_{0}(Y_{vk},\Pi_{vk},\boldsymbol{\eta}_{vk})\}$ , we can also write

[TABLE]

for some deterministic function $f_{vk}(\cdot)$ , where $W_{vk}=(Y_{vk},P_{vk})$ .

By C3-IID, we have the following two conditional independence restrictions: $\left(\boldsymbol{u}_{vh},\boldsymbol{u}_{vk}\right)\perp(W_{vh},L_{vh})|\boldsymbol{\xi}_{v}$ and $\boldsymbol{u}_{vh}\perp\boldsymbol{u}_{vk}|\boldsymbol{\xi}_{v}$ . These imply that

[TABLE]

where we have used the following conditional independence relation: for random objects $Q$ , $R$ , and $S$ ,

[TABLE]

which is applied with $Q=\boldsymbol{u}_{vk}$ , $R=(W_{vh},L_{vh})$ , and $S=\boldsymbol{u}_{vh}$ . By the same token, C3-IID implies that

[TABLE]

which is equivalent to

[TABLE]

We below denote by $E_{\boldsymbol{\xi}_{v}}\left[\cdot\right]$ the conditional expectation operator given $\boldsymbol{\xi}_{v}$ (i.e., $E[\cdot|\boldsymbol{\xi}_{v}]$ ; we also write $E_{\boldsymbol{\xi}_{v}}[\cdot|B]=E[\cdot|\boldsymbol{\xi}_{v},B]$ for any random variable). Given the above, we have

[TABLE]

where the first equality uses (73), the second and third equalities follow from (74) and (76), respectively, the fourth equality holds since $(W_{vk},L_{vk})\perp\boldsymbol{u}_{vk}|\boldsymbol{\xi}_{v}$ , completing the proof.

Proof of Proposition 2. Let

[TABLE]

where henceforth we suppress the dependence of $\bar{\pi}_{vk}$ on $\boldsymbol{\xi}_{v}$ for notational simplicity. By Proposition 1 and (6), we have

[TABLE]

Given these, we can write

[TABLE]

We can easily see that if a symmetric solution to the system of $N_{v}$ equations in (79) exists uniquely, then that of (7) (in terms of $\{\bar{\Pi}_{vh}\}_{h=1}^{N_{v}}$ ) also exists uniquely, and vice versa (note that $\bar{\pi}_{vh}=\sum_{k=1}^{N_{v}}\bar{\Pi}_{vk}-\left(N_{v}-1\right)\bar{\Pi}_{vh}$ by (78)). Therefore, we investigate (79).

Corresponding to (79), define an $N_{v}$ -dimensional vector-valued function of $\boldsymbol{r}=(r_{1},r_{2},\dots,r_{N_{v}})\in\left[0,1\right]^{N_{v}}$ as

[TABLE]

where we write ${\textstyle\sum\nolimits_{1\leq k\leq N_{v}\text{; }k\neq h}}={\textstyle\sum\nolimits_{k\neq h}}$ for notational simplicity, and the metric in the domain and range spaces of $\mathcal{M}^{v}$ is defined as

[TABLE]

for any $\boldsymbol{s}=(s_{1},\dots,s_{N_{v}})$ , $\boldsymbol{\tilde{s}}=(\tilde{s}_{1},\dots,\tilde{s}_{N_{v}})\in\left[0,1\right]^{N_{v}}$ (note that both the spaces are taken to be $\left[0,1\right]^{N_{v}}$ ). Given these definitions of $\mathcal{M}^{v}(\boldsymbol{r})$ and the metric, we can easily show that the contraction property of $m^{v}(\cdot)$ carries over to $\mathcal{M}^{v}(\cdot)$ , i.e.,

[TABLE]

which implies that there exists a unique solution $\boldsymbol{r}^{\ast}$ to the ( $N_{v}$ -dimensional) vector-valued equation:

[TABLE]

Now, consider the following scalar-valued equation $r=m^{v}\left(r\right)$ . By the contraction property (9), it has a unique solution. Denote this solution by $\bar{r}^{\ast}\in\left[0,1\right]$ . By the definition of $\mathcal{M}^{v}(\cdot)$ , the vector $\boldsymbol{\bar{r}}^{\ast}=(\bar{r}^{\ast},\dots,\bar{r}^{\ast})\in\left[0,1\right]^{N_{v}}$ must be a solution to (80). Then, by the uniqueness of the solution to (80), this $\boldsymbol{\bar{r}}^{\ast}$ must be a unique solution, which is a set of symmetric beliefs. The proof is completed.
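The scalar fixed-point argument above can be illustrated numerically. The sketch below assumes, purely for illustration, a logit linear-index form $m(r)=\frac{1}{n}\sum_{i}\Lambda(x_{i}+\alpha r)$ with $\Lambda$ the logistic CDF; this functional form, the function names, and the starting value are assumptions and are not the paper's $m^{v}(\cdot)$. Because $\Lambda^{\prime}\leq 1/4$, this map is a contraction with modulus at most $|\alpha|/4$, so simple iteration converges whenever $|\alpha|<4$.

```python
import numpy as np


def logistic(x):
    """Logistic CDF."""
    return 1.0 / (1.0 + np.exp(-x))


def symmetric_belief(index, alpha, tol=1e-12, max_iter=10_000):
    """Iterate r <- m(r) = mean(logistic(index + alpha * r)) until convergence.

    index : household-level linear indices excluding the interaction term
            (hypothetical inputs)
    alpha : social interaction coefficient; |alpha| < 4 guarantees contraction
            under the logit form assumed here
    """
    index = np.asarray(index, dtype=float)
    r = 0.5  # any starting point in [0, 1] works for a contraction
    for _ in range(max_iter):
        r_new = float(np.mean(logistic(index + alpha * r)))
        if abs(r_new - r) < tol:
            return r_new
        r = r_new
    raise RuntimeError("fixed-point iteration did not converge")
```

The returned scalar plays the role of $\bar{r}^{\ast}$: stacking it $N_{v}$ times gives the symmetric solution of the vector-valued equation.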

A.2 The Spatially Dependent Case

In this section, we present formal specifications for the spatially dependent process $\{\boldsymbol{u}_{vh}\}$ and derive the belief convergence result. We prove Theorem 5 below, which is a finer, more general version of Theorem 1 in Section 2 in that it also derives the rate of convergence without the assumption of symmetric beliefs.

Note that given C1 (independence over villages), each village may be analyzed separately. So for notational simplicity, we drop the village index $v$ , i.e. write $\left\{\left(W_{h},L_{h},\boldsymbol{u}_{h}\right)\right\}_{h=1}^{N}$ instead of $\left\{\left(W_{vh},L_{vh},\boldsymbol{u}_{vh}\right)\right\}_{h=1}^{N_{v}}$ . All of the conditions and statements here should be interpreted as conditional ones given $\boldsymbol{\xi}_{v}$ for each village $v$ , where we note that C2 and C3-SD are stated conditionally on $\boldsymbol{\xi}_{v}$ .

To avoid any notational confusion, we re-write C2 and C3-SD in the following simplified forms (without the village specific effects $\boldsymbol{\xi}_{v}$ and village index $v$ ):

C2’

$\{(W_{h},L_{h})\}_{h=1}^{N}$ is I.I.D. with $(W_{h},L_{h})$ $\sim$ $F_{WL}(w,l)$ .

C3-SD’

$\left\{\boldsymbol{u}_{h}\right\}_{h=1}^{N}$ is defined through $\boldsymbol{u}_{h}=\boldsymbol{u}(L_{h})$ , where $\left\{\boldsymbol{u}\left(l\right)\right\}_{l\in\mathbb{R}^{2}}$ is a stochastic process on $\mathbb{R}^{2}$ with the following properties: i) $\left\{\boldsymbol{u}\left(l\right)\right\}$ is alpha-mixing satisfying Assumption 3 (provided below); ii) $\left\{\boldsymbol{u}\left(l\right)\right\}_{l\in\mathbb{R}^{2}}$ is independent of $\{(W_{h},L_{h})\}_{h=1}^{N}$ .

A.2.1 Spatially Mixing Structure

Now, we provide additional specifications of $\{\boldsymbol{u}_{h}\}$ modelled as a spatially dependent process. To this end, we introduce some more notation. For a set $\mathcal{L}\subset\mathbb{R}^{2}$ , let $\boldsymbol{\sigma}[\mathcal{L}]$ be the sigma algebra generated by $\left\{\boldsymbol{u}(l):l\in\mathcal{L}\right\}$ and define

[TABLE]

where the supremum is taken over any events $B\in\boldsymbol{\sigma}[\mathcal{L}_{1}]$ and $C\in\boldsymbol{\sigma}[\mathcal{L}_{2}]$ . This $\boldsymbol{\tilde{\alpha}}$ measures the degree of dependence between two algebras; it is zero if any $B$ and $C$ are independent. We also define

[TABLE]

the collection of all finite disjoint unions of squares, $D_{j}$ , in $\mathbb{R}^{2}$ with its total volume not exceeding $b$ , where $\left|D_{j}\right|$ stands for the volume of each square $D_{j}$ . Given these, we define alpha- (strong) mixing coefficients of the stochastic process $\left\{\boldsymbol{u}\left(l\right)\right\}$ by

[TABLE]

where $d(\mathcal{L}_{1},\mathcal{L}_{2})$ is the distance between two sets: $d(\mathcal{L}_{1},\mathcal{L}_{2}):=\inf\{||l-\tilde{l}||_{1}:l\in\mathcal{L}_{1},\tilde{l}\in\mathcal{L}_{2}\}$ , and $||l-\tilde{l}||_{1}$ stands for the $l^{1}$ -distance between two points in $\mathbb{R}^{2}$ : $|l_{1}-\tilde{l}_{1}|+|l_{2}-\tilde{l}_{2}|$ for $l=(l_{1},l_{2})$ and $\tilde{l}=(\tilde{l}_{1},\tilde{l}_{2})$ . (Footnote 21: For the verification of Theorem 5 below, this definition of the mixing coefficients using $\mathcal{R}(b)$ is slightly more complicated than necessary. We maintain this definition, however. It is the same as the one used in Lahiri and Zhu (2006), who showed validity of a spatial bootstrap under this definition and some mild regularity conditions.) We suppose $\boldsymbol{\alpha}(a;b)$ is decreasing in $a$ (and increasing in $b$ ). In particular, the decreasingness of $\boldsymbol{\alpha}$ in $a$ implies that $\boldsymbol{u}(l)$ and $\boldsymbol{u}(\tilde{l})$ are less correlated when $||l-\tilde{l}||_{1}$ is large, i.e. the process is weakly dependent when the mixing coefficients $\boldsymbol{\alpha}(a;b)$ decay to zero as $a$ tends to infinity.

For location variables $\left\{L_{h}\right\}$ , we consider the following increasing-domain asymptotic scheme, which roughly follows Lahiri (1996). We regard $\mathcal{R}^{0}$ as a ‘prototype’ of a sampling region (i.e., village), which is defined as a bounded and connected subset of $\mathbb{R}^{2}$ , and for each $N$ , we denote by $\mathcal{R}^{N}$ a sampling region of the village that is obtained by inflating the set $\mathcal{R}^{0}$ by a scaling factor $\lambda_{N}\rightarrow\infty$ maintaining the same shape, such that

[TABLE]

In particular, if $\mathcal{R}^{0}$ contains the origin $\mathbf{0}\in\mathbb{R}^{2}$ , we can write $\mathcal{R}^{N}=\lambda_{N}\mathcal{R}^{0}$ , which may be assumed WLOG. It is also assumed that $\mathcal{R}^{0}$ is contained in a square whose sides have length $1$ , WLOG. Thus, the area of $\mathcal{R}^{N}$ is equal to or less than $\lambda_{N}^{2}$ . We let $f_{0}\left(\cdot\right)$ be the probability density on $\mathcal{R}^{0}$ , and then for $s_{h}$ $\sim$ $f_{0}\left(\cdot\right)$ ,

[TABLE]

where the dependence of $L_{h}$ on $N$ is suppressed for notational simplicity. (Footnote 22: Note that when $\mathcal{R}^{0}$ does not contain the origin, we need to consider some location shift: $L_{h}=\lambda_{N}(s_{h}-s^{\ast})$ instead of (84), where $s^{\ast}$ is some point in $\mathcal{R}^{0}$ such that the region ‘ $\mathcal{R}^{0}-s^{\ast}$ ’ (shifted by $s^{\ast}$ ) contains the origin.) Given these, we have $L_{h}$ $\sim(1/\lambda_{N}^{2})f_{0}\left(\cdot/\lambda_{N}\right)$ , and the expected number of households residing in a region $A\subset\mathcal{R}^{N}(\subset\mathbb{R}^{2})$ is

[TABLE]

We can also compute the expected distance of two individuals with $L_{k}$ and $L_{h}$ :

[TABLE]

using the change of variables $\tilde{s}=\tilde{l}/\lambda_{N}$ and $s=l/\lambda_{N}$ . Since the second term on the last line is a finite integral (independent of $N$ ), which exists under $\sup_{s\in\mathcal{R}^{0}}f_{0}\left(s\right)<\infty$ , *the average distance between any $k$ and $h$ grows at the rate of $\lambda_{N}$*. This sort of growing-average-distance feature is key to establishing limit theory for spatially dependent data under the weakly dependent (mixing) condition above. We discuss this point and its implications below after introducing Assumption 3.
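The scaling of the average distance with $\lambda_{N}$ can be checked by simulation. The sketch below assumes, for illustration only, that $f_{0}$ is the uniform density on the unit square, so that $L_{h}=\lambda_{N}s_{h}$ with $s_{h}$ uniform; the function name, the number of simulated pairs, and the seed are hypothetical choices. Under this assumption the expected $l^{1}$ -distance between two scaled locations is $\lambda_{N}\cdot 2/3$, since $E|U-V|=1/3$ for independent uniforms on $[0,1]$.

```python
import numpy as np


def mean_l1_distance(lam, n_pairs=200_000, seed=0):
    """Monte Carlo estimate of E||L_k - L_h||_1 when L = lam * s, s ~ Uniform([0,1]^2).

    lam     : scaling factor lambda_N of the sampling region
    n_pairs : number of independent (k, h) pairs to simulate
    """
    rng = np.random.default_rng(seed)
    s = rng.uniform(size=(n_pairs, 2))        # prototype-region locations of k
    s_tilde = rng.uniform(size=(n_pairs, 2))  # prototype-region locations of h
    L, L_tilde = lam * s, lam * s_tilde       # inflate by the scaling factor
    return float(np.abs(L - L_tilde).sum(axis=1).mean())
```

Calling the function with the same seed for two values of `lam` returns estimates whose ratio equals the ratio of the scaling factors, which is the growing-average-distance feature used in the limit theory.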

Now, we state the following additional conditions on the data generating mechanism:

Assumption 3

(i) The stochastic process $\left\{\boldsymbol{u}\left(l\right)\right\}_{l\in\mathcal{R}^{N}}$ is alpha-mixing with its mixing coefficients satisfying

[TABLE]

for some constants, $C,\tau_{1}\in\left(0,\infty\right)$ and $\tau_{2}\geq 0$ , where $\boldsymbol{\alpha}(a;b)$ is defined in (82). (ii) Let $\left\{L_{h}\right\}_{h=1}^{N}$ be an I.I.D. sequence introduced in C2’. Each $L_{h}$ defined through (84) is continuously distributed with its support $\mathcal{R}^{N}$ (defined through $\mathcal{R}^{N}=\lambda_{N}\mathcal{R}^{0}$ ) and probability density function, $f_{L}\left(\cdot\right)=(1/\lambda_{N}^{2})f_{0}\left(\cdot/\lambda_{N}\right)$ , satisfying $\sup_{s\in\mathcal{R}^{0}}f_{0}\left(s\right)<\infty$ .

Condition (i) controls the degree of spatial dependence of $\left\{\boldsymbol{u}\left(l\right)\right\}$ , which is a key for establishing limit (LLN/CLT) results. The same condition is used in Lahiri and Zhu (2006), and some analogous conditions are also imposed in other papers such as Jenish and Prucha (2012). (ii) is the increasing-domain condition, and is important for establishing consistency of estimators (Lahiri, 1996). The uniform boundedness of the density is imposed for simplifying proofs, but can be relaxed at the cost of a more involved proof.

Conditions (i) and (ii) have an important implication for identification and estimation of our model: Given the increasing-domain condition (ii), the distance between two individuals, $k$ and $h$ , on average, increases at the rate $\lambda_{N}\rightarrow\infty$ as $N\rightarrow\infty$ , as in (85). This implies that, given the weak dependence condition (i), the correlation between two variables, $\boldsymbol{\eta}_{k}$ and $\boldsymbol{\eta}_{h}$ , for any $k$ and $h$ , becomes weaker as $N$ tends to $\infty$ . In other words, for each $h$ , the number of other individuals who are almost uncorrelated with $h$ tends to $\infty$ and, furthermore, the ratio of such individuals (among all $N$ players) tends to $1$ . That is, the conditional law of $\boldsymbol{u}(L_{k})$ and that of $A_{k}$ are less affected by $\boldsymbol{u}(L_{h})$ for larger $N$ , and thus $E\left[A_{k}\text{ }|W_{h},L_{h},\boldsymbol{u}(L_{h})\right]$ converges to $E\left[A_{k}\right]$ . We formally verify this convergence result in Theorem 5.

Note that such convergence is not specific to our specification of the data-generating mechanism, but it occurs generically in settings with spatial data. For example, Jenish and Prucha (2012) derive various limit results for spatial data (or random fields) under the increasing-domain assumption and the so-called minimum distance condition, where the latter means that the distance between any two individuals is larger than some fixed constant $\underline{d}>0$ (independent of $N$ ). (Footnote 23: Note that our increasing-domain assumption (together with the specification of the density of $L_{h}$ ) implies that for any $\underline{d}>0$ , $k\neq h$ ,

$\displaystyle\Pr\left(||L_{k}-L_{h}||_{1}\leq\underline{d}\right)=\Pr\left(||s_{k}-s_{h}||_{1}\leq\lambda_{N}^{-1}\underline{d}\right)$

$\displaystyle=\int\int 1\left\{||u-r||_{1}\leq\lambda_{N}^{-1}\underline{d}\right\}f_{0}\left(u\right)f_{0}\left(r\right)dudr\rightarrow 0,$

where the convergence holds as the area of $\left\{(u,r)\left|\text{ }||u-r||_{1}\leq\lambda_{N}^{-1}\underline{d}\right.\right\}$ shrinks to zero and $f_{0}\left(\cdot\right)$ is uniformly bounded; thus for any $\underline{d}>0$ , we have the minimum distance condition with probability approaching $1$ .) These two assumptions imply that the number of individuals who are ‘far away’ from each $h$ tends to $\infty$ . This, together with the mixing condition as in (i) of Assumption 3, drives the convergence of conditional expectations.

Before concluding this subsection, we present the following Assumption 4 under which Theorem 1 in Section 2 is verified. This is a multi-village version of Assumption 3 in which we allow for $\bar{v}>1$ and $\boldsymbol{\xi}_{v}\neq 0$ (and thus $\boldsymbol{\eta}_{vh}=\boldsymbol{\xi}_{v}+\boldsymbol{u}_{vh}$ ):

Assumption 4

(i) For each $v\in\left\{1,\dots,\bar{v}\right\}$ , given $\boldsymbol{\xi}_{v}$ , the stochastic process $\{\boldsymbol{u}_{v}\left(l\right)\}_{l\in\mathcal{R}_{v}^{N}}$ is alpha-mixing with its mixing coefficients satisfying $\boldsymbol{\alpha}^{v}(a;b)\leq Ca^{-\tau_{1}}b^{\tau_{2}}$ for some constants $C\in\left(0,\infty\right)$ , $\tau_{1}>0$ , and $\tau_{2}\geq 0$ , where the definition of $\boldsymbol{\alpha}(a;b)=\boldsymbol{\alpha}^{v}(a;b)$ follows (82). (ii) For each $v$ , given $\boldsymbol{\xi}_{v}$ , let $\left\{L_{vh}\right\}_{h=1}^{N_{v}}$ be the conditionally I.I.D. sequence introduced in C2. Each $L_{vh}$ is continuously distributed with its support $\mathcal{R}_{v}^{N}=\lambda_{N}\mathcal{R}_{v}^{0}$ and PDF $f_{L}^{v}\left(\cdot\right)=(1/\lambda_{N}^{2})f_{0}^{v}\left(\cdot/\lambda_{N}\right)$ satisfying $\sup_{s\in\mathcal{R}_{v}^{0}}f_{0}^{v}\left(s\right)<\infty$ , where $\mathcal{R}_{v}^{0}$ is a ‘prototype’ sampling region for each village $v$ and $\lambda_{N}$ is a scaling constant with $N/\lambda_{N}^{2}\rightarrow c$ for some $c\in\left(0,\infty\right)$ .

A.2.2 Convergence of Equilibrium Beliefs

To formally state our belief convergence result, we introduce the following functional operator $\mathcal{T}^{\infty}$ that maps a $[0,1]$ -valued function $g$ to some constant in $\left[0,1\right]$ :

[TABLE]

where $\mathcal{T}^{\infty}\left[g\right]$ is independent of $k$ by the (conditional) I.I.D.-ness of $\left\{W_{k},L_{k}\right\}$ ( $W_{k}=(Y_{k},P_{k})^{\prime}$ ) and the independence between $\left\{W_{k},L_{k}\right\}$ and $\{\boldsymbol{u}(l)\}$ , imposed in C2’ and C3-SD’. If $\{(W_{k},L_{k},\boldsymbol{u}(L_{k}))\}_{k=1}^{N}$ were I.I.D., the equilibrium beliefs would be characterized as a fixed point of this $\mathcal{T}^{\infty}$ (as clarified through Propositions 1 and 2). Although beliefs are given as conditional expectations under the spatial dependence of unobserved heterogeneity as modelled in C3-SD’, they are still characterized through $\mathcal{T}^{\infty}$ in an asymptotic sense stated below.

To show this, we introduce the following mapping to characterize the beliefs under C3-SD’ for each $N$ . Let $\boldsymbol{g}^{N}=(g_{1},\dots,g_{N})$ be an $N$ -dimensional vector valued function, each element of which is a $[0,1]$ -valued function $g_{h}$ on the support of $(W_{h},L_{h},\boldsymbol{u}(L_{h}))$ . Then, define $\mathbb{T}^{N}$ as a functional mapping from $\boldsymbol{g}^{N}$ to an $N$ -dimensional random vector:

[TABLE]

where each $\mathcal{T}_{N,h}\left[\boldsymbol{g}^{N}\right]$ is a mapping from $\boldsymbol{g}^{N}$ to a $[0,1]$ -valued random variable defined as

[TABLE]

Note that $\mathcal{T}_{N,h}\left[\boldsymbol{g}^{N}\right]$ corresponds to individual $h$ ’s belief $\Pi_{h}$ (this is written as $\Pi_{vh}$ in Section 2 where multiple villages are considered), when $h$ predicts other $k$ ’s behavior using $g_{k}(W_{k},L_{k},\boldsymbol{u}(L_{k}))$ . Therefore, in the equilibrium, the system of beliefs,

[TABLE]

is defined as one satisfying the fixed-point restriction:

[TABLE]

almost surely, where we write $\boldsymbol{\psi}^{N}=(\psi_{1},\dots,\psi_{N})$ , a vector of functions; note that each element of the solution, $\psi_{1},\dots,\psi_{N}$ , depends on $N$ but we suppress this for notational simplicity.

Note that (87) may be equivalently written in the following coordinate-wise form:

[TABLE]

The next theorem states the convergence of each $\psi_{h}(W_{h},L_{h},\boldsymbol{u}(L_{h}))$ to a unique fixed point of $\mathcal{T}^{\infty}$ , which is a constant $\bar{\pi}=E\left[A_{k}\right]$ :

Theorem 5 (Convergence of beliefs under spatial correlation)

Suppose that C2’ and C3-SD’ hold with Assumption 3, and the functional map $\mathcal{T}^{\infty}$ defined in (86) is a contraction with respect to the metric induced by the norm $||g||_{L^{1}}:=E[|g(W_{h},L_{h},\boldsymbol{u}(L_{h}))|]<\infty$ ( $g$ is a $\left[0,1\right]$ -valued function on the support of $(W_{h},L_{h},\boldsymbol{u}(L_{h}))$ ), i.e.,

[TABLE]

Let $\bar{\pi}\in\left[0,1\right]$ be a (unique) solution to the functional equation $g=\mathcal{T}^{\infty}[g]$ . Then, it holds that for any solution $\boldsymbol{\psi}^{N}=(\psi_{1},\dots,\psi_{N})$ to the functional equation (87), which may not be unique,

[TABLE]

where $\bar{C}_{\rho}\in\left(0,\infty\right)$ is some constant (independent of $N$ , $\boldsymbol{\psi}^{N}$ , and $\bar{\pi}$ ), whose explicit expression is provided in the proof, and thus

[TABLE]

An important pre-requisite of Theorem 5 is that the mapping $\mathcal{T}^{\infty}$ is a contraction. This condition is easy to verify, e.g., see Section A.3 for a sufficient condition for the contraction property under a linear-index restriction on the utilities. Roughly speaking, we can show that $\mathcal{T}^{\infty}$ is a contraction if the extent of social interactions is not ‘too large’.

While the contraction property of the unconditional expectation operator $\mathcal{T}^{\infty}$ implies uniqueness of its fixed point, the conditional expectation operators $\mathbb{T}^{N}\left[\boldsymbol{g}^{N}\right]=(\mathcal{T}_{N,1}\left[\boldsymbol{g}^{N}\right],\dots,\mathcal{T}_{N,N}\left[\boldsymbol{g}^{N}\right])$ need not be contractions and may admit multiple fixed points (i.e., multiplicity of equilibria). The theorem states that each of the non-unique equilibrium beliefs in each $N$ -player game converges to the unique fixed point of $\mathcal{T}^{\infty}$ . In examples, existence of a fixed-point solution of $\mathbb{T}^{N}$ is relatively easy to check, but its uniqueness or contraction property may not be; indeed, verification of the latter may require an appropriate specification of joint distributional properties of $\left\{\boldsymbol{u}_{h}\right\}_{h=1}^{N}=\left\{\boldsymbol{u}\left(L_{h}\right)\right\}_{h=1}^{N}$ as the operator $\mathbb{T}^{N}$ is based on conditional expectations.

Theorem 5 provides the rate of convergence of equilibrium beliefs in (88). Using this result, if the degree of spatial dependence is not too strong with $\tau_{1}>4$ , then, we can strengthen the belief convergence result to the uniform one:

[TABLE]

since $\lambda_{N}=O(\sqrt{N})$ as specified in (83).

Proof of Theorem 5. Define a functional mapping $\mathcal{T}_{N,h}^{\infty}$ from an $N$ -dimensional vector valued function $\boldsymbol{g}^{N}=(g_{1},\dots,g_{N})$ to $r\in\left[0,1\right]$ :

[TABLE]

where $\mathcal{T}^{\infty}$ is defined in (86) (as a mapping on scalar valued functions), and each $g_{h}^{N}$ is a $\left[0,1\right]$ -valued function on the support of $(W_{h},L_{h},\boldsymbol{u}(L_{h}))$ . Based on this $\mathcal{T}_{N,h}^{\infty}$ , we also define an $N$ -dimensional vector mapping:

[TABLE]

We also write $\boldsymbol{\bar{\pi}}^{N}=\left(\bar{\pi},\dots,\bar{\pi}\right)$ , the $N$ -dimensional vector each element of which is $\bar{\pi}$ . Then, since $\bar{\pi}$ is a fixed point of $\mathcal{T}^{\infty}$ (i.e., $\bar{\pi}=\mathcal{T}^{\infty}[\bar{\pi}]$ ), it obviously holds that

[TABLE]

Now, $\boldsymbol{\psi}^{N}=(\psi_{1},\dots,\psi_{N})$ solves the functional equation:

[TABLE]

where $\mathbb{T}^{N}$ maps an $N$ -dimensional vector valued function to an $N$ -dimensional random vector.

Given (90) and (91), we can see that

[TABLE]

Thus, by the triangle inequality and the contraction property of $\mathcal{T}^{\infty}$ , we have

[TABLE]

By the definition of $\mathcal{T}_{N,h}^{\infty}$ in (89) as well as that of $\boldsymbol{\bar{\pi}}^{N}=\left(\bar{\pi},\dots,\bar{\pi}\right)$ , the second term on the majorant side is bounded by

[TABLE]

where the last inequality follows from the contraction condition on $\mathcal{T}^{\infty}$ . Thus, this bound and (92) lead to

[TABLE]

Therefore, if it holds that

[TABLE]

for some constant $\bar{C}\in\left(0,\infty\right)$ independent of $N$ , where the supremum is taken over any (Borel measurable) functions, $\boldsymbol{g}^{N}:\left[0,1\right]^{N}\rightarrow\left[0,1\right]^{N}$ , then the desired result (88) holds with $\bar{C}_{\rho}=\tfrac{1}{1-\rho}\bar{C}$ .
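The final step here is a standard rearrangement, sketched below in shorthand that is not the paper's own notation. The symbols $a_{N}$ (the norm of $\boldsymbol{\psi}^{N}-\boldsymbol{\bar{\pi}}^{N}$ that appears in (88)) and $b_{N}$ (the bound supplied by (93)) are introduced only for this illustration, and $\rho\in[0,1)$ is the contraction modulus of $\mathcal{T}^{\infty}$ . Since all functions involved are $\left[0,1\right]$ -valued, $a_{N}$ is finite, so the rearrangement is legitimate:

```latex
a_{N} \le \rho\, a_{N} + b_{N}
\;\Longrightarrow\;
(1-\rho)\, a_{N} \le b_{N}
\;\Longrightarrow\;
a_{N} \le \frac{1}{1-\rho}\, b_{N} .
```

This is why the constant in (88) takes the form $\bar{C}_{\rho}=\tfrac{1}{1-\rho}\bar{C}$ : a contraction modulus closer to one inflates the bound.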

Proof of (93). For notational simplicity, we write

[TABLE]

for an arbitrary function, $g:\left[0,1\right]\rightarrow\left[0,1\right]$ . Then, the inequality (93) follows if

[TABLE]

where the supremum is taken over any (Borel measurable) functions, $g:\left[0,1\right]\rightarrow\left[0,1\right]$ .

To show this inequality, observe that by (ii) of C3-SD’,

[TABLE]

Here, we recall the following result on independence: for random objects $Q$ , $R$ , and $S$ ,

[TABLE]

Applying this with $Q=\left\{\boldsymbol{u}\left(l\right)\right\}$ , $R=(W_{k},L_{k})$ , and $S=(W_{h},L_{h})$ , since C2’ implies that $(W_{h},L_{h})\perp(W_{k},L_{k})$ , we can obtain

[TABLE]

which in turn implies that

[TABLE]

The relation (96) also leads to

[TABLE]

for any $\tilde{l}$ . Then, we can compute the conditional expectation in (94) as

[TABLE]

where the first and third equalities have used (97) and (98), respectively.

Now, we look at the maximand on the LHS of (94):

[TABLE]

where $E^{\boldsymbol{u}}\left[\cdot\right]$ is the expectation that only concerns $\left\{\boldsymbol{u}(l)\right\}_{l\in\mathbb{R}^{2}}$ ; the first equality uses (105) and the independence of $\left\{\boldsymbol{u}(l)\right\}_{l\in\mathbb{R}^{2}}$ and $(W_{h},L_{h})$ ; the second equality again uses the same independence condition (i.e., $(\boldsymbol{u}(\tilde{l}),\boldsymbol{u}(l))\perp(W_{h},L_{h})$ and thus $\boldsymbol{u}(\tilde{l})\perp(W_{h},L_{h})|$ $\boldsymbol{u}(l)$ ); the third equality holds since

[TABLE]

by the independence of $\left\{\boldsymbol{u}(l)\right\}$ and $\left(W_{k},L_{k}\right)$ , and the last inequality uses the Fubini theorem.

To bound the RHS of (106), note that for $||\tilde{l}-l||_{1}>0$ , we can always construct two sets in $\mathbb{R}^{2}$ , $\mathcal{\tilde{L}}$ and $\mathcal{L}$ , satisfying: 1) the former contains $\tilde{l}$ and the latter contains $l$ ; 2) the distance between the two sets is larger than $||\tilde{l}-l||_{1}/2$ ; 3) each of $\mathcal{\tilde{L}}$ and $\mathcal{L}$ is a square in $R_{N}$ with area less than $1$ . Then $\boldsymbol{u}(\tilde{l})$ and $\boldsymbol{u}(l)$ are measurable with respect to $\boldsymbol{\sigma}[\mathcal{\tilde{L}}]$ and $\boldsymbol{\sigma}[\mathcal{L}]$ , respectively. Noting the definition of the mixing coefficients of $\left\{\boldsymbol{u}\left(l\right)\right\}$ in (81) and (82), properties 1) - 3) allow us to apply McLeish’s mixingale inequality (p. 834 of McLeish, 1975; or Theorem 14.2 of Davidson, 1994) and derive a bound in terms of $\boldsymbol{\alpha}(||\tilde{l}-l||_{1}/2;1)$ . That is, since $\left|m_{g}\right|$ is uniformly bounded $(\leq 1)$ , we obtain

[TABLE]

uniformly over any $\tilde{w}$ , $\tilde{l}$ , and $l$ .

To find an upper bound of the majorant side of (106), recall that the (marginal) distribution function $F_{L}$ (whose support is given by $R_{N}$ ) has the density $f_{L}\left(l\right)=(1/\lambda_{N}^{2})f_{0}\left(l/\lambda_{N}\right)$ for each $N$ , and also that by the definition of the mixing coefficients in (81) and (82), $\boldsymbol{\alpha}(a;b)\leq 2$ uniformly over any $a$ , $b$ . Then, plugging (107), we have

[TABLE]

where $\bar{f}_{0}:=\sup_{s\in\mathcal{R}^{0}}f_{0}\left(s\right)$ , the last inequality holds since

[TABLE]

by changing variables, and for $||\tilde{s}-s||_{1}>\lambda_{N}^{-\tau_{1}/2}$ ,

[TABLE]

Thus, we can see that this upper bound of (106) is independent of $h$ , $k$ , and $g$ , and thus the inequality (94) holds with $\bar{C}:=6\left[2+C2^{\tau_{1}}\right]\bar{f}_{0}^{2}$ , completing the proof.

A.3 Sufficient Conditions for Contraction

Here, we investigate the contraction property of $\mathcal{F}_{v,N_{v}}^{\star}$ (defined in (29)) as well as its limit operator:

[TABLE]

$\mathcal{F}_{v,\infty}^{\star}$ is a functional operator from a $\left[0,1\right]$ -valued function $g=g\left(l,e;\theta_{1},\theta_{2}\right)$ to a constant $\mathcal{F}_{v,\infty}^{\star}\left[g\right]\in\left[0,1\right]$ . This limit operator is used to investigate convergence properties of the estimators. We impose the following conditions:

Assumption 5

(i) For any $\alpha\in[\bar{l}_{\bar{v}},\bar{u}_{\bar{v}}]$ , $\alpha\geq 0$ and the density $\boldsymbol{h}$ of the conditional CDF $\boldsymbol{H}(\tilde{e}|e_{a},d;\theta_{2})$ satisfies

[TABLE]

*where $\tilde{l}$ and $l$ denote location indices associated with $\tilde{e}$ and $e$ , respectively, $||\tilde{l}-l||_{1}$ stands for the distance, and the interval $[\bar{l}_{\bar{v}},\bar{u}_{\bar{v}}]$ is the set of possible values of $\alpha$ (introduced in Assumption 7).

(ii) The conditional CDF $\boldsymbol{H}(\cdot|e,d;\theta_{2})$ satisfies*

[TABLE]

for any $\tilde{e}\in\mathbb{R}$ and any $d,\theta_{2}$ , if $e_{a}\geq e_{b}$ .

These conditions are used to verify the so-called Blackwell sufficient conditions (c.f. Theorem 3.3 of Stokey and Lucas, 1989: I). The non-negativity of $\alpha$ is used for the monotonicity. While (110) is a condition for the conditional density, it also implies the same condition for the marginal density:

[TABLE]

since $\boldsymbol{h}(\tilde{e})=\int\boldsymbol{h}(\tilde{e}|e,||\tilde{l}-l||_{1};\theta_{2})\boldsymbol{h}(e)de$ (recalling that $\boldsymbol{H}(e)$ is defined as the CDF of $\varepsilon_{vh}$ and $F_{\varepsilon}\left(-e\right)$ is that of $-\varepsilon_{vh}$ , it holds that $\boldsymbol{h}(e)=f_{\varepsilon}\left(-e\right)$ ). Condition (ii) means that $\boldsymbol{H}(\cdot|e_{a},d;\theta_{2})$ first-order stochastically dominates $\boldsymbol{H}(\cdot|e_{b},d;\theta_{2})$ , implying that any two of (spatially dependent) variables, $\varepsilon_{vk}$ and $\varepsilon_{vh}$ , are (weakly) positively correlated, which is also conveniently used to show the monotonicity of $\mathcal{F}_{v,N_{v}}^{\star}$ .

Given these preparations, we can show the contraction properties of $\mathcal{F}_{v,\infty}^{\star}$ and $\mathcal{F}_{v,N_{v}}^{\star}$ :

Proposition 3

*a) Suppose that (i) of Assumption 5 holds. Then, $\mathcal{F}_{v,\infty}^{\star}$ is a contraction in the space of $\left[0,1\right]$ -valued functions $g(l,e;\theta_{1},\theta_{2})$ on $\mathcal{R}_{N_{v}}^{v}\times\mathbb{R}\times\Theta_{1}\times\Theta_{2}$ , each of which is nondecreasing in $e$ , equipped with the sup metric, where $\mathcal{R}_{N_{v}}^{v}$ denotes the support of the random variable $L_{vh}$ .

b) Suppose that Assumption 5 holds. Then, $\mathcal{F}_{v,N_{v}}^{\star}$ is a contraction in the same space.*

The restriction that $g$ be nondecreasing is innocuous when considering fixed points of $\mathcal{F}_{v,\infty}^{\star}$ and $\mathcal{F}_{v,N_{v}}^{\star}$ . This is because, given the non-negativity of $\alpha$ and the stochastic dominance of $\boldsymbol{H}$ , the fixed points are also nondecreasing in $e$ (since $\mathcal{F}_{v,\infty}^{\star}\left[g\right]$ and $\mathcal{F}_{v,N_{v}}^{\star}\left[g\right]$ are also nondecreasing in $e$ for such a nondecreasing $g$ ).

In this proposition, we have defined the limit operator $\mathcal{F}_{v,\infty}^{\star}$ on the set of general functions, $g(l,e;\theta_{1},\theta_{2})$ , which may depend on $\left(l,e\right)$ . This general domain space is required to consider the convergence of the operator $\mathcal{F}_{v,N_{v}}^{\star}$ and its fixed point. However, if we define the limit operator $\mathcal{F}_{v,\infty}^{\star}$ only on the restricted space of functions, $g(\theta_{1},\theta_{2})$ , each of which is independent of $(l,e)$ , we can write

[TABLE]

since $\boldsymbol{H}\left(e\right)=1-F_{\varepsilon}\left(-e\right)$ . In this case, by the Lipschitz continuity of $F_{\varepsilon}$ , we can check the contraction property of $\mathcal{F}_{v,\infty}^{\star}$ on the restricted space under

[TABLE]

Note that in the probit specification, in which $\varepsilon_{vh}$ is supposed to follow the standard normal distribution, $\sup_{e\in\mathbb{R}}f_{\varepsilon}\left(e\right)=1/\sqrt{2\pi}$ ; and in the logit specification, $\sup_{e\in\mathbb{R}}f_{\varepsilon}\left(e\right)=1/4$ .
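The role of the density bound can be illustrated numerically. The sketch below is not taken from the paper: it assumes a logit $F_{\varepsilon}$ , a hypothetical equally weighted grid standing in for the distribution of the index $w^{\prime}\boldsymbol{c}$ , and illustrative values of $\bar{\xi}_{v}$ and $\alpha$ . By the mean value theorem, the map $\pi\mapsto\int F_{\varepsilon}(w^{\prime}\boldsymbol{c}+\bar{\xi}_{v}+\alpha\pi)dF_{W}^{v}(w)$ has Lipschitz modulus at most $|\alpha|\sup_{e}f_{\varepsilon}(e)$ , so iteration from any starting belief should reach the same fixed point whenever this product is below one.

```python
import numpy as np

def logistic_cdf(z):
    """Logistic CDF, used here as an assumed form of F_eps."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_pdf(z):
    p = logistic_cdf(z)
    return p * (1.0 - p)

def normal_pdf(z):
    return np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)

def belief_map(pi, index, xi, alpha):
    """pi -> average of F(index + xi + alpha * pi) over the index grid."""
    return float(np.mean(logistic_cdf(index + xi + alpha * pi)))

def solve_belief(index, xi, alpha, pi0, tol=1e-12, max_iter=10_000):
    """Plain fixed-point iteration; converges when the map is a contraction."""
    pi = pi0
    for _ in range(max_iter):
        new = belief_map(pi, index, xi, alpha)
        if abs(new - pi) < tol:
            return new
        pi = new
    raise RuntimeError("fixed-point iteration did not converge")

# Numerical check of the density suprema quoted in the text.
grid = np.linspace(-10.0, 10.0, 2001)
sup_logit = float(logistic_pdf(grid).max())
sup_probit = float(normal_pdf(grid).max())

# Hypothetical inputs: an equally weighted grid for w'c and illustrative parameters.
index = np.linspace(-2.0, 2.0, 401)
xi, alpha = -0.5, 2.0
modulus = alpha * sup_logit  # upper bound on the Lipschitz constant of belief_map

pi_from_0 = solve_belief(index, xi, alpha, pi0=0.0)
pi_from_1 = solve_belief(index, xi, alpha, pi0=1.0)
print(sup_logit, sup_probit, modulus, pi_from_0, pi_from_1)
```

With these illustrative values the modulus bound is one half, and the two starting points lead to the same belief up to the iteration tolerance.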

Proof of Proposition 3. First, we investigate $\mathcal{F}_{v,\infty}^{\star}$ by using the Blackwell sufficient conditions. I) Since $\alpha\geq 0$ , we have $\mathcal{F}_{v,\infty}^{\star}\left[f\right]\geq\mathcal{F}_{v,\infty}^{\star}\left[g\right]$ for any two functions $f,g$ with $f(l,e;\theta_{1},\theta_{2})\geq g(l,e;\theta_{1},\theta_{2})$ , implying the monotonicity condition. II) For a constant $\bar{a}\geq 0$ ,

[TABLE]

Since $g(\tilde{l},\tilde{e};\theta_{1},\theta_{2})$ is nondecreasing in $\tilde{e}$ and $\alpha\geq 0$ , $\alpha g(\tilde{l},\tilde{e};\theta_{1},\theta_{2})+\tilde{e}$ is strictly increasing in $\tilde{e}$ . Thus, we can find a unique $e_{0}$ satisfying

[TABLE]

for each $(\tilde{w},\tilde{l},\theta_{1},\theta_{2})$ . For each $\bar{a}\geq 0$ , let $\bar{e}$ be a unique number satisfying

[TABLE]

Since $\alpha\bar{a}\geq 0$ and the slope of the function $\alpha g(\tilde{l},\tilde{e};\theta_{1},\theta_{2})+\tilde{e}$ is greater than or equal to $1$ , we must have $e_{0}\geq\bar{e}$ and $\left(e_{0}-\bar{e}\right)\times 1\leq\alpha\bar{a}$ . This upper bound of $\left(e_{0}-\bar{e}\right)$ holds for any $(\tilde{w},\tilde{l},\theta_{1},\theta_{2})$ . Thus,

[TABLE]

Therefore, if (110) holds, the so-called discounting condition is satisfied. Given I) and II), we have verified that $\mathcal{F}_{v,\infty}^{\star}$ is a contraction.

Next, we investigate $\mathcal{F}_{v,N_{v}}^{\star}$ . Note that since $g(l,e;\theta_{1},\theta_{2})$ is nondecreasing in $e$ , so is $1\{\tilde{w}^{\prime}\boldsymbol{c}+\bar{\xi}_{v}+\alpha g(\tilde{l},\tilde{e};\theta_{1},\theta_{2})+\tilde{e}\geq 0\}$ , and given (ii) of Assumption 5, the mapped function $\mathcal{F}_{v,N_{v}}^{\star}\left[g\right]\left(l,e;\theta_{1},\theta_{2}\right)$ is also nondecreasing. Therefore, the domain and range spaces of $\mathcal{F}_{v,N_{v}}^{\star}$ can be taken to be identical. We can also check the Blackwell sufficient conditions for $\mathcal{F}_{v,N_{v}}^{\star}$ exactly in the same way as for $\mathcal{F}_{v,\infty}^{\star}$ , implying the desired contraction property.
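For reference, the Blackwell sufficient conditions used in this proof can be stated in generic form. The statement below is a sketch in notation introduced here for illustration only ( $T$ for an operator on bounded functions with the sup norm, $\beta$ for the discount factor), not a reproduction of the paper's displays:

```latex
\text{(I) Monotonicity:}\quad f \le g \;\Longrightarrow\; T[f] \le T[g]; \\
\text{(II) Discounting:}\quad \exists\, \beta\in(0,1):\;
T[g+a] \le T[g] + \beta a \quad \text{for all } g \text{ and constants } a \ge 0; \\
\text{(I) and (II)} \;\Longrightarrow\;
\|T[f]-T[g]\|_{\infty} \le \beta\, \|f-g\|_{\infty}.
```

The implication follows because $f\leq g+\|f-g\|_{\infty}$ gives $T[f]\leq T[g+\|f-g\|_{\infty}]\leq T[g]+\beta\|f-g\|_{\infty}$ , and the same argument applies with the roles of $f$ and $g$ exchanged.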

A.4 Proof of Theorem 2 (the Estimators’ Convergence)

Here, we prove Theorem 2 through several lemmas. In Section 3, for ease of exposition, we assumed that the village-fixed effects $\bar{\xi}_{1},\dots,\bar{\xi}_{\bar{v}}$ are known to the econometrician. Here, we explicitly include them in the parameter $\theta_{1}$ to be estimated. Note also that identification of preference parameters in the presence of the $\bar{\xi}_{v}$ 's requires identification of the $\bar{\xi}_{v}$ 's themselves; hence we need to use one of the methods for doing so, as described in Section 4.4. Here we use the homogeneity assumption $\bar{\xi}_{1}=\bar{\xi}_{\bar{v}}$ ; an alternative proof can be given for the correlated random effects case. To sum up, for this section, we re-define the eventual parameter as $\theta_{1}=(\boldsymbol{c}^{\prime},\bar{\xi}_{1},\dots,\bar{\xi}_{\bar{v}-1},\alpha)^{\prime}$ (see e.g. Assumption 7), with all other related quantities interpreted analogously. Consistency of the estimators for the case with $\bar{\xi}_{1},\dots,\bar{\xi}_{\bar{v}}$ known is a simpler corollary of Theorem 2.

To analyze $\hat{\theta}^{\mathrm{FPL}}$ and $\hat{\theta}^{\mathrm{BR}}$ , we define the following conditional moment restriction:

[TABLE]

where $A_{vh}^{\infty}$ is a hypothetical outcome variable based on the limit model (recall that $\theta_{1}^{\ast}$ has been defined through the conditional moment restriction (32) for the observed variables $(A_{vh},W_{vh},L_{vh})$ generated from the finite-player game, where $A_{vh}$ is generated from (28) or equivalently (30); $\theta_{1}^{\ast}$ may also be defined as the one satisfying restriction (111), which is correctly specified for the variables (hypothetically) generated from the limit model, $(A_{vh}^{\infty},W_{vh})$ ):

[TABLE]

For each $v$ , let $r_{v}=\lim\frac{N_{v}}{N}$ , where this limit ratio value is supposed to be in $\left(0,1\right)$ (note that $N=\sum_{v=1}^{\bar{v}}N_{v}$ ). We also consider the limit versions of $\hat{L}^{\mathrm{FPL}}\left(\theta_{1}\right)$ and $\hat{L}^{\mathrm{BR}}\left(\theta_{1}\right)$ ,

[TABLE]

respectively, where $\pi_{v}^{\star}\left(\theta_{1}\right)$ in $L^{\mathrm{FPL}}\left(\theta_{1}\right)$ is defined as a solution to (40) for each $\theta_{1}$ , and $\bar{\pi}_{v}$ in $L^{\mathrm{BR}}\left(\theta_{1}\right)$ is defined as the (probability) limit of $\hat{\pi}_{v}=\frac{1}{N_{v}}\sum_{h=1}^{N_{v}}A_{vh}$ (note that the limits of $\hat{\pi}_{v}$ and $\frac{1}{N_{v}}\sum_{h=1}^{N_{v}}A_{vh}^{\infty}$ coincide, which follows from arguments analogous to those in the proof of Lemma 3). The first order condition of $L^{\mathrm{FPL}}\left(\theta_{1}\right)$ may be seen as an unconditional moment restriction based on the conditional one (111).

Note that given the continuity of $F_{\varepsilon}\left(\cdot\right)$ , $L^{\mathrm{FPL}}\left(\theta_{1}\right)$ and $L^{\mathrm{BR}}\left(\theta_{1}\right)$ are continuous on $\Theta_{1}$ . Lemma 3 shows the uniform convergence of $\hat{L}^{\mathrm{FPL}}\left(\theta_{1}\right)$ to $L^{\mathrm{FPL}}\left(\theta_{1}\right)$ in probability over $\Theta_{1}$ ; we can also show that of $\hat{L}^{\mathrm{BR}}\left(\theta_{1}\right)$ to $L^{\mathrm{BR}}\left(\theta_{1}\right)$ in probability over $\Theta_{1}$ (the proof of this result is analogous to that of Lemma 3, and is omitted).

Given the limit objective function, we let

[TABLE]

Lemma 2 shows identification of $\theta_{1}^{\ast}$ (i.e., it is a unique maximizer of $L^{\mathrm{FPL}}(\theta_{1})$ over $\Theta_{1}$ ) and the same result for $\theta_{1}^{\#}$ . As a result, by Theorem 2.1 of Newey and McFadden (1994), given the compactness of the parameter space $\Theta_{1}$ , we obtain

[TABLE]

Since Lemma 2 also shows that $\theta_{1}^{\ast}=\theta_{1}^{\#}$ under the correct specification, we have $||\hat{\theta}_{1}^{\mathrm{FPL}}-\hat{\theta}_{1}^{\mathrm{BR}}||=o_{p}\left(1\right)$ .

By Lemma 4, we have $\sup_{\theta_{1}\in\Theta_{1}}\left|\hat{L}^{\mathrm{SD}}(\theta_{1},\hat{\theta}_{2})-\hat{L}^{\mathrm{FPL}}\left(\theta_{1}\right)\right|=o_{p}\left(1\right)$ , which, together with Lemma 3, implies that

[TABLE]

This in turn means that $\hat{\theta}_{1}^{\mathrm{SD}}\overset{p}{\rightarrow}\theta_{1}^{\ast}$ (by using Newey and McFadden’s Theorem 2.1 again). These lead to the conclusion of the theorem.

A.4.1 Identification Results: Lemmas 1 - 2

In this subsection, we investigate identification of $\theta_{1}^{\ast}$ and $\theta_{1}^{\#}$ (defined in (113) and (114), respectively). To this end, we impose the following conditions:

Assumption 6

(i) Let $\boldsymbol{u}_{v}\left(l\right)=(u_{v}^{0}\left(l\right),u_{v}^{1}\left(l\right))$ and

[TABLE]

*and the (marginal) CDF of $-\varepsilon_{v}(l)$ is $F_{\varepsilon}(\cdot)$ for each $l\in\mathcal{L}_{v}$ , whose functional form is supposed to be known, and $F_{\varepsilon}\left(\cdot\right)$ is strictly increasing on $\mathbb{R}$ with its continuous PDF $f_{\varepsilon}(\cdot)$ satisfying $\sup_{z\in\mathbb{R}}f_{\varepsilon}(z)<\infty$ .

(ii) The random vector $W_{vh}$ includes no constant component. The support of $(W_{vh}^{\prime},1)^{\prime}$ is not included in any proper linear subspace of $\mathbb{R}^{d_{W}+1}$ , where $d_{W}$ is the dimension of $W_{vh}$ .*

Assumption 6 is quite standard. The condition in (i) on the support of $-\varepsilon_{v}(l)$ may be relaxed, allowing for some bounded support (instead of $\mathbb{R}$ ), but it simplifies our subsequent conditions and proofs and thus is maintained.

Assumption 7

(i) Let $\bar{\pi}_{v}\left(\in\left(0,1\right)\right)$ be the probability limit of $\hat{\pi}_{v}=\frac{1}{N_{v}}\sum_{h=1}^{N_{v}}A_{vh}$ . It holds that

[TABLE]

(ii) Denote by $\theta_{1}=(\boldsymbol{c}^{\prime},\bar{\xi}_{1},\dots,\bar{\xi}_{\bar{v}-1},\alpha)^{\prime}$ a generic element in the parameter space $\Theta_{1}$ . $\Theta_{1}$ is a compact subset of $\mathbb{R}^{d_{W}+\bar{v}}$ such that

[TABLE]

*where $\Theta_{\boldsymbol{c}}$ is a compact subset of $\mathbb{R}^{d_{W}}$ in which $\boldsymbol{c}$ lies and ${\textstyle\prod\nolimits_{v=1}^{\bar{v}}}[\bar{l}_{v},\bar{u}_{v}]$ is a closed rectangular region of $\mathbb{R}^{\bar{v}}$ (with some $\bar{l}_{v},\bar{u}_{v}\in\mathbb{R}$ ) in which $(\bar{\xi}_{1},\dots,\bar{\xi}_{\bar{v}-1},\alpha)^{\prime}$ lies.

(iii) For any $\alpha\in[\bar{l}_{\bar{v}},\bar{u}_{\bar{v}}]$ ,*

[TABLE]

(iv) Let $\boldsymbol{c}^{\diamondsuit}$ be an element of $\Theta_{c}$ . Given this $\boldsymbol{c}^{\diamondsuit}$ (fixed), for any $(\bar{\xi}_{1},\dots,\bar{\xi}_{\bar{v}-1},\alpha)^{\prime}\in{\textstyle\prod\nolimits_{v=1}^{\bar{v}}}[\bar{l}_{v},\bar{u}_{v}]$ , it holds that

[TABLE]

where $\left.\pi_{v}^{\star}\left(\theta_{1}\right)\right|_{\boldsymbol{c}=\boldsymbol{c}^{\ast}}$ stands for $\pi_{v}^{\star}((\boldsymbol{c}^{\ast\prime},\bar{\xi}_{1},\dots,\bar{\xi}_{\bar{v}-1},\alpha)^{\prime})$ , a unique solution to the fixed point equation, $\pi_{v}=\int F_{\varepsilon}(w^{\prime}\boldsymbol{c}^{\ast}+\bar{\xi}_{v}+\alpha\pi_{v})dF_{W}^{v}(w)$ ( $v=1,\dots,\bar{v},$ with $\bar{\xi}_{1}=\bar{\xi}_{\bar{v}}$ ).

Assumption 7 (i) leads to different ‘constant’ terms for $v=1,\bar{v}$ under the homogeneity assumption ( $\bar{\xi}_{1}=\bar{\xi}_{\bar{v}}$ ), i.e.,

[TABLE]

This is required for identification of $\bar{\xi}_{1}^{\#},\dots,\bar{\xi}_{\bar{v}-1}^{\#},\alpha^{\#}$ in $\theta_{1}^{\#}$ through the Brock-Durlauf type objective function $L^{\mathrm{BR}}\left(\theta_{1}\right)$ .

Conditions (ii) - (iv) are used for identification of $\theta_{1}^{\ast}$ via $L^{\mathrm{FPL}}\left(\theta_{1}\right)$ . The rectangularity of the parameter space for $(\bar{\xi}_{1},\dots,\bar{\xi}_{\bar{v}-1},\alpha)^{\prime}$ imposed in (ii) is a technical requirement when using Gale and Nikaido’s (1965) result for univalent functions (see their Theorem 4 and our proof of Lemma 1). The restriction on $\alpha$ in (116) in (iii) guarantees the contraction property of the fixed point problem (see discussions in Appendix A.3). As for (iv), since $\pi_{1}^{\star}\left(\theta_{1}\right)$ and $\pi_{\bar{v}}^{\star}\left(\theta_{1}\right)$ in $L^{\mathrm{FPL}}(\theta_{1})$ are fixed points, we can equivalently re-write (117) as

[TABLE]

This is an extension of (115) to the model-based probabilities for all $(\bar{\xi}_{1},\dots,\bar{\xi}_{\bar{v}-1},\alpha)^{\prime}$ in the parameter space, where we note that (117) implies (115) under (111) since $\bar{\pi}_{v}=\pi_{v}^{\star}\left(\theta_{1}^{\ast}\right)$ . Note that if $\left.\pi_{1}^{\star}\left(\theta_{1}\right)\right|_{\boldsymbol{c}=\boldsymbol{c}^{\ast}}\neq\left.\pi_{\bar{v}}^{\star}\left(\theta_{1}\right)\right|_{\boldsymbol{c}=\boldsymbol{c}^{\ast}}$ , we may suppose (118) without loss of generality. That is, if $\left.\pi_{1}^{\star}\left(\theta_{1}\right)\right|_{\boldsymbol{c}=\boldsymbol{c}^{\ast}}>\left.\pi_{\bar{v}}^{\star}\left(\theta_{1}\right)\right|_{\boldsymbol{c}=\boldsymbol{c}^{\ast}}$ , we may re-label the indices $v=1,\bar{v}$ to secure ‘ $<$ ’.

The inequality (117) does not impose any substantive restriction. For example, if $\alpha\geq 0$ and the (marginal) distribution of $W_{1h}^{\prime}\boldsymbol{c}^{\ast}$ is first-order stochastically dominated by that of $W_{\bar{v}h}^{\prime}\boldsymbol{c}^{\ast}$ , then the fixed point solutions satisfy $\left.\pi_{1}^{\star}\left(\theta_{1}\right)\right|_{\boldsymbol{c}=\boldsymbol{c}^{\ast}}<\left.\pi_{\bar{v}}^{\star}\left(\theta_{1}\right)\right|_{\boldsymbol{c}=\boldsymbol{c}^{\ast}}$ , and thus (117) holds for any $\bar{\xi}_{1}$ (since $F_{\varepsilon}\left(\cdot\right)$ is strictly increasing), where no restriction on $\Theta_{1}$ (except for the maintained one: $\alpha\geq 0$ ) is imposed.

Now, we are ready to establish the identification properties of $\theta_{1}^{\#}$ and $\theta_{1}^{\ast}$ :

Lemma 1 (Global identification)

*Suppose that Assumption 6 holds.

(a) Further if (i) of Assumption 7 holds, then for any $\theta_{1}^{\#},\theta_{1}\in\Theta_{1}$ ,*

[TABLE]

*for some $v\in\left\{1,\dots,\bar{v}\right\}$ with positive probability, if and only if $\theta_{1}^{\#}\neq\theta_{1}$ , where $\bar{\xi}_{1}^{\#}=\bar{\xi}_{\bar{v}}^{\#}$ and $\bar{\xi}_{1}=\bar{\xi}_{\bar{v}}$ .

(b) Denote by $\theta_{1}^{\ast}=(\boldsymbol{c}^{\ast\prime},\bar{\xi}_{1}^{\ast},\dots,\bar{\xi}_{\bar{v}-1}^{\ast},\alpha^{\ast})^{\prime}$ any element in $\Theta_{1}$ . Further if (ii) - (iv) of Assumption 7 are satisfied, in which (iv) is satisfied with $\boldsymbol{c}^{\diamondsuit}=\boldsymbol{c}^{\ast}$ of this $\theta_{1}^{\ast}$ , then for $\theta_{1}\in\Theta_{1}$ ,*

[TABLE]

for some $v\in\left\{1,\dots,\bar{v}\right\}$ with positive probability, if and only if $\theta_{1}^{\ast}\neq\theta_{1}$ , where $\bar{\xi}_{1}^{\ast}=\bar{\xi}_{\bar{v}}^{\ast}$ and $\bar{\xi}_{1}=\bar{\xi}_{\bar{v}}$ .

The result of this lemma allows us to establish (global) identification of $\theta_{1}^{\ast}$ and $\theta_{1}^{\#}$ based on their limit objective functions, $L^{\mathrm{FPL}}\left(\theta_{1}\right)$ and $L^{\mathrm{BR}}\left(\theta_{1}\right)$ . Note that this result does not presuppose the correct specification of model-implied conditional choice probabilities as in (111). However, given (111) with $\theta_{1}^{\ast}$ , our identification analysis based on the objective functions can be done analogously to that for ML estimators in the standard I.I.D. case (as in Lemma 2.2 and Example 1.2 of Newey and McFadden, 1994, pages 2124-2125). This is due to the form of our objective functions, although they are not full ML functions. We summarize the objective-function-based identification result as follows:

Lemma 2

Suppose that $\theta_{1}^{\ast}$ satisfies the conditional expectation restriction (111), and Assumptions 4-7 hold, where (iv) of Assumption 7 holds with $\boldsymbol{c}^{\ast}$ in this $\theta_{1}^{\ast}$ . Then, $\theta_{1}^{\ast}$ is a unique maximizer of $L^{\mathrm{FPL}}\left(\theta_{1}\right)$ in $\Theta_{1}$ and it is also a unique maximizer of $L^{\mathrm{BR}}(\theta_{1})$ in $\Theta_{1}$ .

While $\theta_{1}^{\ast}$ and $\theta_{1}^{\#}$ (introduced in (113) and (114), respectively) may differ in general, this lemma states that they are identical if we suppose the correct specification, under which we will identify them and always write $\theta_{1}^{\ast}$ hereafter.

A.4.2 Uniform Convergence Results: Lemmas 3 - 4

In this subsection, we establish uniform convergence for the objective functions using the following conditions:

Assumption 8

*(i) For any $v$ , the support of $W_{vh}$ is included in $S_{W}$ , a bounded subset of $\mathbb{R}^{d_{W}}$ .

(ii) Let $\boldsymbol{h}(\tilde{e}|e,||\tilde{l}-l||_{1};\theta_{2})$ be the conditional probability density of $\varepsilon_{vk}$ given $\left(v,k\right)$ ’s location $L_{vk}=\tilde{l}$ and $\left(v,h\right)$ ’s variables $(L_{vh},\varepsilon_{vh})=(l,e)$ (parametrized by $\theta_{2}\in\Theta_{2}$ ) satisfying*

[TABLE]

where $M_{1},\tau_{1}\in(0,\infty)$ are constants (independent of $e$ and $\theta_{2}$ ); $\tau_{1}>4$ is the same constant introduced in Assumption 4 (the majorant side is defined as [math] if $\tilde{l}=l$ ).

Assumption 8 (ii) can be derived from a spatial analogue of the so-called strong Doeblin condition used in Markov chain theory (see, e.g., Theorem 1 of Holden, 2000), which can be satisfied by various parametric models. It is a strengthening of the alpha-mixing condition in (i) of Assumption 4.

Lemma 3

Suppose that C1 - C2, C3-SD, (i) of Assumption 6, (ii) - (iii) of Assumption 7, and Assumptions 4 - 8 hold. Then,

[TABLE]

Lemma 4

Suppose that C1 - C2, C3-SD, Assumption 5, (i) of Assumption 6, (ii) - (iii) of Assumption 7, and Assumptions 4 - 8 hold. Then, for each $v$ ,

[TABLE]

and

[TABLE]

A.4.3 Proofs of Lemmas 1 - 4

Proof of Lemma 1. The proof of the result (a) is standard and is omitted. Here, we focus on (b). For ease of exposition, we let $\bar{v}=11$ , as in our empirical application and set $\bar{\xi}_{1}=\bar{\xi}_{11}$ . The proof for any other $\bar{v}$ can be done in exactly the same way. We let $\theta_{1}=(\boldsymbol{c}^{\prime},\bar{\xi}_{1},\dots,\bar{\xi}_{10},\alpha)^{\prime}$ and define $\theta_{1}^{\ast}$ analogously. Since $F_{\varepsilon}\left(\cdot\right)$ is strictly increasing, (120) is equivalent to

[TABLE]

with positive probability. We can immediately see that (122) implies that $\theta_{1}^{\ast}\neq\theta_{1}$ . Now, supposing that $\theta_{1}^{\ast}\neq\theta_{1}$ , we shall derive (122). To this end, we consider the following five cases: 1) If $\boldsymbol{c}^{\ast}\neq\boldsymbol{c}$ , (122) holds with positive probability by (i) of Assumption 4, regardless of the equality for the other (constant) terms (i.e., $\bar{\xi}_{v}^{\ast}+\alpha^{\ast}\pi_{v}^{\star}(\theta_{1}^{\ast})$ is equal to $\bar{\xi}_{v}+\alpha\pi_{v}^{\star}(\theta_{1})$ or not). 2) If $\boldsymbol{c}^{\ast}=\boldsymbol{c}$ and $\alpha^{\ast}=\alpha=0$ , we must have $(\bar{\xi}_{1}^{\ast},\dots,\bar{\xi}_{10}^{\ast})\neq(\bar{\xi}_{1},\dots,\bar{\xi}_{10})$ , implying (122). 3) If $\boldsymbol{c}^{\ast}=\boldsymbol{c}$ , $\alpha^{\ast}=0$ , $\alpha\neq 0$ , and $(\bar{\xi}_{1}^{\ast},\dots,\bar{\xi}_{10}^{\ast})=(\bar{\xi}_{1},\dots,\bar{\xi}_{10})$ , we must at least have $\pi_{11}^{\star}(\theta_{1})>0$ by (117) of Assumption 7 and thus $\alpha\pi_{11}^{\star}(\theta_{1})\neq 0$ , which implies (122).

4) For the case with $\boldsymbol{c}^{\ast}=\boldsymbol{c}$ , $\alpha^{\ast}=0$ , $\alpha\neq 0$ , and $(\bar{\xi}_{1}^{\ast},\dots,\bar{\xi}_{10}^{\ast})\neq(\bar{\xi}_{1},\dots,\bar{\xi}_{10})$ , we suppose, for contradiction, that $\bar{\xi}_{v}^{\ast}=\bar{\xi}_{v}+\alpha\pi_{v}^{\star}(\theta_{1})$ for any $v\in\left\{1,\dots,\bar{v}\right\}$ . Then, $\pi_{1}^{\star}(\theta_{1})=\left(\bar{\xi}_{1}^{\ast}-\bar{\xi}_{1}\right)/\alpha$ and $\pi_{11}^{\star}(\theta_{1})=\left(\bar{\xi}_{1}^{\ast}-\bar{\xi}_{1}\right)/\alpha$ , since $\bar{\xi}_{1}=\bar{\xi}_{11}$ , and thus $\pi_{1}^{\star}(\theta_{1})=\pi_{11}^{\star}(\theta_{1})$ . However, this contradicts (117) of Assumption 7.

5) Finally, we consider the case with $\boldsymbol{c}^{\ast}=\boldsymbol{c}$ , $\alpha^{\ast}\neq 0$ , and $\alpha\neq 0$ . In this case, by re-parametrizing $\kappa_{v}=\bar{\xi}_{v}+\alpha\pi_{v}$ , the fixed point equations (with respect to $\pi_{v}$ ),

[TABLE]

can be equivalently re-written as equations with respect to $\kappa_{v}$ :

[TABLE]

That is, if $\pi_{v}=\pi_{v}^{\star}\left(\theta_{1}\right)$ is a solution to (123), then $\kappa_{v}^{\star}\left(\theta_{1}\right)=\bar{\xi}_{v}+\alpha\pi_{v}^{\star}\left(\theta_{1}\right)$ is a solution to (124); and if $\kappa_{v}^{\star}\left(\theta_{1}\right)$ solves (124), then $\pi_{v}^{\star}\left(\theta_{1}\right)=(\kappa_{v}^{\star}\left(\theta_{1}\right)-\bar{\xi}_{v})/\alpha$ solves (123). We can also check that the solution uniqueness of (123) is equivalent to that of (124). By this re-parametrization, given $\boldsymbol{c}^{\ast}=\boldsymbol{c}$ , (122) is

[TABLE]

which we shall show below. Now, to investigate (124), we define the following vector-valued ( $11$ -by- $1$ ) function of $\boldsymbol{\kappa}:=(\kappa_{1},\dots,\kappa_{11})^{\prime}$ and $\boldsymbol{\lambda}=(\bar{\xi}_{1},\dots,\bar{\xi}_{10},\alpha)^{\prime}\in{\textstyle\prod\nolimits_{v=1}^{11}}\left[\bar{l}_{v},\bar{u}_{v}\right]$ as

[TABLE]

where

[TABLE]

and the dependence of $\mathbb{K}$ and $K_{v}$ on $\boldsymbol{c}^{\ast}=\boldsymbol{c}$ is suppressed for notational simplicity. Given (116) of Assumption 7, the contraction mapping theorem implies that, for any $\boldsymbol{\lambda}=(\bar{\xi}_{1},\dots,\bar{\xi}_{10},\alpha)^{\prime}$ , we can find a unique

[TABLE]

Given this function of $\boldsymbol{\lambda}$ , we consider the set of its values:

[TABLE]

Next, we compute the Jacobian matrix of $\mathbb{K}$ with respect to $\boldsymbol{\lambda}=(\bar{\xi}_{1},\dots,\bar{\xi}_{10},\alpha)^{\prime}$ :

[TABLE]

where the upper-left $10$ -by- $10$ submatrix is the identity matrix. This matrix $(\partial/\partial\boldsymbol{\lambda}^{\prime})\mathbb{K}\left(\boldsymbol{\kappa},\boldsymbol{\lambda}\right)$ has dominant diagonals for any $\left(\boldsymbol{\kappa},\boldsymbol{\lambda}\right)$ in the sense of Gale and Nikaido (1965, p. 84), that is, letting $l_{v}=\int F_{\varepsilon}(w^{\prime}\boldsymbol{c}^{\ast}+\kappa_{v})dF_{W}^{v}(w)$ , whose dependence on $\boldsymbol{c}^{\ast}$ and $\kappa_{v}$ is suppressed for notational simplicity, $(\partial/\partial\boldsymbol{\lambda}^{\prime})\mathbb{K}\left(\boldsymbol{\kappa},\boldsymbol{\lambda}\right)$ is said to have dominant diagonals if we can find strictly positive numbers $\left\{\bar{d}_{v}\right\}_{v=1}^{11}$ such that

[TABLE]

If we set $\bar{d}_{v}=1$ for $v=2,\dots,11$ , then (127) is reduced to

[TABLE]

and it is possible to find some $\bar{d}_{1}\in\left(0,1\right)$ since

[TABLE]

which is imposed in (117) of Assumption 7. Since $(\partial/\partial\boldsymbol{\lambda}^{\prime})\mathbb{K}\left(\boldsymbol{\kappa},\boldsymbol{\lambda}\right)$ has dominant diagonals for each $\left(\boldsymbol{\kappa},\boldsymbol{\lambda}\right)$ , it is a $P$ -matrix for each $\left(\boldsymbol{\kappa},\boldsymbol{\lambda}\right)$ in the sense of Gale and Nikaido (1965, p. 84). Applying Gale and Nikaido’s Theorem 4, we can see that for each (fixed) $\boldsymbol{\kappa}\in\mathbb{V}_{\boldsymbol{\kappa}}$ , $\mathbb{K}\left(\boldsymbol{\kappa},\boldsymbol{\lambda}\right)$ is univalent as a function of $\boldsymbol{\lambda}\in{\textstyle\prod\nolimits_{v=1}^{11}}\left[\bar{l}_{v},\bar{u}_{v}\right]$ , i.e., $\mathbb{K}\left(\boldsymbol{\kappa},\boldsymbol{\lambda}\right)=0$ holds only at a unique $\boldsymbol{\lambda}\in{\textstyle\prod\nolimits_{v=1}^{11}}\left[\bar{l}_{v},\bar{u}_{v}\right]$ . Therefore, we can define a function $\boldsymbol{\lambda}(\boldsymbol{\kappa})$ on $\mathbb{V}_{\boldsymbol{\kappa}}$ , i.e., the inverse function of $\boldsymbol{\kappa}(\boldsymbol{\lambda})$ introduced in (126). That is, we have shown that $\boldsymbol{\kappa}(\boldsymbol{\lambda})$ is one-to-one (injective; $\boldsymbol{\kappa}(\boldsymbol{\lambda})\neq\boldsymbol{\kappa}(\boldsymbol{\tilde{\lambda}})$ for $\boldsymbol{\lambda}\neq\boldsymbol{\tilde{\lambda}}$ ), implying the desired result (125). We have now completed Case 5) and thus the whole proof.
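The $P$ -matrix property used in this proof (all principal minors strictly positive) can be checked by brute force for small matrices. The sketch below does not reproduce the paper's displayed Jacobian. It assumes, purely for illustration, a block form consistent with the verbal description: an identity upper-left block, a last column collecting hypothetical values `ell` of the village-level integrals, and a last row tied to the first column through the homogeneity restriction $\bar{\xi}_{1}=\bar{\xi}_{11}$ . The function names and numerical values are likewise assumptions.

```python
from itertools import combinations

import numpy as np

def build_jacobian(ell):
    """Assumed block form (hypothetical): identity upper-left block,
    ell in the last column, and a 1 linking the last row to column 0."""
    n = len(ell)
    jac = np.zeros((n, n))
    jac[: n - 1, : n - 1] = np.eye(n - 1)
    jac[:, n - 1] = ell
    jac[n - 1, 0] = 1.0
    return jac

def is_p_matrix(mat, tol=1e-12):
    """Return True if every principal minor of mat exceeds tol."""
    n = mat.shape[0]
    for size in range(1, n + 1):
        for idx in combinations(range(n), size):
            if np.linalg.det(mat[np.ix_(idx, idx)]) <= tol:
                return False
    return True

# Hypothetical probabilities with ell[0] < ell[-1], mirroring the ordering in (117).
ell_ok = np.linspace(0.30, 0.70, 11)
# Reversed ordering, so that ell[0] > ell[-1].
ell_bad = ell_ok[::-1].copy()

jac_ok = build_jacobian(ell_ok)
jac_bad = build_jacobian(ell_bad)
print(is_p_matrix(jac_ok), is_p_matrix(jac_bad), np.linalg.det(jac_ok))
```

Under this assumed form, the full determinant equals the last entry of `ell` minus the first, which suggests why an ordering restriction between the first and last villages matters for univalence.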

Proof of Lemma 2. Given the definition of $A_{vh}^{\infty}$ in (112), observe that

[TABLE]

where the first equality follows from the law of iterated expectations and the correct specification assumption and the inequality holds by Jensen’s inequality. By the strict concavity of $\log$ , this inequality holds with equality if and only if $F_{\varepsilon}\left(W_{vh}^{\prime}\boldsymbol{c}^{\ast}+\bar{\xi}_{v}^{\ast}+\alpha\pi_{v}^{\star}(\theta_{1}^{\ast})\right)=F_{\varepsilon}(W_{vh}^{\prime}\boldsymbol{c}+\bar{\xi}_{v}+\alpha\pi_{v}^{\star}(\theta_{1}))$ , which is equivalent to $\theta_{1}^{\ast}=\theta_{1}$ by (b) of Lemma 1. That is, we have shown that $\theta_{1}^{\ast}$ is the unique maximizer of $L^{\mathrm{FPL}}\left(\theta_{1}\right)$ over $\Theta_{1}$ .
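The Jensen-inequality step can be illustrated numerically in the simplest case of a single Bernoulli probability. This is a sketch under simplifying assumptions (no covariates, a hypothetical true probability `p_true = 0.3`, and a grid of candidate probabilities); it only shows that the population log-likelihood is maximized at the true choice probability and does not reproduce $L^{\mathrm{FPL}}$ itself.

```python
import numpy as np


def expected_loglik(p_true, p_model):
    """Population Bernoulli log-likelihood E[A log p + (1 - A) log(1 - p)], E[A] = p_true."""
    return p_true * np.log(p_model) + (1.0 - p_true) * np.log(1.0 - p_model)


p_true = 0.3                          # hypothetical true choice probability
grid = np.linspace(0.01, 0.99, 99)    # candidate model probabilities
values = expected_loglik(p_true, grid)
best = grid[np.argmax(values)]        # maximizer over the grid
print(best)
```

By strict concavity of the logarithm, any candidate probability other than `p_true` gives a strictly smaller value, which mirrors the uniqueness argument in the proof.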

To establish the same result for $L^{\mathrm{BR}}\left(\theta_{1}\right)$ , note that $\pi_{v}^{\star}(\theta_{1}^{\ast})$ is the fixed point, and thus the condition (111) (that determines $\theta_{1}^{\ast}$ ) implies

[TABLE]

Therefore,

[TABLE]

meaning that the conditional choice probability model with $\bar{\pi}_{v}$ (instead of $\pi_{v}^{\star}(\theta_{1}^{\ast})$ ) is also correctly specified at $\theta_{1}=\theta_{1}^{\ast}$ . By the same arguments as in (128), we can see that $\theta_{1}^{\ast}$ is also the unique maximizer of $L^{\mathrm{BR}}\left(\theta_{1}\right)$ over $\Theta_{1}$ . The proof is completed.

Proof of Lemma 3. By boundedness of the support of $W_{vh}$ and boundedness of the parameter space $\Theta_{1}$ , $F_{\varepsilon}\left(W_{vh}^{\prime}\boldsymbol{c}+\bar{\xi}_{v}+\alpha\pi_{v}^{\star}(\theta_{1})\right)$ is bounded away from $0$ and $1$ uniformly over $\theta_{1}$ , $v$ , and (any realization of) $W_{vh}$ , i.e., we can find some (small) constant $\Delta\in\left(0,1/2\right)$ (independent of $\theta_{1}$ and $v$ ) such that

[TABLE]

Thus, given the global Lipschitz continuity of $\log\left(\cdot\right)$ on $\left[\Delta,1-\Delta\right]$ , and that of $F_{\varepsilon}\left(\cdot\right)$ and $\pi_{v}^{\star}(\cdot)$ (see the global Lipschitz continuity result (138) in the proof of Lemma 5), as well as the uniform boundedness of $f_{\varepsilon}\left(\cdot\right)$ , we can see that $E\left[A_{vh}^{\infty}\log F_{\varepsilon}\left(W_{vh}^{\prime}\boldsymbol{c}+\bar{\xi}_{v}+\alpha\pi_{v}^{\star}(\theta_{1})\right)\right]$ and $E[(1-A_{vh}^{\infty})\log\left(1-F_{\varepsilon}\left(W_{vh}^{\prime}\boldsymbol{c}+\bar{\xi}_{v}+\alpha\pi_{v}^{\star}(\theta_{1})\right)\right)]$ are also globally Lipschitz continuous in $\theta_{1}$ , implying the global Lipschitz continuity of $L^{\mathrm{FPL}}\left(\theta_{1}\right)$ in $\theta_{1}\in\Theta_{1}$ .

Now, replacing $\hat{\pi}_{v}^{\star}(\theta_{1})$ in $\hat{L}^{\mathrm{FPL}}\left(\theta_{1}\right)$ by $\pi_{v}^{\star}(\theta_{1})$ , we define the following function:

[TABLE]

Given the uniform convergence of $\hat{\pi}_{v}^{\star}(\theta_{1})$ to $\pi_{v}^{\star}(\theta_{1})$ (Lemma 5), by arguments analogous to those for the global Lipschitz continuity of $L^{\mathrm{FPL}}\left(\theta_{1}\right)$ , we can easily see that

[TABLE]

Again, given the global Lipschitz continuity of relevant functions as discussed above, we can also check the stochastic equicontinuity (SE) of $\tilde{L}^{\mathrm{FPL}}\left(\theta_{1}\right)$ (by using Corollary 2.2 of Newey, 1991) as well as the (global Lipschitz) continuity of $E\left[\tilde{L}^{\mathrm{FPL}}\left(\theta_{1}\right)\right]$ .

Since $\Theta_{1}$ is assumed to be compact and we have verified the (global Lipschitz) continuity of $L^{\mathrm{FPL}}\left(\theta_{1}\right)$ and the SE of $\tilde{L}^{\mathrm{FPL}}\left(\theta_{1}\right)$ , Theorem 2.1 of Newey (1991) implies the uniform convergence:

[TABLE]

if the pointwise convergence holds

[TABLE]

which is to be shown below. And, analogously to the proof of Lemma 7 below, we can obtain

[TABLE]

as its simpler corollary. Then, using this result and arguments quite analogous to the proof of Lemma 4 below, we also have

[TABLE]

implying that

[TABLE]

Then, by (130) and (132), we obtain the desired conclusion of the lemma. It remains to show the pointwise convergence (131). Note that each summand of $\tilde{L}^{\mathrm{FPL}}\left(\theta_{1}\right)$ is a function of $\theta_{1}$ , $W_{vh}$ , and $\boldsymbol{u}_{vh}$ (since $\boldsymbol{u}_{vh}=(u_{v}^{0}\left(L_{vh}\right),u_{v}^{1}\left(L_{vh}\right))^{\prime}$ and $\varepsilon_{vh}=\varepsilon_{v}(L_{vh})=u_{v}^{1}\left(L_{vh}\right)-u_{v}^{0}\left(L_{vh}\right)$ ). Thus, letting

[TABLE]

which is uniformly bounded since (129) holds, we can apply Lemma 6 to obtain

[TABLE]

where $r_{v}\in\left(0,1\right)$ is the limit of $N_{v}/N$ . This completes the proof.

Proof of Lemma 4. Let

[TABLE]

which is shown to be $o_{p}\left(1\right)$ in Lemma 7. Then, by the definition of $\hat{C}\left(W_{vh},L_{vh};\theta_{1},\theta_{2}\right)$ in (36), we have

[TABLE]

Recalling the definition $\boldsymbol{H}\left(e\right)=1-F_{\varepsilon}\left(-e\right)$ (where $F_{\varepsilon}$ is the CDF of $-\varepsilon$ ), these lower and upper bounds can be computed as

[TABLE]

Since $F_{\varepsilon}$ is Lipschitz continuous, both the bounds converge to $F_{\varepsilon}\left(W_{vh}^{\prime}\boldsymbol{c}+\bar{\xi}_{v}+\alpha\pi_{v}^{\star}\left(\theta_{1}\right)\right)$ in probability. Further, the absolute difference of the lower and upper bounds is bounded by $\sup_{z\in\mathbb{R}}f_{\varepsilon}\left(z\right)\times 2\left|\alpha\right|\hat{k}_{N_{v}}$ , implying the uniform convergence of $\hat{C}\left(W_{vh},L_{vh};\theta_{1},\theta_{2}\right)$ as in (121).

A.4.4 Auxiliary Lemmas and their Proofs

Lemma 5

Suppose that C2, (i) of Assumption 6, (ii) - (iii) of Assumption 7, and (i) of Assumption 8 hold. Then,

[TABLE]

Proof of Lemma 5. Below we show 1) the pointwise convergence of $\hat{\pi}_{v}^{\star}\left(\theta_{1}\right)$ :

[TABLE]

and 2) the continuity of the limit function $\pi^{\star}\left(\theta_{1}\right)$ and the stochastic equicontinuity of $\hat{\pi}^{\star}\left(\theta_{1}\right)$ . Then, given the compactness of $\Theta_{1}$ (by (ii) of Assumption 7), we have $\sup_{\theta_{1}\in\Theta_{1}}\left|\hat{\pi}_{v}^{\star}\left(\theta_{1}\right)-\pi_{v}^{\star}\left(\theta_{1}\right)\right|=o_{p}\left(1\right)$ (for each $v$ ) by Theorem 2.1 of Newey (1991), which implies the desired result (133) since $v$ is taken over a finite set $\left\{1,\dots,\bar{v}\right\}$ . We show 1) and 2) in turn.

**1)** To show the pointwise convergence, we compute $E\left[|\hat{\pi}_{v}^{\star}(\theta_{1})-\pi_{v}^{\star}\left(\theta_{1}\right)|^{2}\right]$ . To this end, define a functional mapping $g(\in\left(0,1\right))\mapsto T_{\theta_{1}}^{v}\left(g\right)\left(\in\left(0,1\right)\right)$ for each $\left(v,\theta_{1}\right)$ :

[TABLE]

Analogously, we define the following mapping:

[TABLE]

where the (true) CDF $F_{W}^{v}$ in $T_{\theta_{1}}^{v}$ is replaced by the empirical one $\hat{F}_{W}^{v}$ . Since $T_{\theta_{1}}^{v}$ and $\hat{T}_{\theta_{1}}^{v}$ are contractions (by (iii) of Assumption 7; see also the discussion in Appendix A.3), we can find $\pi_{v}^{\star}\left(\theta_{1}\right)$ and $\hat{\pi}_{v}^{\star}\left(\theta_{1}\right)$ , the unique fixed points of $T_{\theta_{1}}^{v}$ and $\hat{T}_{\theta_{1}}^{v}$ , respectively, for each $(\theta_{1},v)$ . By the I.I.D.-ness of $\left\{W_{vh}\right\}_{h=1}^{N_{v}}$ in C2,

[TABLE]

where the last inequality holds since $F_{\varepsilon}$ is a CDF and $|F_{\varepsilon}(W_{vh}^{\prime}\boldsymbol{c}+\bar{\xi}_{v}+\alpha g\left(\theta_{1}\right))-E\left[F_{\varepsilon}(W_{vh}^{\prime}\boldsymbol{c}+\bar{\xi}_{v}+\alpha g\left(\theta_{1}\right))\right]|^{2}\leq 4$ . Therefore, we have shown that

[TABLE]

where the supremum is taken over any $\left[0,1\right]$ -valued function on $\Theta_{1}$ .

Noting that $\hat{\pi}^{\star}\left(\theta_{1}\right)$ and $\pi^{\star}\left(\theta_{1}\right)$ are fixed points, by the triangle inequality, we have

[TABLE]

which, together with (135), implies that

[TABLE]

This implies the desired pointwise convergence (134).
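The fixed-point construction in step 1) can be sketched numerically. The code below is a minimal illustration, assuming a logistic $F_{\varepsilon}$ (so that $\sup_{z}f_{\varepsilon}(z)=1/4$) and hypothetical values for $\boldsymbol{c}$, $\bar{\xi}_{v}$ and $\alpha$; the function `belief_fixed_point` and the simulated covariates are not taken from the paper. It iterates the empirical mapping $g\mapsto N_{v}^{-1}\sum_{h}F_{\varepsilon}(W_{vh}^{\prime}\boldsymbol{c}+\bar{\xi}_{v}+\alpha g)$ to its fixed point for a small and a large sample.

```python
import numpy as np


def logistic_cdf(z):
    """Logistic CDF, used here as an assumed form of F_eps."""
    return 1.0 / (1.0 + np.exp(-z))


def belief_fixed_point(W, c, xi, alpha, tol=1e-12, max_iter=10_000):
    """Iterate g <- mean_h F(W_h'c + xi + alpha * g) until successive values agree."""
    index = W @ c + xi          # part of the index that does not depend on g
    g = 0.5                     # any starting value in [0, 1] works for a contraction
    for _ in range(max_iter):
        g_new = float(np.mean(logistic_cdf(index + alpha * g)))
        if abs(g_new - g) < tol:
            return g_new
        g = g_new
    raise RuntimeError("fixed-point iteration did not converge")


rng = np.random.default_rng(0)
c = np.array([0.5, -0.25])      # hypothetical coefficients
xi, alpha = 0.1, 1.0            # sup f * |alpha| = 0.25 < 1, so the map contracts
W_small = rng.uniform(-1.0, 1.0, size=(200, 2))
W_large = rng.uniform(-1.0, 1.0, size=(200_000, 2))
pi_small = belief_fixed_point(W_small, c, xi, alpha)
pi_large = belief_fixed_point(W_large, c, xi, alpha)
print(pi_small, pi_large)
```

With a contraction modulus of 0.25 the iteration converges in a few dozen steps, and the small-sample fixed point approaches the large-sample one as the number of draws grows, in line with the pointwise convergence shown above.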

**2)** To verify the continuity of $\pi_{v}^{\star}\left(\theta_{1}\right)$ , observe that for $\theta_{1}\neq\tilde{\theta}_{1}$ ,

[TABLE]

Using the triangle inequality, we have the following upper bound of the last term in the curly braces:

[TABLE]

By combining (136) and (137), we obtain

[TABLE]

Since we can find some $\rho\in\left(0,1\right)$ such that $\sup_{z}f_{\varepsilon}\left(z\right)\left|\alpha\right|\leq\rho$ for any $\alpha\in\left[l_{v},u_{v}\right]$ (by (ii) of Assumption 7), this inequality leads to

[TABLE]

where $C^{\star}\in\left(0,\infty\right)$ is some positive constant, whose existence follows from (i) of Assumption 8. That is, we have shown that $\pi^{\star}\left(\theta_{1}\right)$ is (globally Lipschitz) continuous in $\Theta_{1}$ . We can also show that

[TABLE]

where $\hat{C}^{\star}$ is some $O_{p}\left(1\right)$ random variable independent of $\theta_{1},\tilde{\theta}_{1}$ ; note that (139) can be derived in the same way as (138) with $\int\left\|w\right\|dF_{W}^{v}(w)$ replaced by $\int\left\|w\right\|d\hat{F}_{W}^{v}(w)(=O_{p}\left(1\right))$ . This (139) implies the stochastic equicontinuity of $\hat{\pi}^{\star}\left(\cdot\right)$ by Corollary 2.2 of Newey (1991). The proof of Lemma 5 is completed.

Lemma 6

Suppose that C1, C2, C3-SD, and Assumption 4 hold. Let $G$ be a function of $\theta_{1}(\in\Theta_{1})$ , $W_{vh}$ , and $\boldsymbol{u}_{vh}$ that is uniformly bounded (and measurable) with

[TABLE]

where $\bar{G}$ is some positive constant (independent of $\theta_{1}$ ). Then, for each $v$ ,

[TABLE]

Proof of Lemma 6. Recall that $\boldsymbol{u}_{vh}=\boldsymbol{u}_{v}\left(L_{vh}\right)$ . Since $\left\{\boldsymbol{u}_{v}\left(\cdot\right)\right\}$ is alpha-mixing, we apply Billingsley’s inequality (Corollary 1.1 of Bosq, 1998) to

[TABLE]

By the so-called conditional-covariance decomposition formula, we have

[TABLE]

The second term on the RHS of (141) is zero since $\left(W_{vh},L_{vh}\right)\perp\left(W_{vk},L_{vk}\right)$ and the conditional expectations are reduced to

[TABLE]

which follow from the conditional independence relation as in (97) (in the proof of Theorem 5). Thus, by the covariance bound given in (140), we have

[TABLE]

uniformly over any $\left(v,h\right)$ and $\left(v,k\right)$ , where the last equality follows from the same arguments as for (108) (in the proof of Theorem 5). Using these, we can compute

[TABLE]

which completes the proof of Lemma 6.

Lemma 7

Suppose that Assumptions 5 and 8 hold. Then, it holds that

[TABLE]

Proof of Lemma 7. Recall that $\hat{\psi}_{v}^{\star}$ is a fixed point of the functional mapping $\mathcal{\hat{F}}_{v,N_{v}}^{\star}$ defined in (38) and $\hat{\pi}_{v}^{\star}$ is a fixed point of

[TABLE]

Note that this $\mathcal{\hat{F}}_{v,\infty}^{\star}$ is a contraction (by Proposition 3) which does not depend on $\theta_{2}$ (the dependence of $\mathcal{\hat{F}}_{v,\infty}^{\star}\left[g\right]\left(\theta_{1},\theta_{2}\right)$ on $\theta_{2}$ is only through that of $g$ ), and its fixed point is also independent of $\theta_{2}$ ; thus, we write $\hat{\pi}_{v}^{\star}\left(\theta_{1}\right)$ (instead of $\hat{\pi}_{v}^{\star}\left(\theta_{1},\theta_{2}\right)$ ). By the triangle inequality,

[TABLE]

where the last inequality holds with some $\rho\in\left(0,1\right)$ (by Proposition 3) that is independent of $(\theta_{1},\theta_{2})$ and any realization of random variables. Thus,

[TABLE]

where the (outer) supremum is taken over all $\left[0,1\right]$ -valued functions. We now show that the right-hand side is $o_{p}\left(1\right)$ . To this end, observe that

[TABLE]

where the second inequality follows from Assumption 8, and this upper bound is independent of $g$ , $e$ , $\theta_{1}$ , and $\theta_{2}$ . Since $\hat{F}_{L}^{v}$ is the empirical distribution function of the I.I.D. variables $\left\{L_{vk}\right\}_{k=1}^{N_{v}}$ , we have

[TABLE]

By the same arguments as those for (108) (in the proof of Theorem 5), we have

[TABLE]

Therefore,

[TABLE]

which is $o\left(1\right)$ for $\tau_{1}>4$ since $\lambda_{N}=O(\sqrt{N})$ . This completes the proof of Lemma 7.

A.5 Welfare Analysis: The case of $\pi_{1}<\pi_{0}$

Eligibles: Recall eq. (46)

[TABLE]

Now, if

[TABLE]

then each term on the LHS is smaller than the corresponding term on the RHS. If, on the other hand,

[TABLE]

then each term on the LHS is larger than the corresponding term on the RHS. This gives us

[TABLE]

In the intermediate case,

[TABLE]

we have that if $p_{1}-p_{0}-\frac{\alpha_{1}}{\beta_{1}}\left(\pi_{1}-\pi_{0}\right)<\frac{\alpha_{0}}{\beta_{0}}\left(\pi_{0}-\pi_{1}\right)$ , then

[TABLE]

and if $p_{1}-p_{0}-\frac{\alpha_{1}}{\beta_{1}}\left(\pi_{1}-\pi_{0}\right)\geq\frac{\alpha_{0}}{\beta_{0}}\left(\pi_{0}-\pi_{1}\right)$ , then

[TABLE]

Putting all of this together, we have that

Proposition 4

Suppose that Assumptions 1, 2 and the linear index structure hold and $\pi_{1}\leq\pi_{0}$ . Then, for each $\alpha_{1}\in\left[0,\alpha\right]$ , if $p_{1}-p_{0}-\frac{\alpha_{1}}{\beta_{1}}\left(\pi_{1}-\pi_{0}\right)<\frac{\alpha_{0}}{\beta_{0}}\left(\pi_{0}-\pi_{1}\right)$ , then

[TABLE]

and if $p_{1}-p_{0}-\frac{\alpha_{1}}{\beta_{1}}\left(\pi_{1}-\pi_{0}\right)>\frac{\alpha_{0}}{\beta_{0}}\left(\pi_{0}-\pi_{1}\right)$ , then

[TABLE]

Ineligibles: Recall eq. (45)

[TABLE]

Now if $a<\min\left\{\frac{\alpha_{1}}{\beta_{1}}\left(\pi_{0}-\pi_{1}\right),\frac{\alpha_{0}}{\beta_{0}}\left(\pi_{0}-\pi_{1}\right)\right\}=\frac{\alpha_{0}}{\beta_{0}}\left(\pi_{0}-\pi_{1}\right)$ , then each term on the LHS is smaller than the corresponding term on the RHS for each realization of the $\eta$ s. So the probability is $0$. Similarly, for $a\geq\frac{\alpha_{1}}{\beta_{1}}\left(\pi_{0}-\pi_{1}\right)$ , the probability is 1. Finally, for $\frac{\alpha_{0}}{\beta_{0}}\left(\pi_{0}-\pi_{1}\right)\leq a<\frac{\alpha_{1}}{\beta_{1}}\left(\pi_{0}-\pi_{1}\right)$ , the above inequality is equivalent to

[TABLE]

Thus we have that:

Proposition 5

Suppose that Assumptions 1, 2 and the linear index structure hold and $\pi_{1}\leq\pi_{0}$ . Then, for each $\alpha_{1}\in\left[0,\alpha\right]$ ,

[TABLE]

A.6 Income Endogeneity

(Summarized from Bhattacharya, 2018, Sec 3.1): Observed income may be endogenous with respect to individual choice, e.g. when omitted variables, such as unrecorded education level, can both determine individual choice and be correlated with income. Under such endogeneity, the observed choice probabilities would potentially differ from the structural choice probabilities, and one can define welfare distributions either unconditionally, or conditionally on income, analogous to the average treatment effect and the average effect of treatment on the treated, respectively, in the program evaluation literature. In this context, an important and useful insight, not previously noted, is that for a price-rise, the distribution of the income-conditioned EV is not affected by income endogeneity; for a fall in price, the conclusion holds with CV instead of EV.

To see why that is the case, recall the binary choice setting discussed above, and define the conditional-on-income structural choice probability at income $y^{\prime}$ as

[TABLE]

where $F_{\boldsymbol{\eta}}\left(\cdot|y\right)$ denotes the distribution of the unobserved heterogeneity $\boldsymbol{\eta}$ for individuals whose realized income is $y$ , where $y$ may or may not equal $y^{\prime}$ . Now, given a price rise from $p_{0}$ to $p_{1}$ , for a real number $a$ , satisfying $0\leq a<p_{1}-p_{0}$ , the distribution of equivalent variation (analogous to compensating variation for a fall in price as in a subsidy) at $a$ , evaluated at income $y$ , conditional on realized income being $y$ , is given by (see Bhattacharya, 2015)

[TABLE]

Now, $q_{1}^{c}\left(p_{0}+a,y,y\right)$ , by definition, is the fraction of individuals currently at income $Y=y$ who would choose alternative 1 at price $p_{0}+a$ , had their income been $y$ . Now if prices are exogenous in the sense that $P\perp\boldsymbol{\eta}|Y$ , then the observable choice probability conditional on price $p$ and income $y$ is given by

[TABLE]

Therefore, (154) equals $q_{1}\left(p_{0}+a,y\right)$ , so no corrections are required owing to endogeneity. This implies that if exogeneity of income is suspect and no obvious instrument or control function is available, then a researcher can still perform meaningful welfare analysis based on the EV distribution at realized income, provided price is exogenous conditional on income and other observed covariates. For a fall in price, as induced by a subsidy, the same conclusion holds for the compensating variation, which we have calculated in our application. Furthermore, one can calculate aggregate welfare in the population by integrating $q_{1}\left(p,y\right)=q_{1}^{c}\left(p,y,y\right)$ over the marginal distribution of income.
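The final aggregation step, integrating $q_{1}\left(p,y\right)$ over the marginal distribution of income, can be sketched as a sample average over observed incomes. In the code below the logit form of `q1`, its coefficients, and the income values are all hypothetical placeholders chosen for illustration; in practice `q1` would be the estimated choice probability.

```python
import numpy as np


def q1(p, y, beta_p=-0.8, beta_y=0.3, const=0.5):
    """Hypothetical logit choice probability at price p and income y (illustrative only)."""
    return 1.0 / (1.0 + np.exp(-(const + beta_p * p + beta_y * y)))


def aggregate_choice_probability(p, incomes):
    """Average q1(p, y) over the empirical (marginal) distribution of income."""
    incomes = np.asarray(incomes, dtype=float)
    return float(np.mean(q1(p, incomes)))


incomes = np.array([1.0, 2.5, 0.7, 3.2, 1.8])   # made-up income draws
low_price = aggregate_choice_probability(0.5, incomes)
high_price = aggregate_choice_probability(1.5, incomes)
print(low_price, high_price)
```

Because the average uses realized incomes only, it relies on the same conditional price exogeneity discussed above and needs no separate correction for income endogeneity.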

A.7 Nonparticipating Households

We note that in our field experiment conducted over eleven villages in West Kenya, a subset of households in each village is participating in the game, and our sample does not cover all village members. This might potentially cause a problem since selected households might interact with non-selected ones but we do not have any data about the latter. However, at the time of the experiment, non-selected households did not have any opportunity to buy an ITN and the outcome variables $A$ for such households are always zero, whose conditional expectations are zero as well. Thus, in our specification, even if we allow for interactions among all the village members (who are selected or non-selected by us), it is easy to do the necessary adjustments in the empirics.

To see this point, we interpret the index $(v,k)$ as representing any household, selected or non-selected, i.e., $k\in\{1,\dots,\check{N}_{v}\}$ where $\check{N}_{v}$ is the number of all households in village $v$ (thus, $\check{N}_{v}>N_{v}$ ), and define $\check{A}_{vk}$ as the outcome of any village member, i.e., $\check{A}_{vk}=A_{vk}$ if $\left(v,k\right)$ is selected in the experiment and $\check{A}_{vk}=0$ otherwise. Corresponding to $\check{A}_{vk}$ , let $\check{\Pi}_{vh}$ be $\left(v,h\right)$ ’s belief defined as

[TABLE]

which is the average of the conditional expectations over all the households in village $v$ . By the definition of $\check{A}_{vk}$ , we can easily see that

[TABLE]

which is a scaled version of $\Pi_{vh}$ . Even if $(v,h)$ ’s behavior is affected by non-selected households, i.e., it is determined by (1) but with $\Pi_{vh}$ being replaced by $\check{\Pi}_{vh}$ , its difference from the previous case is only the scaling by $(\tfrac{N_{v}-1}{\check{N}_{v}-1})$ . In our empirical setting, this ratio is $0.8$ , and we apply this adjustment throughout the analysis.
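The adjustment described here amounts to multiplying the belief computed over selected households by $(N_{v}-1)/(\check{N}_{v}-1)$. A minimal sketch follows; the function name `scaled_belief` and the household counts in the example are hypothetical and are not the study's actual village sizes.

```python
def scaled_belief(pi_selected, n_selected, n_village):
    """Rescale a belief over selected households to one over all village households.

    Non-selected households have outcome zero, so the village-wide average of
    conditional expectations equals (n_selected - 1) / (n_village - 1) times
    the average over selected households.
    """
    if not 1 < n_selected <= n_village:
        raise ValueError("need 1 < n_selected <= n_village")
    return (n_selected - 1) / (n_village - 1) * pi_selected


# Hypothetical example: 81 selected households out of 101 in the village.
print(scaled_belief(0.5, 81, 101))
```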

References for the Appendix

Gale, D. & Nikaido, H. (1965) The Jacobian matrix and global univalence of mappings, Mathematische Annalen 159, 81-93.

Holden, L. (2000) Convergence of Markov chains in the relative supremum norm. Journal of Applied Probability 37, 1074-1083.

Jenish, N. & Prucha, I.R. (2012) On spatial processes and asymptotic inference under near-epoch dependence. Journal of Econometrics 170, 178-190.

Lee, L.-F. (2004) Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models, Econometrica 72, 1899-1925.

Newey, W.K. (1991) Uniform convergence in probability and stochastic equicontinuity, Econometrica 59, 1161-1167.

Newey, W.K. & McFadden, D. (1994) Large sample estimation and hypothesis testing, Handbook of Econometrics, Vol. IV (Ed. R.F. Engle and D.L. McFadden), Ch. 36, pages 2111-2245, Elsevier.

Stokey, N.L. & Lucas, Robert E. Jr. (1989) Recursive Methods in Economic Dynamics, Harvard University Press.

Varin, C., Reid, N., & Firth, D. (2011) An overview of composite likelihood methods, Statistica Sinica 21, 5-42.

Bibliography

[1] Andrews, D.W. (2005) Cross-section regression with common shocks. Econometrica 73, 1551-1585.
[2] Bhattacharya, D. (2008) A permutation-based estimator for monotone index models. Econometric Theory 24, 795-807.
[3] Bhattacharya, D. (2015) Nonparametric welfare analysis for discrete choice. Econometrica 83, 617-649.
[4] Bhattacharya, D. (2018) Empirical welfare analysis for discrete choice: Some general results. Quantitative Economics 9, 571-615.
[5] Blundell, R. & Powell, J. (2004) Endogeneity in nonparametric and semiparametric regression models, in Advances in Economics and Econometrics, Cambridge University Press, Cambridge, U.K.
[6] Brock, W.A. & Durlauf, S.N. (2001a) Discrete choice with social spillover. Review of Economic Studies 68, 235-60.
[7] Brock, W.A. & Durlauf, S.N. (2001b) Interactions-based models. Handbook of Econometrics (Vol. 5, pp. 3297-3380). Elsevier.
[8] Brock, W.A. & Durlauf, S.N. (2007) Identification of binary choice models with social interactions. Journal of Econometrics 140, 52-75.
