Demand and Welfare Analysis in Discrete Choice Models with Social Interactions
Debopam Bhattacharya, Pascaline Dupas, Shin Kanaya

TL;DR
This paper introduces new empirical tools to analyze demand and welfare effects of policies in binary choice models with social interactions, highlighting the importance of underlying mechanisms and providing bounds on welfare impacts.
Contribution
It connects large game econometrics with social interaction models, develops convergence results, and shows limitations of choice data for welfare analysis despite unique equilibria.
Findings
Choice data are insufficient for welfare calculations under social interactions.
Distribution-free bounds on welfare can be derived using index restrictions.
Experimental data on mosquito-net adoption illustrate the theoretical results.
Demand and Welfare Analysis in Discrete Choice Models with Social Interactions
Acknowledgements: We are grateful to Steven Durlauf, James Heckman, Xenia Matschke, Gautam Tripathi, and seminar participants at the University of Chicago and the University of Luxembourg for helpful feedback. Bhattacharya acknowledges financial support from the ERC consolidator grant EDWEL; the first outline of this project appeared as part b.3 of that research proposal of March 2015. Part of this research was conducted while Kanaya was visiting the Institute of Economic Research, Kyoto University (under the Joint Research Program of the KIER), the support and hospitality of which are gratefully acknowledged.
Debopam Bhattacharya
University of Cambridge (Address for correspondence: Faculty of Economics, University of Cambridge, CB3 9DD. Phone (+44)7503858289, email: [email protected])
Pascaline Dupas
Stanford University
Shin Kanaya
University of Aarhus
(26 April 2019.)
Abstract
Many real-life settings of consumer-choice involve social interactions, causing targeted policies to have spillover-effects. This paper develops novel empirical tools for analyzing demand and welfare-effects of policy-interventions in binary choice settings with social interactions. Examples include subsidies for health-product adoption and vouchers for attending a high-achieving school. We establish the connection between econometrics of large games and Brock-Durlauf-type interaction models, under both I.I.D. and spatially correlated unobservables. We develop new convergence results for associated beliefs and estimates of preference-parameters under increasing-domain spatial asymptotics. Next, we show that even with fully parametric specifications and unique equilibrium, choice data, that are sufficient for counterfactual demand-prediction under interactions, are insufficient for welfare-calculations. This is because distinct underlying mechanisms producing the same interaction coefficient can imply different welfare-effects and deadweight-loss from a policy-intervention. Standard index-restrictions imply distribution-free bounds on welfare. We illustrate our results using experimental data on mosquito-net adoption in rural Kenya.
1 Introduction
Social interaction models – where an individual’s payoff from an action depends on the perceived fraction of her peers choosing the same action – feature prominently in economic and sociological research. In this paper, we address a substantively important issue that has received limited attention within these literatures, viz. how to conduct economic policy evaluation in such settings. In particular, we focus on welfare analysis of policy interventions in binary choice scenarios with social interactions. Examples include subsidies for adopting a health-product and merit-based vouchers for attending a high-achieving school, where the welfare gain of beneficiaries may be accompanied by spillover-led welfare effects on those unable to adopt or move, respectively. Ex-ante welfare analysis of policies is ubiquitous in economic applications, and informs the practical decision of whether to implement the policy in question. Furthermore, common public interventions such as taxes and subsidies are often motivated by efficiency losses resulting from externalities. Therefore, it is important to develop empirical methods for welfare analysis in presence of such externalities, which cannot be done using available tools in the literature. Developing such methods and making them practically relevant also requires one to clarify and extend some aspects of existing empirical models of social interaction.
Literature Review and Contributions: Seminal contributions to the econometrics of social interactions include Manski (1993) for continuous outcomes, and Brock and Durlauf (2001a) for binary outcomes. More recently, there has been a surge of research on the related theme of network models, c.f. de Paula (2016). On the other hand, the econometric analysis of welfare in standard discrete choice settings, i.e. with heterogeneous consumers but without social spillover, started with Domencich and McFadden (1977), with later contributions by Daly and Zachary (1978), Small and Rosen (1981), and Bhattacharya (2018). The present paper builds on these two separate literatures to examine how social interactions influence welfare effects of policy-interventions and the identifiability of such welfare effects from standard choice data. In the context of binary choice with social interactions, Brock and Durlauf (2001a, Sec 3.3) discussed how to rank different possible equilibria resulting from policy interventions in terms of social utility – as opposed to individual welfare. They used log-sum type formulae, as in Small and Rosen (1981), to calculate the average indirect utility for specific realized values of covariates and average peer choice. Such calculations are not directly useful for our purpose. This is because the aggregate income transfer that restores average social utility to its pre-intervention level does not equal the average of individual compensating variations that restore individual utilities to their pre-intervention level. The latter is related to the concept of average deadweight loss, i.e. the efficiency cost of interventions, and consequently has received the most attention in the recent literature on empirical welfare analysis, c.f. Hausman and Newey (2016), Bhattacharya (2015), McFadden and Train (2019), and it is this notion of individual welfare that we are interested in. 
However, in settings involving spillover, we cannot use the methods of the above papers, as they do not allow for individual utilities to be affected by aggregate choices – a feature that has fundamental implications for welfare analysis. Therefore, new methods are required for welfare calculations under spillover, which we develop in the present paper.
In order to develop these methods, one must first have a theoretically coherent utility-based framework where many individuals interact with each other, i.e. provide a micro-foundation for Brock-Durlauf type models in terms of an empirical game with many players. This is necessary because welfare effects are defined with respect to utilities, and therefore, one has to specify the structure of individual preferences and beliefs including unobserved heterogeneity, and how they interact to produce the aggregate choice in equilibrium before and after the policy intervention. This requires clarifying the information structure and nature of the corresponding Bayes-Nash equilibria. A pertinent issue here is modelling the dependence structure of utility-relevant variables unobservable to the analyst but observable to the individual players. In particular, spatial correlation in unobservables – natural in the commonly analyzed setting where peer-groups are physical neighborhoods – makes individual beliefs conditional on one’s own privately observed variables which contain information about neighborhood ones. This complicates identification and inference. The first main contribution of the present paper is to establish conditions under which this feature of beliefs can be ignored ‘in the limit’, and one can proceed as if one is in an I.I.D. setting. This derivation is much more involved than the well-known result that in linear regression models, the OLS is consistent under correlated unobservables. In particular, our result involves showing that the fixed points of certain functional maps converge, under increasing domain and weak dependence asymptotics for spatial data, to fixed points of a limiting map, implying convergence of conditional beliefs to unconditional ones. This, in turn, is shown to imply convergence of complicated estimators of preference parameters under conditional beliefs to computationally simple ones in the limit. 
These estimators then yield consistent, counterfactual demand-prediction corresponding to a policy-intervention.
The standard setting in the game estimation literature is one where many independent markets are observed, each with a small number of players. Here, we consider estimation of preference parameters from data on a few markets with many players in each, using asymptotic approximations where the number of players tends to infinity but the number of markets remains fixed. In this setting, if the form of equilibrium beliefs is symmetric among players, the probabilistic laws that they follow have a certain homogeneity across players. [Footnote 1: Symmetry means that (1) if the beliefs are unconditional expectations, as is the case with I.I.D. unobservables, they are identical across players, and (2) if they are conditional expectations, as is the case for spatially correlated unobservables, their functional forms are identical.] Due to this homogeneity, asymptotics on the number of players provides the ‘repeated observations’ required to identify the players’ preference parameters. Menzel (2016) also analyzed identification and estimation in games with many players. Below, we provide more discussion on the relation and differences between our analysis and Menzel’s.
Welfare Analysis: The second part of our paper concerns welfare-analysis of policy-interventions, e.g. a price-subsidy, in a setting with social interactions. Here we show that unlike counterfactual demand estimation, welfare effects are generically not identified from choice data under interactions, even when utilities and the distribution of unobserved heterogeneity are parametrically specified, equilibrium is unique, and there are no endogeneity concerns. To understand the heuristics behind under-identification, consider the empirical example of evaluating the welfare effect of subsidizing an anti-malarial mosquito net. Suppose, under suitable restrictions, we can model choice behavior in this setting via a Brock-Durlauf type social interaction model, and the data can identify the coefficient on the social interaction term. However, this coefficient may reflect an aggregate effect of (at least) two distinct mechanisms, viz. (a) a social preference for conforming, and (b) a health-concern led desire to protect oneself from mosquitoes deflected from neighbors who adopt a bednet. These two distinct mechanisms, with different magnitudes in general, would both make the social interaction coefficient positive, and are not separately identifiable from choice data (only their sum is). But they have different implications for welfare if, say, a subsidy is introduced. At one extreme, if all spillover is due to preference for social conforming, then as more neighbours buy, a household that buys would experience an additional rise in utility (over and above the gain due to price reduction), but a non-buyer loses no utility via the health channel. At the other extreme, if spillover is solely due to perceived negative health externality of buyers on non-buyers, then increased purchase by neighbours would lower the utility of a household upon not buying via the health-route, but not affect it upon buying since the household is then protected anyway. 
These different aggregate welfare effects are both consistent with the same positive aggregate social interaction coefficient. This conclusion continues to hold even if eligibility for the subsidy is universal, there are no income effects or endogeneity concerns, and whether or not unobservables in individual preferences are I.I.D. or spatially correlated.
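The under-identification argument can be made concrete with a small numerical sketch. Everything below is an illustrative assumption rather than the paper's specification or estimates: the linear-index logit utilities, the names `alpha1`, `alpha0`, `delta`, `beta`, and all numbers. Two parameterizations share the same net interaction coefficient `alpha1 - alpha0`, and therefore the same choice probabilities, yet they imply different utility changes for buyers and non-buyers when the peer adoption rate rises.

```python
import math

def purchase_prob(price, belief, alpha1, alpha0, delta=0.5, beta=1.0):
    """Logit purchase probability under assumed linear-index utilities.

    U1 = delta + beta*(income - price) + alpha1*belief + eps1
    U0 =         beta*income           + alpha0*belief + eps0
    Income cancels, and only alpha1 - alpha0 enters the choice probability.
    """
    index = delta - beta * price + (alpha1 - alpha0) * belief
    return 1.0 / (1.0 + math.exp(-index))

def utility_shift(belief_before, belief_after, alpha1, alpha0):
    """Change in systematic utility of each alternative when beliefs move."""
    change = belief_after - belief_before
    return {"buy": alpha1 * change, "not_buy": alpha0 * change}

# Hypothetical mechanisms with the same net coefficient alpha1 - alpha0 = 1.
conformity = {"alpha1": 1.0, "alpha0": 0.0}           # buyers gain when peers buy
health_externality = {"alpha1": 0.0, "alpha0": -1.0}  # non-buyers lose when peers buy

p_conf = purchase_prob(0.5, 0.4, **conformity)
p_health = purchase_prob(0.5, 0.4, **health_externality)
shift_conf = utility_shift(0.4, 0.6, **conformity)
shift_health = utility_shift(0.4, 0.6, **health_externality)

print(p_conf == p_health)          # choice data cannot tell the two apart
print(shift_conf, shift_health)    # but the utility changes differ
```

In this sketch the choice probabilities coincide exactly, while the utility of not buying is unchanged under pure conformity and falls under the pure health externality.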
Indeed, this feature is present in many other choice situations that economists routinely study. For example, consider school choice in a neighborhood with a free, resource-poor local school and a selective, fee-paying, resource-rich school. In this setting, a merit-based voucher scheme for attending the high-achieving school can have a range of possible welfare effects. Aggregate welfare change could be negative if, for example, high-ability children moving with the voucher lower academic quality in the resource-poor school by more than they raise it in the selective school via peer effects. In the absence of such negative externalities, aggregate welfare change could be positive, owing to the subsidy-led price decline for voucher users and any positive conforming effects that raise the utility of attending the resource-rich school when more children also do so. These contradictory welfare implications are compatible with the same positive coefficient on the social interaction term in an individual school-choice model.
For standard discrete choice without spillover, Bhattacharya (2015) showed that the choice probability function itself contains all the information required for exact welfare analysis. In particular, for the special case of quasi-linear random utility models with extreme value additive errors, the popular ‘logsum’ formula of Small and Rosen (1981) yields average welfare of policy interventions. These results fail to hold in a setting with spillovers because here one cannot set the utility from the outside option to zero – an innocuous normalization in standard discrete choice models – since this utility changes as the equilibrium choice-rate changes with the policy-intervention. This is in contrast to binary choice without spillover, where utility from the outside option, i.e. non-purchase, does not change due to a price change of the inside good.
Nonetheless, under a standard, linear-index specification of demand, one can calculate distribution-free bounds on average welfare, based solely on choice probability functions. The width of the bounds increases with (i) the extent of net social spillover, i.e. how much the (belief about) average neighborhood choice affects individual choice probabilities, and (ii) the difference in average peer-choice corresponding to realized equilibria before and after the price-change. The index structure, which has been universally used in the empirical literature on social interactions (c.f. Brock and Durlauf, 2001a, 2007), leads to dimension reduction that plays an important role in identifying spillover effects. We therefore continue to use the index structure as it simplifies our expressions, and comes “for free”, because social spillovers cannot in general be identified without such structure anyway. Under stronger and untestable restrictions on the nature of spillover, our bounds can shrink to a singleton, implying point-identification of welfare. Two such restrictions are (a) the effects of an increase in average peer-choice on individual utilities from buying and not buying are exactly equal in magnitude and opposite in sign, or (b) the effect of aggregate choice on either the purchase utility or the non-purchase utility is zero.
Empirical Illustration: We illustrate our theoretical results with an empirical example of a hypothetical, targeted public subsidy scheme for anti-malarial bednets. In particular, we use micro-data from a pricing experiment in rural Kenya (Dupas, 2014) to estimate an econometric model of demand for bednets, where spillover can arise via different channels, including a preference for conformity and perceived negative externality arising from neighbors’ use of a bednet. In this setting, we calculate predicted effects of hypothetical income-contingent subsidies on bednet demand and welfare. We perform these calculations by first accounting for social interactions, and then compare these results with what would be obtained if one had ignored these interactions. We find that allowing for (positive) interaction leads to a prediction of lower demand when means-tested eligibility is restricted to fewer households and higher demand when the eligibility criterion is more lenient, relative to ignoring interactions. The intuitive explanation is that ignoring a covariate with positive impact on the outcome would lead to under-prediction if the prediction point for the ignored covariate is higher than its mean value. As for welfare, allowing for social interactions may lead to a welfare loss for ineligible households, in turn implying higher deadweight loss from the subsidy scheme, relative to estimates obtained ignoring social spillover where welfare effects for ineligibles are zero by definition. The resulting net welfare effect, aggregated over both eligibles and ineligibles, admits a large range of possible values including both positive and negative ones, with associated large variation in the implied deadweight loss estimates, all of which are consistent with the same coefficient on the social interaction term in the choice probability function.
An implication of these results for applied work is that welfare analysis under spillover effects requires knowledge of the different channels of spillover separately, possibly via conducting a ‘belief-elicitation’ survey; knowledge of only the choice probability functions, inclusive of a social interaction term, is insufficient.
Plan of the Paper: The rest of the paper is organized as follows. Section 2 describes the set-up and establishes the formal connection between econometric analysis of large games and Brock-Durlauf type social interaction models for discrete choice, first under I.I.D. and then under spatially correlated unobservables. This section contains the key results on convergence of conditional (on unobservables) beliefs in the spatial case to non-stochastic ones under increasing-domain asymptotics. Section 3 shows consistency of our preferred, computationally simple estimator even under spatial dependence. Section 4 develops the tools for empirical welfare analysis of a price intervention, such as a means-tested subsidy, in such models, along with the associated deadweight loss calculations. In Section 5, we lay out the context of our empirical application, and in Section 6 we describe the empirical results obtained by applying the theory to the data. Finally, Section 7 summarizes and concludes the paper. Technical derivations, formal proofs and additional results are collected in an Appendix.
2 Set-up and Assumptions
Consider a population of villages indexed by and resident households in village indexed by , with . For the purpose of inference discussed later, we will think of these households as a random sample drawn from an infinite superpopulation. The total number of households we observe is . Each household faces a binary choice between buying one unit of an indivisible good (alternative 1) or not buying it (alternative 0). Its utilities from the two choices are given by and where the variables , , and denote respectively the income, price, and heterogeneity of household , and is household ’s subjective belief of what fraction of households in her village would choose alternative . The variable is privately observed by household but is unobserved by the econometrician and other households. The dependence of utilities on captures social interactions. Below, we will specify how is formed. Household ’s choice is described by
[TABLE]
where denotes the indicator function. In the mosquito-net example of our application, one can interpret and as expected utilities resulting from differential probabilities of contracting malaria from using and not using the net, respectively.
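As a reading aid, the choice rule described above can be sketched in assumed notation. The symbols are placeholders introduced here and need not match the paper's: household $h$ in village $v$ has income $Y_{vh}$, price $P_{vh}$, belief $\Pi_{vh}$, heterogeneity $\eta_{vh}$ and choice $A_{vh}$. The sketch also assumes that price enters the buying utility through net income.

```latex
% Assumed notation; the household buys when buying yields weakly higher utility.
\[
A_{vh} \;=\; \mathbf{1}\!\left\{ U_1\!\left(Y_{vh} - P_{vh},\, \Pi_{vh},\, \eta_{vh}\right)
\;\ge\; U_0\!\left(Y_{vh},\, \Pi_{vh},\, \eta_{vh}\right) \right\}
\]
```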
The utilities, and , may also depend on other covariates of . For notational simplicity, we let , and suppress other covariates for now; covariates are considered in our empirical implementation in Section 6.
For later use, we also introduce a set of location variables : where denotes ’s (GPS) location.
**Incomplete-Information Setting:** In each village , each of the households is provided the opportunity to buy the product at a researcher-specified price randomly varied across households. These households will be termed players from now on. Players have incomplete information in that each player knows her own variables . We assume, in line with our application context, that a player does not know the identities of all the players who have been selected in the experiment and thus their variables and choice (for any and ). Accordingly, we model interactions of households as an incomplete-information Bayesian game, whose probabilistic structure is as follows.
We consider two sources of randomness: one stemming from random drawing of households from a superpopulation, and the other associated with the realization of players’ unobserved heterogeneity . This will be further elaborated below.
We assume players have ‘rational expectations’ in accordance with the standard Bayes-Nash setting, i.e., each ’s belief is formed as
[TABLE]
where is the conditional expectation computed through the probability law that governs all the relevant variables given ’s information set that includes . Here, ‘rational expectation’ simply means that subjective and physical laws of all relevant variables coincide. The explicit form of (2) in equilibrium is investigated in the next subsection after we have specified the probabilistic structure for all the variables.
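The belief in (2) can be sketched in assumed notation. The symbols $\Pi_i$, $A_j$, $N$ and $\mathcal{I}_i$ are placeholders introduced here, not necessarily the paper's. The belief is the average, over the other players in the village, of player $i$'s conditional expectation of their choices given her information set.

```latex
% Assumed notation: N players in the village, A_j the choice of player j,
% I_i the information set of player i.
\[
\Pi_i \;=\; \frac{1}{N-1} \sum_{j \neq i} E\!\left[ A_j \,\middle|\, \mathcal{I}_i \right]
\]
```

Under rational expectations, the expectation is taken under the true probability law of the variables.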
Each player is solely concerned with behavior of other players in the same village. In this sense, the econometrician observes games ( is eleven in our empirical study), each with ‘many’ players. To formalize our model as a Bayesian game in each village, given the form of (2), and would be interpreted as expected utilities. This is possible when the underlying vNM utility indices and satisfy
[TABLE]
i.e., is linear in the second argument; and satisfy an analogous relationship. This will hold in particular when utilities have a linear index structure, as in Manski (1993) and Brock and Durlauf (2001a, 2007).
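A short sketch, again in assumed notation, shows why linearity in the second argument delivers the expected-utility interpretation. Here $u_1$ stands for the vNM index of buying, $s$ for the realized share of peers choosing alternative 1, and $a$, $b$ for arbitrary functions; all of these are placeholders. Because $y$ and $\eta$ are known to the player, the conditional expectation passes through to the belief.

```latex
% Assumed notation; y and eta are in the player's information set I_i.
\begin{align*}
u_1(y, s, \eta) &= a(y,\eta) + b(y,\eta)\, s, \\
E\!\left[ u_1\!\Big(y,\ \tfrac{1}{N-1}\textstyle\sum_{j \neq i} A_j,\ \eta\Big) \,\Big|\, \mathcal{I}_i \right]
  &= a(y,\eta) + b(y,\eta)\, E\!\left[ \tfrac{1}{N-1}\textstyle\sum_{j \neq i} A_j \,\Big|\, \mathcal{I}_i \right]
   = u_1(y, \Pi_i, \eta).
\end{align*}
```

The same argument applies to the utility of not buying.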
**Dependence Structure of Unobserved Heterogeneity:** We assume that unobserved heterogeneity () takes the following form:
[TABLE]
where stands for a village-specific factor that is common to all members in the th village and represents an individual specific variable. Below we will consider two different specifications for the sequence : for each , given , viz., (1) are conditionally independent and identically distributed, and (2) is spatially dependent. [Footnote 2: The “fixed-effect” type specification (3) is similar to Brock and Durlauf (2007). However, the additive separable structure of (3) is assumed here for expositional simplicity; we can allow for for some possibly nonlinear function , and this general form does not change anything substantive in what follows.] We assume that the value of is commonly known to all members in village but is a purely private variable known only to individual . Neither nor is observable to the econometrician. We also assume that this information structure as well as the probabilistic structure of variables imposed below (c.f. conditions C1, C2, and C3 with I.I.D. or SD below) is known to all the players in the game. Given our settings so far, we can specify the form of player ’s information set as
[TABLE]
In our empirical set-up, the group level unobservables will be identified using the fact that there are many households per village.
Having described the set-up through equations: (1), (2), (3), and (4), we now close our model by providing the following conditions on the probabilistic law for the key variables:
C1
, , are independent across .
Assumption C1 says that variables in village are independent of those in village .
C2
For each , given , is I.I.D. with , the conditional CDF for village .
This conditional I.I.D.-ness of C2 for observables represents randomness associated with sampling of households in our field experiment. Additionally, the household is assumed to know the distribution .
For the distribution of unobservable heterogeneity, we consider two alternative scenarios:
C3-IID
(i) For each , given , the sequence is conditionally I.I.D., with ; (ii) is independent of conditionally on .
C3-SD
For each , the sequence defined as
[TABLE]
for a stochastic process , indexed by location , where are independent of for , and satisfy the following properties: (i) for each , is an alpha-mixing stochastic process conditionally on , where the definition of an alpha-mixing process is provided in Appendix A.2; (ii) is independent of conditionally on .
The conditional I.I.D.-ness imposed in C3-IID (i) leads to equi-dependence within each village, i.e., for any and . In contrast, C3-SD (i) allows for non-uniform dependence that may vary depending on the relative locations of the two players, i.e., if two households and selected in the experiment with locations and , respectively, live close to each other (i.e., is small), and (and thus and ) are more correlated. For example, in our application on mosquito-net adoption, this can correspond to positive spatial correlation in density of mosquitoes, unobserved by the researcher. Assumption C3-SD is consistent with the “increasing domain” type asymptotic framework used for spatial data, formally set out in Appendix A.2 of this paper (briefly, the area of tends to as ; c.f. Lahiri, 2003, Lahiri and Zhu, 2006).
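The contrast between equi-dependence and distance-dependent correlation can be illustrated with a minimal sketch. It assumes a random village factor with variance `var_village`, an individual component with variance `var_indiv`, and, for the spatial case, an exponential covariance kernel with a hypothetical `length_scale`. None of these choices comes from the paper; they are picked only to make the two dependence patterns visible.

```python
import math

def corr_iid(var_village, var_indiv):
    """Correlation of heterogeneity between any two households in a village
    when individual components are i.i.d.: it arises only from the shared
    village factor, so it is identical for every pair."""
    return var_village / (var_village + var_indiv)

def corr_spatial(loc_a, loc_b, var_village, var_indiv, length_scale):
    """Correlation when individual components follow an exponential
    spatial covariance (an assumed kernel, for illustration only)."""
    distance = math.dist(loc_a, loc_b)
    cov_indiv = var_indiv * math.exp(-distance / length_scale)
    return (var_village + cov_indiv) / (var_village + var_indiv)

flat = corr_iid(1.0, 1.0)
near = corr_spatial((0.0, 0.0), (0.1, 0.0), 1.0, 1.0, length_scale=1.0)
far = corr_spatial((0.0, 0.0), (5.0, 0.0), 1.0, 1.0, length_scale=1.0)

print(flat, near, far)
```

Close neighbors are more strongly correlated than distant ones in the spatial case, and the correlation for distant pairs approaches the equi-dependent level.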
For the purpose of inference, C3-SD may be seen as a generalization of C3-IID, but in our Bayes-Nash framework with many players, they will, in general, imply substantively different forms for beliefs and equilibria. In particular, under C3-IID, each player ’s unobservables is not useful for predicting another player ’s variables and behavior, and therefore her belief, defined in (2) as the average of the conditional expectations about all the others’ , is reduced to the average of the *unconditional* expectations (as formally shown in Proposition 1 below). On the other hand, under the spatial dependence scheme C3-SD, since and are correlated, knowing one’s own realized value of can help predict others’ ; in other words, ’s own information is useful for forming beliefs about others.
Condition (ii) in C3 (with I.I.D. or SD) is the exogeneity condition. Since is independent of conditionally on , we have . This allows for identification and consistent estimation of model parameters. In the context of the field experiment in our empirical exercise, this exogeneity condition can be interpreted as saying that realization of unobserved heterogeneity is independent of how researchers have selected the sample. Note that the exogeneity condition is conditional on (and ), and it does not exclude correlation of and in the unconditional sense. For instance, if is well predicted by location (say, there are high-income districts and low-income ones, and no restriction is imposed on the joint distribution of ), we can still capture situations where tends to be higher for ’s income since . [Footnote 3: In our application, prices are randomly assigned to individuals by researchers and thus and are independent both unconditionally and conditionally on .]
**Two Sources of Randomness:** The above probabilistic framework with two sources of randomness has parallels in Andrews (2005, Section 7) and Lahiri and Zhu (2006). It is also related to Menzel’s (2016) framework with exchangeable variables (below we provide further comparison of our framework with Menzel’s). As stated, C2 represents randomness induced by the researchers’ experimental process. In contrast, the specification in C3 represents randomness of unobserved heterogeneity conditionally on , the (locations of) households selected in the experiment.
Conditions C2 and C3-IID imply that are I.I.D. conditionally on , and thus our framework can be interpreted as the standard one with a single source of randomness. For the spatial case C3-SD, the beliefs depend on , and in particular, on the unobservable (to the econometrician) , which complicates identification and inference. We get around this complication by showing that under an “increasing domain” type of asymptotics for spatial data, reasonable in our application, the model and estimates of its parameters under C3-SD converge essentially to the simpler model C3-IID, and this justifies the use of Brock-Durlauf type analysis even under spatial dependence.
2.1 Equilibrium Beliefs
In this subsection, we investigate the forms of players’ beliefs defined in (2) first in the I.I.D. and then in the spatially dependent case. We first consider the case of C3-IID. This case corresponds to Brock and Durlauf’s (2001a) binary choice model with social interactions where, additionally, unobserved heterogeneity was modelled through the logistic distribution. Brock and Durlauf (2001a) made an intuitive, but somewhat ad hoc, assumption that beliefs, corresponding to our , are constant and symmetric across all players in the same village. We first show that under C3-IID, this assumption can be justified in our incomplete-information game setting via the specification of a Bayes-Nash equilibrium. We next consider the spatially dependent case with C3-SD. As briefly discussed above, beliefs under spatial dependence have to be computed through conditional expectations. However, under an “increasing domain” asymptotic framework for spatial data, conditional-expectation based beliefs converge to the beliefs in the I.I.D. case. The mathematical derivation of this result is somewhat involved, so in the main text we outline the key points and provide the formal derivation in the Appendix.
2.1.1 Constant and Symmetric Beliefs under the (Conditional) I.I.D. Setting
We investigate the forms of beliefs under C3-IID through the two following propositions:
Proposition 1
Suppose that Conditions C1, C2, and C3-IID are common knowledge in the Bayesian game described in the previous section. Then, for any in village with ,
[TABLE]
where defined in (4).
The proof of Proposition 1 is provided in Appendix A.1. Note that this proposition does not utilize any equilibrium condition. It simply confirms, formally, the intuitive statement that ’s own variables are not useful to predict other ’s behavior . Given this result, we can write the belief (defined in (2)) as
[TABLE]
where
[TABLE]
and is a function of and independent of -specific variables, , while the functional form of may depend on the index in a deterministic way; for notational simplicity, we suppress the dependence of on below.
Beliefs in equilibrium solve the system of equations:
[TABLE]
where denotes the conditional expectation operator given (i.e., ). Brock and Durlauf (2001a) focus on equilibria with constant and symmetric beliefs. (Footnote 4: The constancy of beliefs means that each player’s belief is independent of any realization of her own, player-specific variables as in (77). Using our notation above, we say that (constant) beliefs are symmetric when for any (for each ).) When Brock and Durlauf’s framework is interpreted as a Bayesian game, one can formally justify their focus on constant and symmetric beliefs under conditions laid out in Proposition 2 below.
To establish this proposition, define for each , given , a function as
[TABLE]
for notational economy, we will often suppress the dependence of on ; but note that is independent of individual index under the conditional I.I.D. assumption given . Now we are ready to provide the following characterization of beliefs:
Proposition 2
Suppose that the same conditions hold as in Proposition 1 and the function defined in (8) is a contraction, i.e., for some ,
[TABLE]
Then, a solution of the system of equations in (7) uniquely exists and is given by symmetric beliefs, i.e.,
[TABLE]
The proof is given in the Appendix. Propositions 1-2 show that, given the (conditional) I.I.D. and contraction conditions, the equilibrium is characterized through
[TABLE]
for some constant within each village (given ). This implies that the beliefs can be consistently estimated by the sample average of over village , which is exploited in our empirical study.
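The fixed-point characterization above can be sketched numerically. The following is a minimal illustration, assuming a logit linear index in which a household with scalar covariate `x` chooses option 1 with probability `F(c0 + c1*x + alpha*pi)`; the names `c0`, `c1`, `alpha` and the covariate values are hypothetical and not taken from the paper.

```python
import math

def logistic_cdf(z):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))

def belief_map(pi, xs, c0, c1, alpha):
    """Average choice probability in a village when everyone holds belief pi."""
    return sum(logistic_cdf(c0 + c1 * x + alpha * pi) for x in xs) / len(xs)

def solve_belief(xs, c0, c1, alpha, pi0=0.5, tol=1e-12, max_iter=10_000):
    """Iterate pi <- belief_map(pi) until the update is smaller than tol."""
    pi = pi0
    for _ in range(max_iter):
        new_pi = belief_map(pi, xs, c0, c1, alpha)
        if abs(new_pi - pi) < tol:
            return new_pi
        pi = new_pi
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical village with five households. The logistic density never
# exceeds 1/4, so with |alpha| < 4 the map above is a contraction.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
pi_star = solve_belief(xs, c0=-0.2, c1=0.8, alpha=1.5)
pi_alt = solve_belief(xs, c0=-0.2, c1=0.8, alpha=1.5, pi0=0.99)
```

Starting the iteration from two different initial beliefs (`pi_star` and `pi_alt`) and reaching the same value is a quick practical check of uniqueness in this toy setting.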
The contraction condition (9) can be verified on a case by case basis. In particular, for the linear index model used below, the condition is
[TABLE]
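where denotes the coefficient on beliefs, i.e. the social interaction term, and denotes the density of , the unobservable determinant of choosing option (defined below through or ). In a probit specification, in which the unobservable is standard normal, the density is bounded above by $1/\sqrt{2\pi}$, and thus we require the coefficient on beliefs to be smaller than $\sqrt{2\pi}\approx 2.51$ in absolute value; for the logit specification, the density is bounded above by $1/4$, and thus the coefficient must be smaller than $4$ in absolute value. We verify that these conditions are satisfied in our application.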
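As a rough numerical check of the probit and logit cases, the sketch below assumes the linear-index contraction condition takes the form `|alpha| * sup f < 1`, where `alpha` is the coefficient on beliefs and `f` the density of the unobservable. That form, and all function names, are assumptions made for illustration.

```python
import math

def normal_pdf(e):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * e * e) / math.sqrt(2.0 * math.pi)

def logistic_pdf(e):
    """Density of the standard logistic distribution."""
    s = 1.0 / (1.0 + math.exp(-e))
    return s * (1.0 - s)

def sup_density(pdf, lo=-10.0, hi=10.0, steps=20001):
    """Grid approximation to the supremum of a density on [lo, hi]."""
    width = (hi - lo) / (steps - 1)
    return max(pdf(lo + k * width) for k in range(steps))

def satisfies_contraction(alpha, pdf):
    """Check |alpha| * sup f < 1 (assumed form of the condition)."""
    return abs(alpha) * sup_density(pdf) < 1.0

probit_bound = 1.0 / sup_density(normal_pdf)    # close to sqrt(2*pi)
logit_bound = 1.0 / sup_density(logistic_pdf)   # close to 4
```

For instance, an interaction coefficient of 3 would pass the check under logit but fail it under probit, since the normal density has the higher peak.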
Note, however, from the proof of Proposition 2, that the contraction condition (9) is not necessarily required for uniqueness. That is, if a solution to the system of equations (7) is unique and defined in (8) has a unique fixed point (i.e., a solution to is unique), then the same conclusion still holds. We have imposed (9) since it is a convenient sufficient condition that guarantees uniqueness both in (7) and ; it also appears to be a mild condition, and easy to verify in applications.
2.1.2 Convergence of Beliefs under Spatial Dependence
In this subsection, we provide a formal characterization of beliefs in equilibrium under the spatial case C3-SD. When the unobserved heterogeneity terms are dependent, beliefs in equilibrium may not reduce to a constant within each village, unlike in Proposition 1. With correlated and , the conditional expectation is in general a function of the privately observed , because knowing is useful for predicting and thus (the latter is a function of ). While ’s beliefs are given by a constant under C3-IID, they will in general be a function of ’s variables unobserved by the researcher, when spatial dependence is allowed, thereby complicating the analysis. In this subsection, we investigate formal conditions under which this feature of beliefs disappears “in the limit”. (Footnote 5: Yang and Lee (2017) discuss estimation of a social interaction model with heterogeneous beliefs, but the heterogeneity is solely a function of observed player-specific variables (cf. Eqn 2.1 in Yang and Lee, 2017), while unobserved private variables are I.I.D., and not spatially correlated as in our case.)
Asymptotic Framework for Spatial Data: Under spatial dependence, the first key condition enabling consistent estimation of our model parameters is the spatial analog of weak dependence. This amounts to specifying that and are less dependent when the distance between and , , is large. The notion of asymptotics we use is the so-called “increasing domain” type (cf. Lahiri, 1996), where the area from which is sampled expands to infinity as . In particular, for each player , the number of other players who are almost uncorrelated with expands to , and the ratio of such players (relative to all players) tends to . Given this, and assuming that any bounded region in the support of does not contain too many observations (even when tends to ), we can (i) ignore the effect of spatial dependence on equilibrium beliefs “in the limit”, and (ii) derive limit results for spatial data (e.g., the laws of large numbers and central limit theorems as in Lahiri, 1996, 2003), and use these to develop an asymptotic inference procedure.
In our empirical set-up, the average distance between households within every village is more than 1 kilometer, and is close to 2 kilometers in most villages. This corresponds well with the increasing domain framework above.
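To illustrate why increasing-domain sampling makes most pairs of players nearly independent, the sketch below places households uniformly at random on a square whose area grows in proportion to the number of households, so that density stays fixed. It then reports the share of other households lying beyond a fixed dependence range. The uniform placement, the range `r` and the unit density are illustrative assumptions, not the paper's sampling design.

```python
import math
import random

def share_far(n, r=1.0, density=1.0, seed=0):
    """Average share of other households farther than r from a household."""
    rng = random.Random(seed)
    side = math.sqrt(n / density)  # area grows linearly with n
    pts = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
    far = 0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(pts[i], pts[j]) > r:
                far += 1
    return far / (n * (n - 1))

# The share of "far" pairs rises toward one as the domain expands.
shares = {n: share_far(n) for n in (25, 100, 400)}
```

If the square were instead held fixed while `n` grew (infill sampling), the share of far pairs would not approach one, which is why the increasing-domain scheme matters for the belief-convergence argument.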
Convergence of Equilibrium Belief: We now characterize the game’s equilibrium under the asymptotic scheme outlined above. The formal details of the analysis are laid out in Appendix A.2; here we outline the main substantive features and their implications for the belief structure.
To characterize beliefs in equilibrium, write
[TABLE]
given each . may depend on index in a deterministic way. Note that this expression (10) follows from the specification of in (2), defined as the average of the conditional expectations. Then, in the equilibrium, for each village , beliefs are given by the set of functions, , , that solves the following system of equations:
[TABLE]
for (almost surely).
Note that the solution to (13) depends on , the number of households. We now discuss the limit of the solutions when . To this end, for expositional ease, consider a symmetric equilibrium such that for any ; symmetry is imposed here solely for ease of exposition, and a formal proof without symmetry is provided in Appendix A.2. Under symmetry, the functional equation in (13) is reduced to
[TABLE]
where is a functional operator (mapping) from a -valued function (of random variables, ) to another -valued function (evaluated at ):
[TABLE]
where as formulated in C3-SD. Under C3-IID in (7), we have considered the system of equations that can be eventually defined through the unconditional expectations . In contrast, here we have to consider conditional expectations of the form , as in (13) and (17). Given the correlation in , they do not reduce to the unconditional ones since is useful for predicting others’ . However, under the increasing domain asymptotics and a weak dependence condition (i.e., and are less correlated when is large), both of which are standard asymptotic assumptions for inference with spatial data, the number of players in the game whose unobservables are almost uncorrelated with any given player becomes large as , and further the ratio of such players (among all players) tends to . As a result, the operator converges to the average of the unconditional expectations:
[TABLE]
for any , where we call each summand an ‘unconditional’ expectation in that it is independent of , and we also suppress the dependence of on for notational simplicity. (Footnote 6: We write and for any random objects and .) The precise meaning of this convergence, together with required conditions, is formally stated in the Appendix (see (93) in the proof of Theorem 5, for the general case without symmetry).
The convergence of the operator to carries over to that of a fixed point of (i.e. the solution of ) when the limit operator is a contraction. The above discussion can be summarized as:
Theorem 1
Suppose that C2 and C3-SD hold with Assumption 4 (introduced in Appendix A.2), and the functional map defined in (20) is a contraction with respect to the metric induced by the norm ( is a -valued function on the support of ) (Footnote 7: Note that is independent of , given C2 and C3-SD; and it can be used as a norm.), i.e.,
[TABLE]
Let be a solution to the functional equation (which is unique under the contraction property). Then, for each , it holds that for any solution to , which may not be unique,
[TABLE]
Note that the limit of , a fixed point of , corresponds to the equilibrium (constant and symmetric) beliefs for the C3-IID case (a fixed point of in (8); recall that by Propositions 1 - 2).
This theorem is restated as Theorem 5 in Appendix A.2, where its proof is also provided. Theorem 5 derives the convergence of the equilibrium beliefs (without the symmetry assumption ), viz. that the limit of the solution to (13) is given precisely by the solution of (7). The theorem also derives the rate of the convergence in (21): the rate is faster if (1) the area of each village expands more quickly as under the increasing-domain assumption, and (2) the degree of spatial dependence of is weaker. Note that the contraction condition on the limit (unconditional) operator implies existence and uniqueness of its solution, but we do not need to impose it on the operator defined via the conditional expectations; multiplicity of solutions () is allowed for, and any of the solutions would then converge to . The existence of a solution can be checked relatively easily using other, less restrictive fixed point theorems.
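The reason a contraction is needed only for the limit operator can be seen from a standard one-line bound. The following sketch uses hypothetical notation that is not the paper's: $\Gamma_N$ for the finite-player operator, $\Gamma_\infty$ for the limit operator with contraction modulus $\rho\in(0,1)$, $\pi_N$ for any fixed point of $\Gamma_N$, and $\pi^{*}$ for the unique fixed point of $\Gamma_\infty$, all measured in a common norm.

```latex
\begin{aligned}
\|\pi_N - \pi^{*}\|
  &= \|\Gamma_N \pi_N - \Gamma_\infty \pi^{*}\| \\
  &\le \|\Gamma_N \pi_N - \Gamma_\infty \pi_N\|
     + \|\Gamma_\infty \pi_N - \Gamma_\infty \pi^{*}\| \\
  &\le \|\Gamma_N \pi_N - \Gamma_\infty \pi_N\| + \rho\,\|\pi_N - \pi^{*}\| ,
\end{aligned}
\qquad\text{so that}\qquad
\|\pi_N - \pi^{*}\| \le \frac{\|\Gamma_N \pi_N - \Gamma_\infty \pi_N\|}{1-\rho}.
```

Under these assumptions, any sequence of finite-player fixed points converges to $\pi^{*}$ whenever the operators converge along that sequence, and the rate of operator convergence passes through to the fixed points up to the factor $1/(1-\rho)$. The formal statement and its conditions are those of Theorem 5 in the Appendix.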
In sum, this convergence result justifies the use of Brock and Durlauf (2001a) type specification of constant and symmetric beliefs, even when unobserved heterogeneity exhibits spatial dependence. This enables us to overcome complications in identification and inference posed by the dependence of beliefs on unobservables. In the next section, we present two estimators – one based on the Brock and Durlauf type specification and another that takes into account the conditional expectation feature of the beliefs as in (10). Then, we (a) show that the difference between the two estimators is asymptotically negligible, and (b) justify using observable group average outcome as a regressor in an econometric specification of individual level binary choice as in Brock and Durlauf’s estimation procedure.
Further Discussions and Comparison with Menzel (2016): In our discussion of the spatial case, the sequence , defined through two independent components, is called subordinated to the stochastic process via the index variables . Subordination has been used previously in econometrics and statistics for modelling spatially dependent processes, cf. Andrews (2005, Section 7) and Lahiri and Zhu (2006). One implication of subordination is the so-called exchangeability property (see, e.g., Andrews, 2005), and if a sequence of random variables is exchangeable, it can be I.I.D. conditionally on some sigma algebra (often denoted by , the tail sigma algebra), which is known as de Finetti’s theorem (see, e.g., Ch. 7 of Hall and Heyde, 1980). In our setting, this corresponds to the conditional I.I.D.-ness of , given a realization of the stochastic process (as well as that of ), where is set as the sigma algebra generated by the random function .
Menzel (2016) has proposed a conditional inference method for games with many players under the exchangeability assumption. Indeed, Menzel (2016) and the present paper are similar in that both consider estimation of a game with the I.I.D. condition relaxed and under many-player asymptotics. However, there are some substantive differences between Menzel’s (2016) framework and ours. Firstly, in his conditional inference scheme, the probability law recognized by players in a game is different from that used by researchers for inference purposes (i.e., the former is the unconditional law and the latter is the conditional law given ), but they are identical in our setting. This feature of non-identical laws causes difficulty in constructing a valid, interpretable moment restriction that guarantees consistent estimation. In the context of estimating structural economic models (including game theoretic models), such a restriction is usually presented as some exogeneity or exclusion condition that is derived by taking into account players’ optimization behavior, i.e., the restriction is constructed based on the players’ perspective. This sort of construction may not give a valid moment restriction under the conditional inference scheme where validity has to be judged from the researcher’s perspective with the conditional law. To see this point, consider a simple binary choice example: , where and is a covariate. In the standard case, the parameter can be estimated through , where is a weighting function, and is the distribution function of . In contrast, under an inference scheme that exploits exchangeability or conditional I.I.D.-ness of , consistent estimation would require , where is the tail sigma algebra of . The -conditional moment is in general hard to interpret, is not implied by the unconditional one, and it is not always obvious whether it holds. Indeed, Andrews (2005) discusses failure of consistency in a simple least squares regression case when the conditional law is used.
Another feature of Menzel (2016) that is distinct from ours is his focus on aggregate games. In his setting, players’ utilities depend on the ‘aggregate state’, which is computed through the conditional expectation of others’ actions ( defined in Eq. (2.1) on p. 311, Menzel, 2016). This object is the counterpart of in our setting in that players’ interactions take place only through the aggregate state ( in our notation). Our for the spatially dependent case is defined in (10) and (13) through conditional expectations () given all information available to player , i.e., both the individual variables and common variable . On the other hand, a counterpart of Menzel’s aggregate state in our context is
[TABLE]
where the conditional expectation is computed given only the common (called a public signal on p. 310 in Menzel, 2016, denoted by ). The formulation (22) means that each player does not utilize all the available information for predicting others’ behavior even when is useful for to predict (and thus ) due to correlation between and . This contradicts the intuitively natural structure of belief formation in Bayesian games via rational expectations in our setting. Note, however, that Menzel (2016, Section 3) also discusses convergence of finite-player games and the associated equilibria. His convergence result is based on the assumption that players’ predictions about other players are based on both in finite games and in their limit, while our result establishes convergence of the belief process, where is used in a finite-player game but reduces to in the limit. In this sense, our belief convergence result may be interpreted as providing an asymptotic justification of Menzel’s (2016) ‘aggregate game’ framework.
3 Econometric Specification and Estimators
In this section, we lay out the econometric specification of our model, and describe estimation of preference parameters (denoted by ), assuming that the observed sample is generated via the game introduced in the previous section and satisfying assumptions C1, C2, and C3-SD (the C3-IID case is simpler, and is nested within the C3-SD case; see more on this below). In particular, we define the true parameter via a conditional moment restriction that is derived from specification of utility functions and the structure of the game in each of villages. As discussed above, the beliefs in the finite-player game possess a conditional expectation feature, so the conditional expectation used to define has a complicated form, and consequently the estimator based on it, denoted by below, is difficult to implement.
Therefore, we construct another, computationally simpler estimator based on a conditional expectation restriction derived from the limit model with the limit belief (derived in Theorem 1), and use it in our empirical application. We call Brock-Durlauf type as it resembles the estimator used in Brock and Durlauf (2001a, 2007). Since the limit model is not the actual data generating process (DGP), our preferred estimator is based on a mis-specified conditional moment restriction. However, we show that the estimator for the finite-player game with spatial dependence, , which takes into account the conditional-expectation feature of the beliefs (as in (10)) shares the same limit as that is based on the limit model, as , under the asymptotic scheme for spatial data as introduced in the previous section and in Appendix A.2.1. In this sense, the two estimators, and , are asymptotically equivalent, and this result justifies the use of the simpler, Brock-Durlauf type estimation procedure. This result is formally proved in Theorem 2 below. The key challenge in this proof is showing uniform convergence of the fixed point solutions (beliefs) over the parameter space.
Forms of Beliefs under Spatial Dependence: To develop our estimators, we assume that the players’ beliefs in (10) are symmetric: , i.e., the functional form of is common for all the players in the same village . (Footnote 8: This can be justified under C1, C2, and C3-SD when the mapping from a -valued function to another -valued function:
$$E\left[\left.1\left\{\begin{array}[c]{c}U_{1}(Y_{vk}-P_{vk},g(W_{vk},L_{vk},\boldsymbol{u}_{v}(L_{vk}),\boldsymbol{\xi}_{v}),\boldsymbol{\eta}_{vk})\\ \geq U_{0}(Y_{vk},g(W_{vk},L_{vk},\boldsymbol{u}_{v}(L_{vk}),\boldsymbol{\xi}_{v}),\boldsymbol{\eta}_{vk})\end{array}\right\}\right|\mathcal{I}_{vh}\right] \tag{23}$$
is a contraction, where . This contraction condition for the functional mapping is analogous to that for the function (defined in (8)) in Proposition 2. The proof of symmetric equilibrium beliefs is similarly analogous to the proof of Proposition 2, and is omitted for brevity. We provide and discuss a sufficient condition for (23) to be a contraction in Appendix A.3.) We note that given the (conditional) independence assumptions in C2 and C3-SD, the forms of the beliefs can be slightly simplified. That is, the beliefs are a fixed point of the conditional expectation operator (17) with being conditioning variables; however, we can show that ’s variable is irrelevant in predicting other ’s variables in that
[TABLE]
and accordingly, the fixed point solution is a function of without . (Footnote 9: We can prove (24) as follows. The sequence is conditionally I.I.D. given (by C2) and thus it is also conditionally independent of the stochastic process given (by C3-SD (ii)). Therefore, is conditionally I.I.D. given , implying that
Since it also holds that , we apply the conditional independence relation (75) with , , and , to obtain
[TABLE]
where the derivations of the second and fourth lines have used the following conditional independence relation: for random objects , , , and , if , then ; for the second line, we set , , and , with ; and for the fourth line, , , and with .) Thus, with slight abuse of notation, we write
[TABLE]
Linear Index Structure: We now specify the forms of the utility functions. With few large peer-groups (e.g. there are eleven large villages in our application dataset), one cannot consistently estimate the impact of the belief on the choice probability function nonparametrically holding other regressors constant. (Footnote 10: This is because is constant within a village in the (conditionally) I.I.D. case, and this constancy also holds for the limit model in the spatial case. In particular, the fixed point constraint does not help because of dimensionality problems. Indeed, in the fixed point condition , where , the joint CDF of , is identified, the unknown function has higher dimension than the observable .) Accordingly, following Manski (1993) and Brock and Durlauf (2001a, 2007), we assume a linear index structure with , viz. that utilities are given by
[TABLE]
where corresponding to Assumptions 1 - 2, we assume that , , i.e., non-satiation in numeraire, need not equal , i.e. income effects can be present, and that , i.e., compliance yields higher utility. These utilities can be viewed as expected utilities corresponding to Bayes-Nash equilibrium play in a game of incomplete information with many players, as outlined in Section 2 above. Below in Section 4, we will provide more details on interpretation of the individual coefficients in (26) when discussing welfare calculations. These details do not play any role in the rest of this section.
Using (26) and the structure of (see (3)) with and , it follows that
[TABLE]
where we have defined .
Recall that the probabilistic conditions in C2 and C3-SD are stated conditional on the (realized values of) village-fixed unobserved heterogeneity , as in the econometric literature on fixed-effects panel data models. In this sense, we can treat as non-stochastic. Indeed, given many observations per village, the (realized) values of can be estimated and included in the set of parameters to be estimated. We discuss this point further in Section 4.4 below.
Econometric Specifications: We now present the alternative estimators. To do this, we need some more notation. Let denote a (preference) parameter vector, where is the coefficient vector corresponding to . In the rest of this Section 3, we assume, for notational simplicity, that the village-fixed parameters are known; this assumption does not change any substantive arguments on the convergence of the estimators. We discuss identification/estimation schemes of these parameters below and provide a complete proof for the case when are estimated using one of the identification schemes (e.g. the homogeneity assumption) in Appendix A.4. Given (25) and (27), we can write
[TABLE]
In order to incorporate the fixed-point feature of in estimation, where we write for notational simplicity, we can assume a parametric model of spatial dependence for the stochastic process , which is required to compute the functional equations defining . Corresponding to the definition of with , we let , where is a stochastic process defined as . We let be the conditional distribution of given , parametrized by a finite dimensional parameter , and the (pseudo) true value is denoted by . We also denote the marginal CDF of by and its probability density by . In the sequel, we also write the marginal CDF of as , and thus . The joint distribution function of is , given the location indices and . (Footnote 11: This specification implies pairwise stationarity of , i.e. the joint distribution of and depends only on the distance . Stationarity is not strictly necessary for our purpose but is maintained for simplicity. We could also specify the full joint distribution of the whole (for any , or for any with being any finite integer; say, a Gaussian process), which would not affect our estimation method.)
To develop estimators that incorporate the fixed point restriction, define the following functional operator based on :
[TABLE]
for , where is a functional operator from a -valued function to another function , and is the joint CDF of . We provide sufficient conditions for this to be a contraction in Appendix A.3.
Given the above set-up, define the model to be estimated as:
[TABLE]
where and denote the true parameters and is a solution to the functional equation defined through the operator (29) (for each given):
[TABLE]
and C1, C2, C3-SD, and some regularity conditions (provided below) are satisfied. Henceforth, the model (30) will be assumed to be the DGP of observable variables ().
3.1 Econometric Estimators
Definition of the Estimand: Suppose for now that the true parameter for the spatial dependence is given. Then, based on (28), we define the true preference parameter (i.e., our estimand) as the solution to the conditional moment restriction:
[TABLE]
where is the conditional choice probability function (Footnote 12: Note that all the (conditional) expectations, and in this Section 3 are taken with respect to the law of , , , and , or conditional on the unobserved heterogeneities (or ).):
[TABLE]
Practical Estimator Based on the Limit Model: Given our parametric set-up, we can in principle compute an empirical analogue of (33) by solving an empirical version of the fixed point equation (31). This estimator, denoted below by , is difficult to compute in practice. Therefore, we consider an alternative estimator based on the simpler conditional moment condition:
[TABLE]
This is derived from the limit model with the limit beliefs , which do not depend on the unobserved heterogeneity and other specific variables. Indeed, the limit model is not the true DGP, and thus (34) is mis-specified under C3-SD (it is correctly specified under C3-IID). Nonetheless, we show that the estimator based on (34), which we eventually use in our empirical application, can be justified in an asymptotic sense. This simpler estimator is given by:
[TABLE]
where
[TABLE]
where , is the parameter space that is compact in with being the dimension of , , and the constant beliefs, , (that appear in the limit model) are estimated by . We use the label ‘BR’ for this estimator, as it is based on the Brock and Durlauf (2001a) type formulation. This estimator is easy to compute: since its belief formulation is based on the limit model with constant beliefs , its objective function requires neither solving fixed point problems nor any numerical integration. Below, we show that the complicated estimator (based on (30)) and the simpler one have the same limit.
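A minimal sketch of the Brock-Durlauf type procedure described above: the belief in each village is replaced by the village's sample average take-up, which then enters a parametric choice probability as an ordinary regressor. The logit link, the squared-error criterion, the single covariate and all names (`village_takeup`, `br_objective`, `theta = (c0, c1, alpha)`) are assumptions made for illustration; the paper's objective function may use a different criterion or weighting.

```python
import math
from collections import defaultdict

def logistic_cdf(z):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))

def village_takeup(villages, choices):
    """Sample average of the binary choice within each village."""
    total, count = defaultdict(float), defaultdict(int)
    for v, a in zip(villages, choices):
        total[v] += a
        count[v] += 1
    return {v: total[v] / count[v] for v in total}

def br_objective(theta, villages, xs, choices):
    """Mean squared gap between choices and fitted choice probabilities."""
    c0, c1, alpha = theta
    pi_hat = village_takeup(villages, choices)  # estimated constant beliefs
    sse = 0.0
    for v, x, a in zip(villages, xs, choices):
        p = logistic_cdf(c0 + c1 * x + alpha * pi_hat[v])
        sse += (a - p) ** 2
    return sse / len(choices)

# Hypothetical toy sample: two villages, three households each.
villages = ["A", "A", "A", "B", "B", "B"]
xs = [0.2, -0.4, 1.0, 0.1, -1.2, 0.5]
choices = [1, 0, 1, 0, 0, 1]
pi_hat = village_takeup(villages, choices)
value_at_zero = br_objective((0.0, 0.0, 0.0), villages, xs, choices)
```

Evaluating this criterion involves only a pass over the data, with no fixed-point solve and no numerical integration, so it can be handed directly to any generic optimizer.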
Potential Estimator for the Finite-Player Game: We now formally introduce the computationally difficult potential estimator based on (32). It is defined through the following objective function:
[TABLE]
where is an estimate of the conditional choice probability that explicitly incorporates conditional-belief and fixed-point features:
[TABLE]
and is an estimator of the belief and is defined as a solution to the following functional equation for each :
[TABLE]
is an empirical version of (defined in (29)) in which the true is replaced by :
[TABLE]
This is an empirical version of a solution to (29). A notable feature of this is that it is a function of the unobserved heterogeneity (represented by the variable ). Due to this dependence on , computation of in (36) and in (38) is difficult, and requires numerical integration of the indicator functions; furthermore, finding the fixed point in the functional equation (37) will also require some numerical procedure.
Here, we do not pursue how to identify and estimate the parameter for the spatial dependence (since our empirical application is not based on anyway), but suppose the availability of some reasonable preliminary estimator with , and define our estimator as
[TABLE]
Note that given this form of , we can again interpret this estimator as a moment estimator that solves
[TABLE]
with some appropriate choice of the weight . This may be viewed as a sample moment condition based on the population one in (32). The corresponding estimation procedure would be similar to the nested fixed-point algorithm, as in Rust (1987).
3.2 Convergence of the Estimators
We now show that , i.e., based on the correct conditional moment restriction (32) and based on the mis-specified one (34) are asymptotically equivalent. That is, if is consistent, so is and vice versa; in the proof, we show that both estimators are consistent for that satisfies (111). This is formally stated in the following theorem:
Theorem 2
Suppose that C1, C2, C3-SD, Assumptions 4, 5, 6, 7, and 8 hold. Then
[TABLE]
The formal proof is provided in Appendix A.4; the outline is as follows. We start by introducing another, intermediate estimator that is based on constant beliefs but solves the Fixed Point problem of the Limit model, , where
[TABLE]
where is a solution to the fixed point equation for each (fixed):
[TABLE]
Note that is a sample version of that solves
[TABLE]
which is the population version of (39) with replaced by the true CDF of . This is constructed based on the limit model (with constant beliefs), but it explicitly solves the fixed point restriction (39) (unlike derived from the Brock-Durlauf type moment restriction (34)). may be interpreted as a moment estimator that is derived from the conditional moment restriction (Footnote 13: Note that can also be defined as solving , where, given an appropriate choice of the weight .):
[TABLE]
Note that this restriction is also a mis-specified one.
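The intermediate estimator differs from the Brock-Durlauf type one only in how the constant belief is obtained: for each trial parameter value it is recomputed as a fixed point, rather than read off the sample average. A sketch of this inner/outer structure follows, again under an assumed logit link and squared-error criterion. The function names, the toy data and the grid search over the interaction coefficient are all hypothetical.

```python
import math

def logistic_cdf(z):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))

def fixed_point_belief(theta, xs, tol=1e-12, max_iter=10_000):
    """Inner loop: solve pi = mean_i F(c0 + c1*x_i + alpha*pi) for given theta."""
    c0, c1, alpha = theta
    pi = 0.5
    for _ in range(max_iter):
        new_pi = sum(logistic_cdf(c0 + c1 * x + alpha * pi) for x in xs) / len(xs)
        if abs(new_pi - pi) < tol:
            return new_pi
        pi = new_pi
    raise RuntimeError("inner fixed point did not converge")

def fpl_objective(theta, xs, choices):
    """Outer criterion evaluated at the belief implied by theta itself."""
    c0, c1, alpha = theta
    pi = fixed_point_belief(theta, xs)
    return sum((a - logistic_cdf(c0 + c1 * x + alpha * pi)) ** 2
               for x, a in zip(xs, choices)) / len(xs)

# Hypothetical single-village data; grid search over alpha with c0, c1 fixed.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
choices = [0, 0, 1, 0, 1, 1]
grid = [k / 10 for k in range(-30, 31)]  # alpha in [-3, 3], inside |alpha| < 4
best_alpha = min(grid, key=lambda a: fpl_objective((-0.3, 1.0, a), xs, choices))
```

The grid is kept inside the region where the logit belief map is a contraction, so the inner loop has a unique solution at every trial value.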
We show the convergence of in two steps. In the first step, we show that and have the same limit, which is the solution to a different conditional moment restriction (See (111) in Appendix A.4). In the second step, we show that is asymptotically well approximated by uniformly over for any sequence of (as ).
4 Welfare Analysis
We now move on to the second part of the paper, which concerns welfare analysis of policy interventions under spillovers. Since we assume spillovers are restricted to the village where households reside, any welfare effect of a policy intervention can be analyzed village by village. So for economy of notation, we drop the subscripts except when we account explicitly for village-fixed effects during estimation. Also, we use the same notation to denote both individual beliefs entering individual utilities, and the unique, equilibrium belief about village take-up rate entering the average demand function. The assumption of a constant (within village) is justified via Proposition 1, Proposition 2, and Theorem 1.
In the welfare results derived below, all probabilities and expectations (e.g. mean welfare loss) in Sections 4.1-4.3 are calculated with respect to the marginal distribution of aggregate unobservables, denoted by above and below. In this sense, they are analogous to ‘average structural functions’ (ASF), introduced by Blundell and Powell (2004). Later, when discussing estimation of the ASF, together with the implied pre- and post-intervention aggregate choice probabilities and average welfare in Section 4.4, we will allude to village-fixed effects explicitly, and show how they are estimated and incorporated in demand and welfare predictions.
In order to conduct welfare analysis, we impose two restrictions on the utilities.
Assumption 1
and (introduced in (1) in Section 2) are continuous and strictly increasing for each fixed value of and , i.e., all else equal, utilities are non-satiated in the numeraire.
Assumption 2
For each and , is continuous and strictly increasing, and is continuous and weakly decreasing, i.e. conforming yields higher utility than not conforming for each individual.
Define to be the structural probability (i.e. Average Structural Function or ASF) of a household choosing when it faces a price of , and has income and belief :
[TABLE]
and let , where is the CDF of
Policy Intervention: Start with a situation where the price of alternative is and the value of is . Then suppose a price subsidy is introduced such that individuals with income less than an income threshold become eligible to buy the product at price . This policy will alter the equilibrium adoption rate; suppose the equilibrium adoption rate changes to . How the counterfactual and are calculated will be described below. For given values of and , we now derive expressions for welfare resulting from the intervention. By “welfare” we mean the compensating variation (CV), viz. what hypothetical income compensation would restore the post-change indirect utility for an individual to its pre-change level. For a subsidy-eligible individual, for any potential value of corresponding to the new equilibrium, the individual compensating variation is the solution to the equation
[TABLE]
whereas for a subsidy-ineligible individual, it is the solution to
[TABLE]
Note that we do not take into account peer-effects again in defining the CV because the income compensation underlying the definition of CV is hypothetical. So the impact of actual income compensation on neighboring households is irrelevant. Since the CV depends on the unobservable , the same price change will produce a distribution of welfare effects across individuals; we are interested in calculating that distribution and its functionals such as mean welfare.
Existence of : Under the following condition, there exists an that solves (42) and (43):
Condition
For any fixed and , it holds that (i) , and (ii) .
Intuitively, this condition strengthens Assumption 1 by requiring that utilities can be increased and decreased sufficiently by varying the quantity of numeraire. Existence follows via the intermediate value theorem. Under an index structure, existence is explicitly shown below. Finally, uniqueness of the solution to (42) and (43) follows by strict monotonicity in numeraire. Since the maximum of two strictly increasing functions is strictly increasing, the LHS of (42) and (43) are strictly increasing in , implying a unique solution.
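The existence and uniqueness argument can be illustrated numerically. The following is a minimal sketch, assuming a linear index utility for each alternative; the parameter names (`d1`, `d0`, `b1`, `b0`, `a1`, `a0`) and all numerical values are hypothetical and chosen only for illustration. Because the maximum of two strictly increasing functions is strictly increasing and unbounded in income, a bracketing interval always exists and bisection converges to the unique root.

```python
def indirect_utility(income, price, pi, eta1, eta0, par):
    """Best attainable utility: the larger of buying and not buying."""
    u_buy = par["d1"] + par["b1"] * (income - price) + par["a1"] * pi + eta1
    u_not = par["d0"] + par["b0"] * income + par["a0"] * pi + eta0
    return max(u_buy, u_not)


def compensating_variation(y, p0, p1, pi0, pi1, eta1, eta0, par, tol=1e-10):
    """Income change S that equates post-change utility to its pre-change level."""
    target = indirect_utility(y, p0, pi0, eta1, eta0, par)

    def gap(s):
        # Strictly increasing in s because both income slopes are positive.
        return indirect_utility(y + s, p1, pi1, eta1, eta0, par) - target

    lo, hi = -1.0, 1.0
    while gap(lo) > 0:      # widen the bracket until the gap is non-positive
        lo *= 2
    while gap(hi) < 0:      # widen the bracket until the gap is non-negative
        hi *= 2
    while hi - lo > tol:    # bisection on a monotone function
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical values: no spillover, and a household that buys at either price.
par = {"d1": 0.0, "d0": 0.0, "b1": 1.0, "b0": 1.0, "a1": 0.0, "a0": 0.0}
cv_always_buyer = compensating_variation(
    y=50.0, p0=10.0, p1=4.0, pi0=0.3, pi1=0.5, eta1=100.0, eta0=0.0, par=par)
```

In this example the household buys before and after the price cut, so the compensating variation equals the price change (a negative number, meaning income could be withdrawn while leaving the household as well off as before).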
Welfare with Index Structure: In accordance with the literature on social interactions (see Section 3 above), from now on we maintain the single-index structure introduced in (26):
[TABLE]
with , , and . [Footnote 14: We can also allow for concave income effects by specifying, say,
but we wish to keep the utility formulation as simple as possible to highlight the complications in welfare calculations even in the simplest linear utility specification.] In our empirical setting of anti-malarial bednet adoption, there are multiple potential sources of interactions (i.e. ). The first is a pure preference for conforming; the second is increased awareness of the benefits of a bednet when more villagers use it; the third is a perceived negative health externality. The medical literature suggests that the technological health externality is positive, i.e. as more people are protected, the lower is the malaria burden, but the perceived health externality is likely to be negative if households correctly believe that other households’ bednet use deflects mosquitoes to unprotected households, but ignore the fact that those deflected mosquitoes are less likely to carry the parasite. Indeed, the implications for adoption are different: under the positive health externality, one would expect free-riding, hence a negative effect of others’ adoption on own adoption; under the negative health externality, the correlation would be positive.
In particular, let denote the conforming plus learning effect, and denote the health externality. Then it is reasonable to assume that and . In other words, the compliance motive and learning effect together are equal in magnitude but opposite in sign between buying and not buying. Further, if a household uses an ITN, then there is no health externality from the neighborhood adoption rate (since the household is protected anyway), but if it does not adopt, then there is a net health externality effect from neighborhood use, which makes the overall effect and in general. [Footnote 15: An analogous asymmetry is also likely in the school voucher example mentioned in the introduction if the voucher-led ‘brain-drain’ leads to utility gains and losses of different amounts, e.g. if better teaching resources in the high-achieving school substitute for, or complement, peer-effects in a way that is not possible in the resource-poor local school.] In the context of ITNs, the technological effects are unlikely to be large enough and/or the villagers are unlikely to be sophisticated enough to understand the potential deterrent effects of ITNs. Therefore, we assume from now on that the perceived health externality is non-positive, and thus .
Given the linear index specification, the structural choice probability for alternative at is given by
[TABLE]
where denotes the marginal distribution function of . It is known from Brock and Durlauf (2007) that the structural choice probabilities identify and , i.e. , , and , up to scale even without knowledge of the probability distribution of . In the application, we will consider various ways to estimate the structural choice probabilities, including standard Logit and Klein and Spady’s distribution-free MLE. One can also use other semiparametric methods, e.g. Bhattacharya (2008) or Han (1987) that require neither specification of error distributions nor subjective bandwidth choice.
The condition makes the model different from standard demand models for binary choice. In the standard case, the utility of the so-called “outside option”, i.e. not buying, is normalized to zero. In a social spillover setting, this cannot be done because that utility depends on the aggregate purchase rate . As we will see below, in welfare evaluations of a subsidy, and appear separately in the expressions for welfare-distributions, but cannot be separately identified from demand data, which can only identify . As a result, point-identification of welfare will in general not be possible. Below, we will consider three untestable special cases under which one obtains point-identification, viz. (i) (i.e. : no health externality and symmetric spillover), (ii) (i.e. : technological health externality dominates deflection channel and net health externality exactly offsets conforming effect) and (iii) , ( and : no conforming effect and deflection channel dominates). Cases (ii) and (iii) will yield respectively the upper and lower bounds on welfare gain in the general case.
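The identification problem can be seen in a small numerical sketch. It assumes a linear index with a logistic difference of unobservables; the coefficient names and values are hypothetical. Two parameterizations that split the same total spillover coefficient differently between the buying and not-buying utilities generate identical choice probabilities at every price, income and adoption rate, so demand data cannot tell them apart.

```python
import math


def choice_prob(price, income, pi, par):
    """Logit probability of buying under an assumed linear index."""
    index = ((par["d1"] - par["d0"])
             - par["b1"] * price
             + (par["b1"] - par["b0"]) * income
             + (par["a1"] - par["a0"]) * pi)   # only a1 - a0 enters demand
    return 1.0 / (1.0 + math.exp(-index))


base = {"d1": 0.5, "d0": 0.0, "b1": 0.1, "b0": 0.08}
# Same total spillover coefficient (1.2), allocated in two different ways.
all_conforming = dict(base, a1=1.2, a0=0.0)     # spillover raises buyers' utility
all_externality = dict(base, a1=0.0, a0=-1.2)   # spillover lowers non-buyers' utility

grid = [(p, y, pi) for p in (4.0, 10.0) for y in (20.0, 80.0) for pi in (0.2, 0.6)]
max_gap = max(abs(choice_prob(p, y, pi, all_conforming)
                  - choice_prob(p, y, pi, all_externality))
              for p, y, pi in grid)
```

Here `max_gap` is zero up to floating-point error, even though the two parameterizations imply very different welfare consequences of a rise in the adoption rate for non-buyers.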
Toward obtaining the welfare results, consider a hypothetical price intervention moving from a situation where everyone faces a price of to one where people with income less than an eligibility-threshold are given the option to buy at the subsidized price . This policy will alter the equilibrium take-up rate. Assume that the equilibrium take up rate changes from to . We will describe calculation of and later. For given values of and , the welfare effect of the policy change can be calculated as described below. We first lay out the results in detail for the case where , which corresponds to our application. In the appendix we present results for a hypothetical case where (which may happen if there are multiple equilibria before and after the intervention). For the rest of this section, we assume that .
4.1 Welfare for Eligibles
The compensating variation for a subsidy-eligible household is given by the solution to
[TABLE]
Since LHS is strictly increasing in , the condition is equivalent to
[TABLE]
If , then each term on the LHS of (46) is smaller than the corresponding term on the RHS. If , then each term on the LHS is larger than the corresponding term on the RHS. This gives us the support of :
[TABLE]
Remark 1
Note that the above reasoning also helps establish existence of a solution to (45). We know from above that for , the LHS of (45) is strictly smaller than the RHS, and for , the LHS of (45) is strictly larger than the RHS. By continuity, and the intermediate value theorem, it follows that there must be at least one where (45) holds with equality.
Back to calculating the CDF, now consider the intermediate case where
[TABLE]
In this case, the first term on LHS of (46) is larger than first term on RHS for all , and the second term on LHS of (46) is smaller than the second term on the RHS for all , and thus (69) is equivalent to
[TABLE]
For any given , we have that the probability of (47) reduces to
[TABLE]
The intercept , the slopes and are all identified from conditional choice probabilities; but is not identified, and therefore (48) is not point-identified from the structural choice probabilities. However, since , for each feasible value of , we can compute a feasible value of (48), giving us bounds on the welfare distribution.
Note also that the thresholds of at which the CDF expression changes are also not point-identified for the same reason. However, since and , , the interval
[TABLE]
will translate to the left as varies from [math] to .
Putting all of this together, we get the following result:
Theorem 3
If Assumptions 1, 2, and the linear index structure hold and , then given , the distribution of the compensating variation for eligibles is given by
[TABLE]
Remark 2
Note that the above theorem continues to hold even if the subsidy is universal; we have not used the means-tested nature of the subsidy to derive the result.
Mean welfare: From (52), mean welfare loss is given by
[TABLE]
Discussion: The width of the bounds on (52) and (53), obtained by varying over , depends on the extent to which is affected by , i.e. the extent of social spillover, and also the difference in the realized values and . For our single-index model, the fixed point restrictions imply that these counterfactual and depend on and only via (c.f. (68) and (69) below) which is point-identified, so every potential value of counterfactual demand is point-identified. But given any feasible value of and , the welfare (53) is not point-identified in general since is unknown.
Given , the welfare gain in expression (53) is increasing in ; i.e., the welfare gain is largest in absolute value when and , and the smallest when and . Conversely for welfare loss. Intuitively, if there is no negative externality from increased on non-purchasers, then they do not suffer any welfare loss, but purchasers have a welfare gain from both lower price and higher . Conversely, if all the spillover is negative, then purchasers still get a welfare gain via price reduction, but non-purchasers suffer welfare loss due to increased . Also, note that under quasilinear utilities, where income effects are absent, the drops out of the above expressions, but the same identification problem remains, since does not disappear. Changing variables , one may rewrite (53) as
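The monotonicity described above can be checked by simulation. The sketch below assumes a linear index with logistic unobservables, holds the identified total spillover coefficient fixed at a hypothetical value `ALPHA`, and sweeps the unidentified share `alpha1` attributed to the buying utility over its feasible range; every numerical value is an illustrative assumption. A more negative mean CV corresponds to a larger welfare gain.

```python
import math
import random

B1, B0, D = 0.1, 0.1, 0.5   # hypothetical income slopes and intercept difference
ALPHA = 1.2                 # hypothetical identified total spillover coefficient


def cv_eligible(eps, alpha1, y=50.0, p0=10.0, p1=4.0, pi0=0.3, pi1=0.5):
    """CV of an eligible household with unobservable eps, for a given alpha1."""
    alpha0 = alpha1 - ALPHA   # implied spillover on the not-buying utility

    def v(income, price, pi):
        return max(D + B1 * (income - price) + alpha1 * pi + eps,
                   B0 * income + alpha0 * pi)

    target = v(y, p0, pi0)
    lo, hi = -1e3, 1e3        # wide bracket, adequate for these magnitudes
    for _ in range(80):       # bisection on a strictly increasing function
        mid = 0.5 * (lo + hi)
        if v(y + mid, p1, pi1) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = random.Random(0)
draws = []
for _ in range(1000):
    u = rng.uniform(1e-9, 1.0 - 1e-9)
    draws.append(math.log(u / (1.0 - u)))   # logistic draw via inverse CDF

alpha1_grid = [0.0, 0.3, 0.6, 0.9, 1.2]
mean_cv = [sum(cv_eligible(e, a1) for e in draws) / len(draws) for a1 in alpha1_grid]
```

The two end points of `alpha1_grid` correspond to the polar cases in which all spillover is a negative externality on non-buyers and all spillover is a conforming benefit to buyers; the values of `mean_cv` at those end points play the role of the bounds.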
[TABLE]
Note that if , then the first term is the usual consumer surplus capturing the effect of price reduction on consumer welfare; for a positive , the term yields the additional effect arising via the conforming channel. Also, if , then the second term, i.e. the welfare loss from not buying, is the largest (given ): this corresponds to the case where all of is due to the negative externality.
The second term in (54), which represents welfare change caused solely via spillover and no price change, is still expressed as an integral with respect to price. This is a consequence of the index structure which enables us to express this welfare loss in terms of foregone utility from an equivalent price change. To see this, recall eq. (45)
[TABLE]
which is equivalent to
[TABLE]
which is of the form
[TABLE]
i.e.
[TABLE]
From Bhattacharya (2015), this is exactly the form for the compensating variation in a binary choice model without spillover when income is and price changes from to . [Footnote 16: Analogously, the choice probabilities have the form
i.e. the choice probabilities under spillover at price income and aggregate use can be expressed as choice-probabilities in a binary choice model with no spillover at an adjusted price and the same income.]
Corollary 1
In the special case of symmetric interactions, i.e. where in (26) (e.g. if , i.e. there is no health externality in the health-good example), we get that , and from (54) mean welfare equals:
[TABLE]
If , and , i.e. all spillover is via conforming, average welfare is given by
[TABLE]
if on the other hand, all spillover is due to perceived health risk, i.e. and , then average welfare is given by
[TABLE]
Equations (56) and (57) correspond to the upper and lower bounds, respectively, of the overall welfare gain for eligibles. [Footnote 17: In independent work, Gautam (2018) obtained apparently point-identified estimates of welfare in parametric discrete choice models with social interactions, using Dagsvik and Karlstrom (2005)’s expressions for the setting without spillover. Even with strong restrictions, under which welfare is point-identified, our welfare expressions (c.f. eqs. (55), (56), (57)) are different from Gautam’s.]
4.2 Welfare for Ineligibles
Welfare for ineligibles is defined as the solution to the equation
[TABLE]
Using the index-structure, is therefore equivalent to
[TABLE]
If , then each term on the LHS is smaller than the corresponding term on the RHS for each realization of the s. So the probability is 0. Similarly, for , each term on the LHS is larger, and thus the probability is 1. In the intermediate range, , we have that the first term on the LHS exceeds the first term on the RHS for each , and the second term on the LHS is smaller than the second term on the RHS for each . Therefore, (58) is equivalent to
[TABLE]
The probability of this event is not point-identified if the values of , are not known. But for each choice of , we can compute the probability of this event as
[TABLE]
Putting all of this together, we have the following result:
Theorem 4
If Assumptions 1, 2, and the linear index structure hold and , then for each ,
[TABLE]
For ineligibles, all of the welfare effects come from spillovers, since they experience no price change. In particular, for ineligibles who buy, there is a welfare gain from positive spillover due to a higher . For ineligibles who do not buy, there is, however, a potential welfare loss due to increased . This is why the CV distribution has a support that includes both positive and negative values. From (62), mean compensating variation is given by
[TABLE]
Using the change of variables, , the above expression becomes
[TABLE]
The first term in (64) captures the welfare gain resulting from a positive and higher ; this term would be zero if . The second term in (64) captures the welfare loss also resulting from higher ; this loss would be zero if there are no negative impacts, i.e. . Of course, both would be zero if , reflecting the fact that welfare effect on ineligibles would be zero if there is no spillover.
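A short numerical sketch makes the two opposing effects on ineligibles concrete. It assumes a linear index utility with hypothetical coefficients, an unchanged price for ineligibles, and a rise in the adoption rate; a household with a very high taste for the product always buys, and one with a very low taste never buys.

```python
def cv_ineligible(eps, alpha1, alpha0, y=50.0, p0=10.0, pi0=0.3, pi1=0.5,
                  b1=0.1, b0=0.1, d=0.5):
    """CV of an ineligible household: the price stays at p0, only pi changes."""

    def v(income, pi):
        return max(d + b1 * (income - p0) + alpha1 * pi + eps,
                   b0 * income + alpha0 * pi)

    target = v(y, pi0)
    lo, hi = -1e3, 1e3     # wide bracket, adequate for these magnitudes
    for _ in range(80):    # bisection on a strictly increasing function
        mid = 0.5 * (lo + hi)
        if v(y + mid, pi1) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical split: conforming benefit for buyers, negative externality otherwise.
cv_buyer = cv_ineligible(eps=20.0, alpha1=0.6, alpha0=-0.6)
cv_nonbuyer = cv_ineligible(eps=-20.0, alpha1=0.6, alpha0=-0.6)
```

With these values the always-buyer has a negative CV (a gain equal to the spillover benefit divided by the income slope) and the never-buyer has a positive CV of the same size (a loss), which is why the support of the CV distribution for ineligibles straddles zero.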
Corollary 2
In the three special cases where we have point-identification, viz. (i) ; (ii) , ; and (iii) , , mean CV (64) reduces respectively to:
[TABLE]
Equations (66) and (67) correspond to the upper and lower bounds, respectively, of the overall welfare gain for ineligibles, and therefore, the overall bounds generically contain both positive and negative values, since .
4.3 Deadweight Loss
The average deadweight loss (DWL) can be calculated as the expected subsidy spending less the net welfare gain. In particular, if and , i.e. there is no negative spillover, then from (54) and (63), the DWL equals
[TABLE]
So if is large enough, then it is possible for the deadweight loss to be negative, i.e. for the subsidy to increase economic efficiency under positive spillover, as in the standard textbook case. This can happen because there is no subsidy expenditure on ineligibles, and yet those that buy enjoy a subsidy-induced welfare gain due to positive spillover. Similarly, eligibles also receive an additional welfare gain via positive spillover, over and above the welfare-gain due to reduced price, and it is only the latter that is financed by the subsidy expenditure. In general, the deadweight loss will be lower (more negative) when (i) the positive spillover is larger, (ii) the change in equilibrium adoption due to the subsidy is greater, and (iii) the price elasticity of demand is lower – the last effect lowers deadweight loss simply by reducing the substitution effect, even in absence of spillover.
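The accounting behind a negative deadweight loss can be sketched with simple arithmetic. The function below is only an illustration of "expected subsidy spending less net welfare gain"; the argument names and all numbers are hypothetical and are not taken from the application.

```python
def average_dwl(subsidy_per_unit, share_eligible, takeup_eligible,
                mean_gain_eligible, mean_gain_ineligible):
    """Average deadweight loss per household: spending minus net welfare gain."""
    # Only eligibles who actually buy receive the subsidy.
    spending = subsidy_per_unit * share_eligible * takeup_eligible
    # Ineligibles cost nothing but may still gain through positive spillover.
    net_gain = (share_eligible * mean_gain_eligible
                + (1.0 - share_eligible) * mean_gain_ineligible)
    return spending - net_gain


# Hypothetical case with sizeable positive spillover on both groups.
dwl_with_spillover = average_dwl(6.0, 0.4, 0.5, 2.0, 1.0)
# Same spending, but no gain for ineligibles and a smaller gain for eligibles.
dwl_without_spillover = average_dwl(6.0, 0.4, 0.5, 1.5, 0.0)
```

In the first case the welfare gains exceed the subsidy outlay, so the deadweight loss is negative; in the second case the same outlay buys a smaller gain and the deadweight loss is positive.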
4.4 Calculation of Predicted Demand and Welfare
In order to calculate our welfare-related quantities, we need to estimate the structural choice probabilities and the equilibrium values of the aggregate choice probabilities, and in the pre and post intervention situations. To do this we will consider two alternative scenarios. The first is where we assume that the unobservables are independent of realized values of price and income (conditional on other covariates) in the available, experimental data. The second is where we assume that exogeneity holds, conditional on unobserved village-fixed effects. Note that prices in our data are randomly assigned, so the endogeneity concern is solely regarding income. Under income endogeneity, Bhattacharya (2018) discussed the interpretation of welfare distributions as conditional on income. See Appendix A.6 of the present paper for a review of that discussion. Regardless, calculation of the equilibrium s requires us to either assume exogeneity of observables or to estimate village-fixed effects, conditional on which exogeneity holds, as in our assumptions above.
No Village-Fixed Effects: Under the index-restriction (26) and no village-fixed effects, estimation of can be done via standard binary regression, using the variation in price and income across and within villages and of observed across villages to estimate the coefficients constituting the linear index. This implicitly assumes, as is standard in the literature, that even if the game can potentially have multiple equilibrium ’s, only a single equilibrium is played in each village, and thus one can use the observed from each village as a regressor to infer the preference parameters. Note that given the index structure, we do not need to impose a specific distribution for the s to calculate the index coefficients. Any existing semiparametric estimation method for index models can be used for calculations, e.g. Klein and Spady (1993), which requires bandwidth choice and Bhattacharya (2008), which does not.
Finally, the equilibrium values of and can be calculated in each village by solving the fixed point problems
[TABLE]
where denotes the distribution of income in the village. For fixed , the RHS of the above equations, viewed as functions of and respectively, are each a map from to . If and are continuous, then by Brouwer’s fixed point theorem, there is at least one solution in and , respectively, implying “coherence”. However, there may be multiple solutions, and then our welfare expressions would have to be applied separately for each feasible pair of values . Note that even if the solutions to (68) and (69) are unique, our expressions in Theorems 3 and 4 above imply that welfare distributions are still not point-identified.
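One simple way to solve fixed point problems of this kind is successive approximation. The sketch below assumes a logit choice probability with a linear index, a hypothetical village of ten households, and illustrative coefficients; with a logit link the derivative of the map is at most a quarter of the spillover coefficient, so the iteration is a contraction for the value used here. Under multiple equilibria, simple iteration would find only one of them.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def equilibrium_takeup(incomes, price_of, coef, start=0.5, tol=1e-12, max_iter=10_000):
    """Iterate pi -> average predicted take-up given pi until convergence."""
    pi = start
    for _ in range(max_iter):
        new_pi = sum(
            logistic(coef["c0"] + coef["cp"] * price_of(y)
                     + coef["cy"] * y + coef["alpha"] * pi)
            for y in incomes) / len(incomes)
        if abs(new_pi - pi) < tol:
            return new_pi
        pi = new_pi
    raise RuntimeError("fixed point iteration did not converge")


# Hypothetical coefficients and incomes, for illustration only.
coef = {"c0": 0.5, "cp": -0.1, "cy": 0.01, "alpha": 1.2}
incomes = [10.0 * k for k in range(1, 11)]

pi_before = equilibrium_takeup(incomes, lambda y: 10.0, coef)   # uniform price
pi_after = equilibrium_takeup(                                  # means-tested subsidy
    incomes, lambda y: 4.0 if y < 50.0 else 10.0, coef)
```

Starting the iteration from different initial values is a cheap check for multiplicity: if all starting points lead to the same limit, that is consistent with a unique equilibrium for the given coefficients.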
Once we obtain the predicted values of and , we can calculate (52) and (62) directly, using previously obtained estimates of the index coefficients.
With Village-Fixed Effects: Our data for the application come from eleven different villages with approximately 180 households per village. It is plausible that utilities from using and from not using a bednet are affected by village-specific unobservable characteristics, such as the chance of contracting malaria when not using a bednet. Such effects were termed “contextual” by Manski (1993). Brock and Durlauf (2007) discussed some difficulties with estimating social spillover effects in presence of group-specific unobservables. To capture this situation explicitly, recall the linear utility structure from Section 2, given by
[TABLE]
where and denote unobservable village specific characteristics. Therefore,
[TABLE]
Since is village specific and we have many observations per village, we can use a dummy for each village, and estimate the regression of take-up on price, income and other characteristics that vary across households within village , together with village dummies, i.e.
[TABLE]
where refers to the distribution of (which may potentially depend on the realized value for village ). The consistency of these estimates results from exogeneity conditional on village-fixed effects (see assumptions C3-IID (ii) and C3-SD (ii) above). The identified coefficients of the village dummies therefore satisfy . We will need to identify the sum . However, in the equations there are as many as there are , so we have equations in unknowns (s and ). In our empirical application, we address this issue in two separate ways. The first is a homogeneity assumption for observationally similar villages, and the second is Chamberlain’s correlated random effects approach.
Homogeneity Assumption: If two villages are very similar in terms of observables, then it is reasonable to assume that they have similar values of , which leads to a dimension reduction, and enables point-identification simply by solving the linear system as there are as many s as the number of less (for ). Indeed, in our application, there are two villages out of eleven in our dataset that are very similar in terms of observables, and hence are amenable to this approach.
Correlated Random Effects Assumption: A different way to address the unobserved group-effect issue is to use Chamberlain’s correlated random effects approach (c.f. Section 15.8.2 of Wooldridge, 2010). In this approach, one models the unobserved where denotes the village-averages of observables, and the error term is assumed to satisfy (). The coefficients are estimated in an initial probit regression of purchase on individual and village characteristics
In the absence of the above assumptions, can be point-identified using an instrumental variable type strategy if there are many villages, e.g. estimate the ‘regression’ using, say the aggregate fraction of individuals with subsidies or the average value of subsidy as the IV for . But since we have only eleven villages in our data, we do not consider this avenue.
Welfare Calculation with Village-Fixed Effects: Once we have a plausible way to estimate the structural choice probabilities, we can proceed with welfare calculation in presence of social spillover and unobserved group-effects, as follows. Consider an initial situation where everyone faces the unsubsidized price , so that the predicted take-up rate in village solves
[TABLE]
where is the distribution of income in village , and , , , and are estimated as above. Now consider a policy induced price regime for ineligibles (wealth larger than ) and for eligibles (wealth less than ). Then the resulting usage in village is obtained via solving the fixed point in the equation
[TABLE]
Finally, average welfare effect of this policy change in village can be calculated using
[TABLE]
where and are average welfare at income in village , calculated from (52) for eligibles and (62) for ineligibles, respectively, using and as the predicted take-up probability in village (analogous to and in (52) and (62)), as above.
5 Empirical Context and Data
Our empirical application concerns the provision of anti-malarial bednets. Malaria is a life-threatening parasitic disease transmitted from human to human through mosquitoes. In 2016, an estimated 216 million cases of malaria occurred worldwide, with 90% of the cases in sub-Saharan Africa (WHO, 2017). The main tool for malaria control in sub-Saharan Africa is the use of insecticide treated bednets. Regular use of a bednet reduces overall child mortality by around 18 percent and reduces morbidity for the entire population (Lengeler, 2004). However, at $6 or more a piece, bednets are unaffordable for many households, and to palliate the very low coverage levels observed in the mid-2000s, public subsidy schemes were introduced in numerous countries in the last 10 years. Our empirical exercise is designed to evaluate such subsidy schemes not just in respect of their effectiveness in promoting bednet adoption, but also their impact on individual welfare and deadweight loss, in line with classic economic theory of public finance and taxation. Based on our discussion in Section 4, we focus on two main sources of spillover, viz. (a) a preference for conformity, and (b) a concern that mosquitoes will be deflected to oneself when neighbors protect themselves. Both will generate a positive effect of the aggregate adoption rate on one’s own adoption decision, but they have different implications for the welfare impact of a price subsidy policy.
**Experimental Design:** We exploit data from a 2007 randomized bednet subsidy experiment conducted in eleven villages of Western Kenya, where malaria is transmitted year-round. In each village, a list of to households was compiled from school registers, and households on the list were randomly assigned to a subsidy level. After the random assignment had been performed in office, trained enumerators visited each sampled household to administer a baseline survey. At the end of the interview, the household was given a voucher for an bednet at the randomly assigned subsidy level. The subsidy level varied from % to % in two villages, and from % to % in the remaining villages; there were corresponding final prices faced by households, ranging from [math] to Ksh (US $5.50). Vouchers could be redeemed within three months at participating local retailers.
**Data:** We use data on bednet adoption as observed from coupon redemption and verified through a follow-up survey. We also use data on baseline household characteristics measured during the baseline survey. The three main baseline characteristics we consider are wealth (the combined value of all durable and animal assets owned by the household); the number of children under 10 years old; and the education level of the female head of household. [Footnote 18: Not all households in a village participated in the game. However, at the time of the experiment, non-selected households did not have the opportunity to buy an ITN, and the outcome variables for such households are always zero. So even if we allow for interactions among all households (including non-selected ones), it is easy to make the necessary adjustments in the empirics. See Appendix A.7 for more on this.]
6 Empirical Specification and Results
We work with the linear index structure (26), where is taken to be the household wealth, is the experimentally set price faced by the household, and is the average adoption in the village. The health externality from bednet use is implicitly accounted for via the dependence of utilities from adoption and non-adoption on the average adoption rate (c.f. eq. (26)). [Footnote 19: There are some households who live in the village but were not part of the formal experiment. Since the ITN was not available from any source other than via the experiment, this only impacts the game via the computed fraction . We clarify this point in Appendix A.7.]
For the empirical analysis, we also use additional controls, denoted by below, that can potentially affect preferences ( and ) and therefore the take-up of bednet, i.e. . In particular, we include presence of children under the age of ten and years of education of the oldest female member of the household. A village-specific variable that could affect adoption is the extent of malaria exposure risk in the village. We measure this in our data from the response to the question: “Did anyone in your household have malaria in the past month?”. Summary statistics for all relevant variables are reported in Table 1, and their village averages are shown in Table 2, for each of the eleven villages in the data.
Our first set of results corresponds to taking to be the standard logit CDF of (as in (44), i.e. with no fixed effects), and including average take-up in village as a regressor. [Footnote 20: While estimating the logit parameters we do not impose the fixed point constraint. While this would have improved efficiency, the additional computational burden would be quite onerous.] As shown in Theorem 2 above, even if unobservables are spatially correlated, our increasing domain asymptotic approximation will lead to consistent estimates of preference parameters. This approximation is reasonable in our empirical setting where the average distance between households within a village typically exceeds 1.5 kilometers. The marginal effects at mean are presented in Table 3. It is evident that demand is highly price elastic, and that average bednet adoption in the village has a significant positive association with private adoption, conditional on price and other household characteristics, i.e. in our notation above. The social interaction coefficient is which is less than , as required for the fixed point map to be a contraction (see discussion following Proposition 2) in the logit case. The effect of children is negative, likely reflecting that households with children had already invested in other anti-malarial steps, e.g. had bought a less effective traditional bednet prior to the experiment. We also computed analogous estimates where we ignore the spillover, i.e., we drop average take-up in village from the list of regressors. The corresponding marginal effects for the retained regressors are not very different in magnitude from those obtained when including the average village take-up, and so we do not report those here. Instead, we use the two sets of coefficients to calculate and contrast the predicted bednet adoption rate corresponding to different eligibility thresholds.
These predicted effects are quite different depending on whether or not we allow for spillover, and so we investigated these further, as follows.
In particular, we consider a hypothetical subsidy rule, where those with wealth less than are eligible to get the bednet for KSh (% subsidy), whereas those with wealth larger than get it for the price of KSh (% subsidy). Based on our logit coefficients, we plot the predicted aggregate take-up of bednets corresponding to different income thresholds . In Figure 1, for each threshold , we plot the fraction of households eligible for subsidy on the horizontal axis, and the predicted fraction choosing the bednet on the vertical axis, based on coefficients obtained by including (solid) and excluding (small dash) the spillover effect. The 45 degree line (large dash) showing the fraction eligible for the subsidy is also plotted in the same figure for comparison.
It is evident from Figure 1 that ignoring spillovers leads to over-estimation of adoption at lower thresholds and underestimation at higher thresholds of eligibility. To get some intuition behind this finding, consider a much simpler set-up where an outcome is related to a scalar covariate via the classical linear regression model where is zero-mean, independent of and . OLS estimation of this model yields estimators , with probability limits (and also expected values) and , respectively. Corresponding to a value of , the predicted outcome has a probability limit of . Now consider what happens if one ignores the covariate . Then the prediction is simply the sample mean of which has the probability limit of . Therefore, if . Thus, although the ignored covariate has a positive effect on the outcome (since ), ignoring it in prediction leads to an overestimation of the outcome if the point where the prediction is made is smaller than the population average of the ignored covariate. On the other hand, if , then there will be under-estimation.
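The regression intuition above can be reproduced in a few lines. The sketch simulates a linear model with a positive slope (all numbers are hypothetical), fits it by OLS, and compares the full prediction at a given covariate value with the naive prediction that ignores the covariate and therefore returns the sample mean of the outcome. Since the fitted line passes through the sample means, the gap between the two predictions equals the slope times the distance of the evaluation point from the covariate mean.

```python
import random

rng = random.Random(1)
n = 5000
x = [rng.gauss(0.5, 0.2) for _ in range(n)]              # hypothetical covariate
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]   # positive true slope

x_bar = sum(x) / n
y_bar = sum(y) / n
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar


def full_prediction(x0):
    """Prediction that uses the covariate."""
    return intercept + slope * x0


def naive_prediction(x0):
    """Prediction that ignores the covariate: the sample mean of the outcome."""
    return y_bar


below, above = x_bar - 0.3, x_bar + 0.3
over_estimate = naive_prediction(below) - full_prediction(below)    # positive
under_estimate = naive_prediction(above) - full_prediction(above)   # negative
```

The same logic carries over to the adoption predictions: ignoring a regressor with a positive effect overstates the outcome where that regressor is below its average and understates it where the regressor is above its average.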
Having obtained these (uncompensated) effects, we now turn to calculating the average demand and the mean compensating variation for a hypothetical subsidy scheme. We consider an initial situation where everyone faces a price of KSh for the bednet, and a final situation where an bednet is offered for KSh to households with wealth less than KSh (about the th percentile of the wealth distribution), and for the price of KSh to those with wealth above that. The demand results are reported in Table 4, and the welfare results in Table 5. We perform these calculations village-by-village, and then aggregate across villages. To calculate these numbers, we first predict the bednet adoption when everyone is facing a price of KSh, and then when eligibles face a price of KSh and the rest stay at KSh, giving us the equilibrium values of and , respectively, in our notation above. In all such calculations with our data, we always detected a single solution to the fixed point (i.e. a unique equilibrium) as can be seen from Figure 2, where we plot the squared difference between the RHS and the LHS of eqn. (69), i.e.
[TABLE]
on the vertical axis, and on the horizontal axis, separately for each of the eleven villages, where is the predicted demand (choice probability) function at . The globally convex nature of each objective function is evident from Figure 2. The minima are relatively close to each other around , except villages 7 and 10, where it is larger. A similar set of globally convex graphs is obtained for , which minimizes . These predicted values of and are used as inputs into the prediction of demand as per eqn. (41) and welfare as per Theorems 3 and 4.
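The graphical check for a unique equilibrium can be mimicked with a grid search over candidate adoption rates. The sketch below assumes a logit demand function with hypothetical coefficients and a hypothetical village of ten households; it evaluates the squared difference between a candidate adoption rate and the average predicted demand at that rate, and counts the grid points at which the difference changes sign.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def predicted_demand(pi, incomes, threshold=50.0, p_low=4.0, p_high=10.0,
                     c0=0.5, cp=-0.1, cy=0.01, alpha=1.2):
    """Average predicted take-up when the village adoption rate is pi."""
    total = 0.0
    for y in incomes:
        price = p_low if y < threshold else p_high   # means-tested price
        total += logistic(c0 + cp * price + cy * y + alpha * pi)
    return total / len(incomes)


incomes = [10.0 * k for k in range(1, 11)]
grid = [k / 1000.0 for k in range(1001)]

residual = [pi - predicted_demand(pi, incomes) for pi in grid]
objective = [r * r for r in residual]

best = min(range(len(grid)), key=lambda k: objective[k])
pi_hat = grid[best]
# A single sign change of the residual is consistent with a single fixed point.
sign_changes = sum(1 for k in range(len(grid) - 1)
                   if residual[k] < 0 <= residual[k + 1])
```

Plotting `objective` against `grid` gives a curve of the kind described in the text; a single interior minimum at which the objective is close to zero indicates one equilibrium for the assumed coefficients.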
The first row of Table 4 shows the pre-subsidy predicted demand (using a logit CDF ) by subsidy-eligibility. In the second row, we calculate the predicted effect of the subsidy on demand, and break that up into the own price effect (Row 2) and the spillover effect (Row 3). The own effect is obtained by changing the price in accordance with the subsidy but keeping the average village demand equal to the pre-subsidy value; the spillover effect is the difference between the overall effect and the own effect. It is clear that spillover effects on both eligibles and ineligibles are large in magnitude. In particular, the spillover effect raises demand for ineligibles by nearly 33% of its pre-subsidy level.
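The own-effect and spillover decomposition described above can be sketched as follows. This is a hedged illustration, not the paper's code: the logit form, the price coefficient `BETA_PRICE`, the interaction coefficient `ALPHA`, the pre-subsidy price `P_HIGH`, and the household values are all hypothetical. Only the 50 KSh subsidized price is taken from the text.

```python
import math

def logit_cdf(v):
    return 1.0 / (1.0 + math.exp(-v))

def demand(household, price, pi, beta_price, alpha):
    """Choice probability given own price and the believed village adoption rate pi."""
    return logit_cdf(household["base"] + beta_price * price + alpha * pi)

def equilibrium(households, prices, beta_price, alpha, tol=1e-12):
    """Fixed-point iteration for the village adoption rate."""
    pi = 0.5
    for _ in range(10000):
        new_pi = sum(demand(h, p, pi, beta_price, alpha)
                     for h, p in zip(households, prices)) / len(households)
        if abs(new_pi - pi) < tol:
            break
        pi = new_pi
    return new_pi

# Hypothetical village: "base" collects wealth and covariate terms of the index.
households = [
    {"base": 0.2, "eligible": True},
    {"base": 0.5, "eligible": True},
    {"base": 0.9, "eligible": True},
    {"base": 1.4, "eligible": False},
    {"base": 1.8, "eligible": False},
]
BETA_PRICE, ALPHA = -0.01, 1.0   # hypothetical coefficients
P_HIGH, P_LOW = 200.0, 50.0      # P_HIGH is hypothetical

pre_prices = [P_HIGH] * len(households)
post_prices = [P_LOW if h["eligible"] else P_HIGH for h in households]
pi0 = equilibrium(households, pre_prices, BETA_PRICE, ALPHA)   # pre-subsidy adoption
pi1 = equilibrium(households, post_prices, BETA_PRICE, ALPHA)  # post-subsidy adoption

def decompose(eligible):
    """Split the demand change of one group into own-price and spillover parts."""
    group = [(h, p) for h, p in zip(households, post_prices)
             if h["eligible"] == eligible]
    n = len(group)
    pre = sum(demand(h, P_HIGH, pi0, BETA_PRICE, ALPHA) for h, _ in group) / n
    # Own effect: prices change, but beliefs stay at the pre-subsidy rate pi0.
    own = sum(demand(h, p, pi0, BETA_PRICE, ALPHA) for h, p in group) / n - pre
    # Total effect: prices change and beliefs move to the new equilibrium pi1.
    total = sum(demand(h, p, pi1, BETA_PRICE, ALPHA) for h, p in group) / n - pre
    return {"pre": pre, "own": own, "spillover": total - own, "total": total}

print("eligibles:", decompose(True))
print("ineligibles:", decompose(False))
```

In this sketch ineligibles have a zero own effect by construction, since their price does not change, so their entire demand response is spillover.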
In Table 5, we report welfare calculations. First, in the row titled “Logit”, we report the average CV of the subsidy rule for eligibles, corresponding to assuming no spillover. In this case, we simply use the results of Bhattacharya (2015) to calculate the (point-identified) average CV for eligibles as the price changes from KSh to KSh. This yields the value of welfare gain to be KSh. As there is no spillover, the welfare change of ineligibles is zero by definition, and therefore the net welfare gain, denoted by net CV, is simply the fraction eligible () times the average CV for eligibles. This is reported in the second column of Table 5.
We next turn to the case with spillover. Using the predicted adoption rates and , we compute the lower and upper bounds of the overall average CV using (54), (56) and (57) for eligibles, and using (64), (66) and (67) for ineligibles. These are reported in Columns 3-6 of Table 5. The most conspicuous finding from these numbers is that ineligibles can suffer a large welfare loss on average due to the subsidy. This is because the subsidy facilitates usage only for the eligibles, raising the equilibrium usage in the village, while the ineligibles keep facing the high price and thus obtain a lower utility from not buying because is now higher (in the index specification, ). However, the few ineligibles who buy, despite the high price, get some welfare increase from a rise in the average adoption rate, which explains the small upper bound corresponding to the case . As for eligibles, the lower and upper bounds on average welfare gain do not contain the estimate that ignores spillovers, suggesting over-estimation of welfare gains in the latter case. This is also consistent with Figure 1, where we see that at % eligibility and lower, demand is overestimated when spillovers are ignored. The overall welfare gain across eligibles and ineligibles, reported in the column with heading “net CV”, includes the negative welfare effects on ineligibles, thereby lowering the average effect relative to ignoring spillovers and incorrectly concluding no welfare change for ineligibles.
Deadweight Loss: To compute the average deadweight loss, we subtract the net welfare from the predicted subsidy expenditure. The latter equals the amount of subsidy ( KSh) times the average demand at the subsidized price 50 KSh of the eligibles. Thus the expression for DWL is given by
[TABLE]
where denotes wealth, denotes other covariates, denotes predicted demand at price KSh including the effect of spillover, and and refer to average welfare gain for eligibles and ineligibles, respectively. Ignoring spillovers leads to the point-identified deadweight loss
[TABLE]
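The arithmetic behind the deadweight-loss calculation can be sketched as follows. This is only an illustration under assumptions: it takes a per-capita form (expenditure weighted by the share eligible), and every numerical input is hypothetical rather than taken from the tables. When the average CVs are only bounded, combining the extreme values gives valid, though possibly not sharp, bounds on the deadweight loss.

```python
def deadweight_loss(subsidy, share_eligible, demand_eligible,
                    cv_eligible, cv_ineligible):
    """Per-capita deadweight loss: subsidy expenditure minus net welfare gain."""
    expenditure = subsidy * share_eligible * demand_eligible
    net_cv = (share_eligible * cv_eligible
              + (1.0 - share_eligible) * cv_ineligible)
    return expenditure - net_cv

def deadweight_loss_bounds(subsidy, share_eligible, demand_eligible,
                           cv_eligible_bounds, cv_ineligible_bounds):
    """Bounds on the deadweight loss implied by (lower, upper) bounds on average CV."""
    lo_e, hi_e = cv_eligible_bounds
    lo_i, hi_i = cv_ineligible_bounds
    # The largest welfare gain gives the smallest deadweight loss, and vice versa.
    lower = deadweight_loss(subsidy, share_eligible, demand_eligible, hi_e, hi_i)
    upper = deadweight_loss(subsidy, share_eligible, demand_eligible, lo_e, lo_i)
    return lower, upper

# Hypothetical inputs (all values are assumptions, in KSh where applicable).
dwl_lower, dwl_upper = deadweight_loss_bounds(
    subsidy=150.0, share_eligible=0.75, demand_eligible=0.8,
    cv_eligible_bounds=(40.0, 90.0), cv_ineligible_bounds=(-60.0, 5.0))

# Without spillovers the ineligibles' CV is zero and the loss is point-identified.
dwl_no_spillover = deadweight_loss(150.0, 0.75, 0.8, 70.0, 0.0)
print(dwl_lower, dwl_upper, dwl_no_spillover)
```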
Group-Effects: It is evident from Table 2 that villages 1 and 11 are highly similar in terms of the average values of key regressors, except that the (randomly assigned) average price in village 1 is much higher than in village 11, which explains the much lower average adoption in village 1. Given this, we assume that villages 1 and 11 are likely to be similar in terms of their unobservables, and as such, we estimate a single for them. Specifically, we first estimate
[TABLE]
where is a vector containing presence of children and female education, the s are village-specific intercepts (estimated using dummies for the villages), and and are price faced by the household in the experiment and its wealth, respectively. In the second step, we solve the linear system , for and , for , where is obtained in the previous step, and the s are the average adoption rates in individual villages in the experiment. In solving this system, we set , which incorporates the homogeneity assumption discussed above. We can do all of this in one step by adding nine dummies for villages 2-10 and one for villages 1 and 11, and then running a regression of individual use on the regressors and , the average use in each village, as well as the village dummies. In the second row in Table 5, we report the average welfare effects of the same hypothetical policy change as described above, using expression (72).
Next, we use the correlated random effect approach described above, where village averages of observable regressors (price, wealth, female education, number of children) are added as additional controls in a probit (instead of logit) regression. The corresponding welfare results are reported in the third row of Table 5.
Semiparametric Estimates: Finally, in the fourth row of Table 5, we report welfare results from a semiparametric index estimation of the conditional choice-probabilities, i.e. retaining the index structure but dropping the logit assumption. This is achieved by using the “sml” routine (de Luca, 2008) in Stata which implements Klein and Spady’s (1993) estimator for single index models, using (i) a default bandwidth of to estimate the index, and then (ii) a local cubic polynomial for regressing the binary outcome on the estimated index to produce the predicted probabilities, using a bandwidth of where is chosen via leave-one-out cross-validation.
The welfare numbers vary somewhat across specifications. But all of these results support the overall conclusion that accounting for spillovers can lead to much lower estimates of net welfare gain from the subsidy program and higher deadweight loss. Some of this difference arises from potential welfare loss suffered by ineligibles that is missed upon assuming no spillover, and some from the impact of including spillover terms on the prediction of counterfactual purchase-rates (cf. Figure 1).
In Table 6, we report standard errors for the simple logit case. In principle, one can also derive formulae for standard errors adjusted for spatial correlation, but given that the paper is already quite long, and such standard errors contribute nothing substantive, we do not attempt that here. Table 6 also reports the welfare calculations corresponding to the special case where . This would be reasonable when there is no negative externality due to deflection, i.e. above, whence average welfare becomes point-identified. Note that this case is different from the results obtained assuming no spillover whatsoever, i.e. the first row, third column of Table 5. We still obtain a negative average effect of the subsidy due to the larger aggregate welfare loss of ineligibles compared with the gains of eligibles.
Comparative Statics: In Table 7, we show how the welfare effects change as we vary the generosity of the subsidy scheme; the wealth threshold for qualification is varied so that either 20%, 40% or 60% of the population is eligible. It is apparent from Table 7 that the upper bound on welfare loss for ineligibles increases as more people become eligible (since equilibrium take-up is higher), and the deadweight loss is larger still, owing both to a larger extent of subsidy-induced distortion and to the higher welfare loss of ineligibles. The lower bound on the welfare gain for eligibles decreases as the share eligible increases; in fact, it becomes negative when 40% are eligible. This is because those among the eligible who are too poor to buy the bednet even at the 50 KSh price now experience a welfare loss since equilibrium take-up is higher. The overall effect is an unambiguous increase in the deadweight loss.
Endogeneity: Price variation is exogenous in our application, since price was varied randomly by the experimenter. It is still possible that wealth is correlated with , the unobserved determinants of bednet purchase. However, experimental variation in price also implies that is independent of , given . Consequently, one can invoke the argument presented in Bhattacharya (2018, Sec. 3.1; reproduced in Appendix A.6 below for ease of reference), and interpret the estimated choice-probabilities and the corresponding welfare numbers as conditional on , and then integrate with respect to the marginal distribution of . This overcomes the problem posed by potentially endogenous income.
7 Summary and Conclusion
In this paper, we develop tools for economic demand and welfare analysis in binary choice models with social interactions. To do this, we first show the connection between Brock-Durlauf type social interaction models and empirical games of incomplete information with many players. We analyze these models under both I.I.D. and spatially correlated unobservables. The latter makes individual beliefs conditional on privately observed variables, complicating identification and inference. We show when and how these complications can be overcome via the use of a limit model to which the finite game model converges under increasing domain spatial asymptotics, in turn yielding computationally simple estimators of preference parameters. These lead to consistent point-estimates of potential values of counterfactual demand resulting from a policy-intervention, which are unique under unique equilibria.
However, with interactions, welfare distributions resulting from policy changes such as a price subsidy are generically not point-identified for given values of counterfactual aggregate demand, unlike the case without spillovers. This is true even for fully parametric specifications, and when equilibria are unique. Non-identification results from the inability of standard choice data to distinguish between different underlying latent mechanisms, e.g. conforming motives, consumer learning, negative externalities etc., which produce the same aggregate social interaction coefficient, but have different welfare implications depending on which mechanism dominates. This feature is endemic to many practical settings that economists study, including the health-product adoption case examined here. Another prominent example is school-choice, where merit-based vouchers to attend a fee-paying selective school can create negative externalities by lowering the academic quality of the free local school via increased departure of high-achieving students. The resulting welfare implications cannot be calculated based solely on a Brock-Durlauf style empirical model of individual school-choice inclusive of a social interaction term. This is in contrast to models without social interaction, where choice probability functions have been shown to contain all the information required for welfare-analysis. Nonetheless, we show that under standard semiparametric linear index restrictions, welfare distributions can be bounded. Under some special and untestable cases e.g. exactly symmetric spillover effects or absence of negative externalities, these bounds shrink to point-identified values.
We apply our methods to an empirical setting of adoption of anti-malarial bednets, using data from an experiment by Dupas (2014) in rural Kenya. We find that accounting for spillovers provides different predictions for demand and welfare resulting from hypothetical, means-tested subsidy rules. In particular, with positive interaction effects, predicted demand when including spillover is lower for less generous eligibility criteria, compared to demand predicted by ignoring spillovers. At more generous eligibility thresholds, the conclusion reverses. As for welfare, if negative health externalities are present, then subsidy-ineligibles can suffer welfare loss due to increased use by subsidized buyers in the neighborhood; if solely conforming effects are present and there is no health-related externality, then welfare can improve. Specifically, our welfare bounds applied to the bednet data show that a KSh subsidy with eligibility threshold equal to the 75th percentile of wealth has an average (across eligibles and ineligibles combined) cash equivalent of between to KSh when including spillovers; equals KSh under symmetric spillover, and about KSh when all spillovers are ignored. The potential welfare loss of ineligibles and non-buyers translates into larger estimates of potential deadweight loss from price intervention. We perform robustness checks allowing for village-level unobservables and a semiparametric specification.
The implication of these results for applied work is that under social interactions, welfare analysis of potential interventions requires more information regarding individual channels of spillover than knowledge of solely the choice probability functions (inclusive of a social interaction term). Belief-eliciting surveys provide a potential solution.
We conclude by noting that we have used the basic and most popular specification of interactions, viz. that physical neighbors constitute an individual’s peer group. This also seems reasonable in the context of our application, which concerns adoption of a health product in physically separated Kenyan villages. It would be interesting to extend our analysis to other network structures, e.g. those based on ethnicity, caste, socioeconomic distance, etc. We leave that to future work.
Appendix A
This Appendix has seven sections labelled A.1 - A.7. They deal respectively with the proof of constancy and symmetry of the beliefs with I.I.D. unobservables, belief convergence with spatially correlated unobservables, sufficient conditions for contraction, convergence of the estimators (the proof of Theorem 2), welfare analysis under , income endogeneity, and nonparticipating households.
A.1 Proofs for the (Conditionally) I.I.D. Case
Proof of Proposition 1. By the definition in (2) (with replaced by ), . Since this is the average of the conditional expectations given , we can write ’s belief as
[TABLE]
using some function which may depend on each index but is deterministic (non-random). Thus, plugging this expression of into , we can also write
[TABLE]
for some deterministic function , where .
By C3-IID, we have the following two conditional independence restrictions: and . These imply that
[TABLE]
where we have used the following conditional independence relation: for random objects , , and ,
[TABLE]
which is applied with , , and . By the same token, **C3-IID** implies that
[TABLE]
which is equivalent to
[TABLE]
Below, we denote by the conditional expectation operator given (i.e., ; we also write for any random variable). Given the above, we have
[TABLE]
where the first equality uses (73), the second and third equalities follow from (74) and (76), respectively, the fourth equality holds since , completing the proof.
Proof of Proposition 2. Let
[TABLE]
where henceforth we suppress the dependence of on for notational simplicity. By Proposition 1 and (6), we have
[TABLE]
Given these, we can write
[TABLE]
We can easily see that if a symmetric solution to the system of equations in (79) exists uniquely, then that of (7) (in terms of ) also exists uniquely (and vice versa; note that by (78)). Therefore, we investigate (79).
Corresponding to (79), define an -dimensional vector-valued function of as
[TABLE]
where we write for notational simplicity, and the metric in the domain and range spaces of is defined as
[TABLE]
for any , (note that both the spaces are taken to be ). Given these definitions of and the metric, we can easily show that the contraction property of carries over to , i.e.,
[TABLE]
which implies that there exists a unique solution to the (-dimensional) vector-valued equation:
[TABLE]
Now, consider the following scalar-valued equation . By the contraction property (9), it has a unique solution. Denote this solution by . By the definition of , the vector must be a solution to (80). Then, by the uniqueness of the solution to (80), this must be a unique solution, which is a set of symmetric beliefs. The proof is completed.
A.2 The Spatially Dependent Case
In this section, we present formal specifications for the spatially dependent process and derive the belief convergence result. We prove Theorem 5 below, which is a finer, more general version of Theorem 1 in Section 2 in that it also derives the rate of convergence without the assumption of symmetric beliefs.
Note that given **C1** (independence over villages), each village may be analyzed separately. So for notational simplicity, we drop the village index , i.e. write instead of . All of the conditions and statements here should be interpreted as conditional ones given for each village , where we note that C2 and **C3-SD** are stated conditionally on .
To avoid any notational confusion, we re-write C2 and **C3-SD** in the following simplified forms (without the village specific effects and village index ):
C2’
is I.I.D. with .
C3-SD’
is defined through , where is a stochastic process on with the following properties: i) is alpha-mixing satisfying Assumption 3 (provided below); ii) is independent of .
A.2.1 Spatially Mixing Structure
Now, we provide additional specifications of modelled as a spatially dependent process. To this end, we introduce some more notation. For a set , let be the sigma algebra generated by and define
[TABLE]
where the supremum is taken over any events and . This measures the degree of dependence between two algebras; it is zero if any and are independent. We also define
[TABLE]
the collection of all finite disjoint unions of squares, , in with its total volume not exceeding , where stands for the volume of each square . Given these, we define alpha- (strong) mixing coefficients of the stochastic process by
[TABLE]
where is the distance between two sets: , stands for the -distance between two points in : for and . [Footnote 21: For the verification of Theorem 5 below, this definition of the mixing coefficients using is slightly more complicated than necessary. We maintain this definition, however. It is the same as the one used in Lahiri and Zhu (2006), who showed validity of a spatial bootstrap under this definition and some mild regularity conditions.] We suppose is decreasing in (and increasing in ). In particular, the fact that is decreasing in implies that and are less correlated when is large, i.e. the process is weakly dependent when the mixing coefficients decay to zero as tends to infinity.
For location variables , we consider the following increasing-domain asymptotic scheme, which roughly follows Lahiri (1996). We regard as a ‘prototype’ of a sampling region (i.e., village), which is defined as a bounded and connected subset of , and for each , we denote by a sampling region of the village that is obtained by inflating the set by a scaling factor maintaining the same shape, such that
[TABLE]
In particular, if contains the origin , we can write , which may be assumed WLOG. It is also assumed that is contained in a square whose sides have length , WLOG. Thus, the area of is equal to or less than . We let be the probability density on , and then for ,
[TABLE]
where the dependence of on is suppressed for notational simplicity. [Footnote 22: Note that when does not contain the origin, we need to consider some location shift: instead of (84), where is some point in such that the region ‘’ (shifted by ) contains the origin.] Given these, we have , and the expected number of households residing in a region is
[TABLE]
We can also compute the expected distance of two individuals with and :
[TABLE]
using the change of variables with and . Since the second term on the last line is a finite integral (independent of ), which exists under , the average distance between any and grows at the rate of . This sort of growing-average-distance feature is key to establishing limit theory for spatially dependent data under the weakly dependent (mixing) condition above. We discuss this point and its implications below after introducing Assumption 3.
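The growing-average-distance property can be illustrated numerically. The sketch below is an assumption-laden illustration, not part of the proof: it takes the prototype region to be a unit square centred at the origin with uniformly distributed locations, and the names `prototype`, `inflate`, and the scale values are hypothetical. Inflating the region by a scaling factor while keeping its shape multiplies every pairwise distance by that factor, so the average distance grows in proportion to it.

```python
import math
import random

def mean_pairwise_distance(points):
    """Average Euclidean distance over all unordered pairs of points."""
    n = len(points)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += math.dist(points[i], points[j])
    return total / (n * (n - 1) / 2)

rng = random.Random(1)
# Draws from a hypothetical prototype region: the unit square centred at the origin.
prototype = [(rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)) for _ in range(300)]

def inflate(points, scale):
    """Locations in the inflated sampling region: scale times the prototype draws."""
    return [(scale * x, scale * y) for x, y in points]

base = mean_pairwise_distance(prototype)
ratios = {scale: mean_pairwise_distance(inflate(prototype, scale)) / base
          for scale in (2.0, 5.0, 10.0)}
print(base, ratios)
```

Each ratio equals the corresponding scaling factor up to floating-point error, which is the linear growth the change-of-variables argument delivers in expectation.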
Now, we state the following additional conditions on the data generating mechanism:
Assumption 3
(i) The stochastic process is alpha-mixing with its mixing coefficients satisfying
[TABLE]
for some constants, and , where is defined in (82). (ii) Let be an I.I.D. sequence introduced in C2’. Each defined through (84) is continuously distributed with its support (defined through ) and probability density function, , satisfying .
Condition (i) controls the degree of spatial dependence of , which is key to establishing limit (LLN/CLT) results. The same condition is used in Lahiri and Zhu (2006), and some analogous conditions are also imposed in other papers such as Jenish and Prucha (2012). Condition (ii) is the increasing-domain condition, and is important for establishing consistency of estimators (Lahiri, 1996). The uniform boundedness of the density is imposed for simplifying proofs, but can be relaxed at the cost of a more involved proof.
Conditions (i) and (ii) have an important implication for identification and estimation of our model: Given the increasing-domain condition (ii), the distance between two individuals, and , on average, increases at the rate as , as in (85). This implies that, given the weak dependence condition (i), the correlation between two variables, and , for any and , becomes weaker as tends to . In other words, for each , the number of other individuals who are almost uncorrelated with tends to and, furthermore, the ratio of such individuals (among all players) tends to . That is, the conditional law of and that of are less affected by for larger , and thus converges to . We formally verify this convergence result in Theorem 5.
Note that such convergence is not specific to our specification of the data-generating mechanism, but it occurs generically in settings with spatial data. For example, Jenish and Prucha (2012) derive various limit results for spatial data (or random fields) under the increasing-domain assumption and the so-called minimum distance condition , where the latter means that the distance between any two individuals is larger than some fixed constant (independent of ). [Footnote 23: Note that our increasing-domain assumption (together with the specification of the density of ) implies that for any , , where the convergence holds as the area of shrinks to zero and is uniformly bounded; thus for any , we have the minimum distance condition with probability approaching .] These two assumptions imply that the number of individuals who are ‘far away’ from each tends to . This, together with the mixing condition as in (i) of Assumption 3, drives the convergence of conditional expectations.
Before concluding this subsection, we present the following Assumption 4 under which Theorem 1 in Section 2 is verified. This is a multi-village version of Assumption 3 in which we allow for and (and thus ):
Assumption 4
(i) For each , given , the stochastic process is alpha-mixing with its mixing coefficients satisfying for some constants , , and , where the definition of follows (82). (ii) For each , given , let be the conditionally I.I.D. sequence introduced in C2. Each is continuously distributed with its support and PDF satisfying , where is a ‘prototype’ sampling region for each village and is a scaling constant with for some .
A.2.2 Convergence of Equilibrium Beliefs
To formally state our belief convergence result, we introduce the following functional operator that maps a -valued function to some constant in :
[TABLE]
where is independent of by the (conditional) I.I.D.-ness of () and the independence between and , imposed in C2’ and C3-SD’. If were I.I.D., the equilibrium beliefs would be characterized as a fixed point of this (as clarified through Propositions 1 and 2). While beliefs are given as conditional expectations under the spatial dependence of unobserved heterogeneity as modelled in C3-SD’, they are still characterized through in an asymptotic sense stated below.
To show this, we introduce the following mapping to characterize the beliefs under C3-SD’ for each . Let be an -dimensional vector valued function, each element of which is a -valued function on the support of . Then, define as a functional mapping from to an -dimensional random vector:
[TABLE]
where each is a mapping from to a -valued random variable defined as
[TABLE]
Note that corresponds to individual ’s belief (this is written as in Section 2 where multiple villages are considered), when predicts other ’s behavior using . Therefore, in the equilibrium, the system of beliefs,
[TABLE]
is given as that satisfies the fixed point restriction:
[TABLE]
almost surely, where we write , a vector of functions; note that each element of the solution, , depends on but we suppress this for notational simplicity.
Note that (87) may be equivalently written in the following coordinate-wise form:
[TABLE]
The next theorem states the convergence of each to a unique fixed point of , which is a constant :
Theorem 5 (Convergence of beliefs under spatial correlation)
Suppose that C2’ and C3-SD’ hold with Assumption 3, and the functional map defined in (86) is a contraction with respect to the metric induced by the norm ( is a -valued function on the support of ), i.e.,
[TABLE]
Let be a (unique) solution to the functional equation . Then, it holds that for any solution to the functional equation (87), which may not be unique,
[TABLE]
where is some constant (independent of , , and ), whose explicit expression is provided in the proof, and thus
[TABLE]
An important pre-requisite of Theorem 5 is that the mapping is a contraction. This condition is easy to verify, e.g., see Section A.3 for a sufficient condition for the contraction property under a linear-index restriction on the utilities. Roughly speaking, we can show that is a contraction if the extent of social interactions is not ‘too large’.
Although the contraction property of the unconditional expectation operator implies uniqueness of its fixed point, the conditional expectation operators need not be contractions and may admit multiple fixed points (i.e., multiplicity of equilibria). The theorem states that each of the non-unique equilibrium beliefs in each -player game converges to the unique fixed point of . In examples, existence of a fixed-point solution of is relatively easy to check, but its uniqueness or contraction property may not be; indeed, verification of the latter may require an appropriate specification of joint distributional properties of as the operator is based on conditional expectations.
Theorem 5 provides the rate of convergence of equilibrium beliefs in (88). Using this result, if the degree of spatial dependence is not too strong with , then, we can strengthen the belief convergence result to the uniform one:
[TABLE]
since as specified in (83).
Proof of Theorem 5. Define a functional mapping from an -dimensional vector valued function to :
[TABLE]
where is defined in (86) (as a mapping on scalar valued functions), and each is a -valued function on the support of . Based on this , we also define an -dimensional vector mapping:
[TABLE]
We also write , the -dimensional vector each element of which is . Then, since is a fixed point of (i.e., ), it obviously holds that
[TABLE]
Now, since solves the functional equation:
[TABLE]
where maps an -dimensional vector valued function to an -dimensional random vector.
Given (90) and (91), we can see that
[TABLE]
Thus, by the triangle inequality and the contraction property of , we have
[TABLE]
By the definition of in (89) as well as that of , the second term on the majorant side is bounded by
[TABLE]
where the last inequality follows from the contraction condition on . Thus, this bound and (92) lead to
[TABLE]
Therefore, if it holds that
[TABLE]
for some constant independent of , where the supremum is taken over any (Borel measurable) functions, , then the desired result (88) holds with .
Proof of (93). For notational simplicity, we write
[TABLE]
for an arbitrary function, . Then, the inequality (93) follows if
[TABLE]
where the supremum is taken over any (Borel measurable) functions, .
To show this inequality, observe that by (ii) of C3-SD’,
[TABLE]
Here, we recall the following result on independence: for random objects , , and ,
[TABLE]
Applying this with , , and , since C2’ implies that , we can obtain
[TABLE]
which in turn implies that
[TABLE]
The relation (96) also leads to
[TABLE]
for any . Then, we can compute the conditional expectation in (94) as
[TABLE]
where the first and third equalities have used (97) and (98), respectively.
Now, we look at the maximand on the LHS of (94):
[TABLE]
where is the expectation that only concerns ; the first equality uses (105) and the independence of and ; the second equality again uses the same independence condition (i.e., and thus ); the third equality holds since
[TABLE]
by the independence of and , and the last inequality uses the Fubini theorem.
To bound the RHS of (106), note that for , we can always construct two sets on , and satisfying 1) the former contains and the latter contains , 2) the distance between the two sets is larger than , 3) each of and is a square in with its area less than . and are measurable with respect to and , respectively. Then, noting the definition of mixing coefficients of in (81) and (82), properties 1)-3) allow us to apply McLeish’s mixingale inequality (p. 834 of McLeish, 1975; or Theorem 14.2 of Davidson, 1994) and derive its bound in terms of . That is, since is uniformly bounded , we obtain
[TABLE]
uniformly over any , , and .
To find an upper bound of the majorant side of (106), recall that the (marginal) distribution function (whose support is given by ) has the density for each , and also that by the definition of the mixing coefficients in (81) and (82), uniformly over any , . Then, plugging (107), we have
[TABLE]
where , the last inequality holds since
[TABLE]
by changing variables, and for ,
[TABLE]
Thus, we can see that this upper bound of (106) is independent of , , and , and thus the inequality (94) holds with , completing the proof.
A.3 Sufficient Conditions for Contraction
Here, we investigate the contraction property of (defined in (29)) as well as its limit operator:
[TABLE]
is a functional operator from a -valued function to a constant . This limit operator is used to investigate convergence properties of the estimators. We impose the following conditions:
Assumption 5
(i) For any , and the density of the conditional CDF satisfies
[TABLE]
where and denote location indices associated with and , respectively, stands for the distance, and the interval is the set of possible values of (introduced in Assumption 7).
(ii) The conditional CDF satisfies
[TABLE]
for any and any , if .
These conditions are used to verify the so-called Blackwell sufficient conditions (cf. Theorem 3.3 of Stokey and Lucas, 1989: I). The non-negativity of is used for the monotonicity. While (110) is a condition for the conditional density, it also implies the same condition for the marginal density:
[TABLE]
since (recalling that is defined as the CDF of and is that of , it holds that ). Condition (ii) means that first-order stochastically dominates , implying that any two of (spatially dependent) variables, and , are (weakly) positively correlated, which is also conveniently used to show the monotonicity of .
Given these preparations, we can show the contraction properties of and :
Proposition 3
a) Suppose that (i) of Assumption 5 holds. Then, is a contraction in the space of -valued functions on , , each of which is nondecreasing in , equipped with the sup metric, where denotes the support of the random variable .
b) Suppose that Assumption 5 holds. Then, is a contraction in the same space.
The restriction to nondecreasing functions is innocuous when considering fixed points of and . This is because, given the non-negativity of and the stochastic-dominance of , the fixed points are also nondecreasing in (since
and are also nondecreasing in for such a nondecreasing).
In this proposition, we have defined the limit operator on the set of general functions, , which may depend on . This general domain space is required to consider the convergence of the operator and its fixed point. However, if we define the limit operator only on the restricted space of functions, , each of which is independent of , we can write
[TABLE]
since . In this case, by the Lipschitz continuity of , we can check the contraction property of on the restricted space under
[TABLE]
Note that in the probit specification, in which is supposed to follow the standard normal, ; and in the logit specification, .
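As a numerical companion to this Lipschitz argument, the minimal sketch below evaluates the standard normal and standard logistic densities at their modes and checks a contraction condition of the form `|alpha| * sup f < 1`. The name `alpha` for the interaction coefficient, the exact form of the condition, and the helper functions are assumptions made for illustration; they are not taken verbatim from the paper.

```python
import math


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def logistic_pdf(x):
    """Density of the standard logistic distribution (symmetric in x)."""
    e = math.exp(-abs(x))  # abs() keeps the exponent non-positive
    return e / (1.0 + e) ** 2


# Both densities are maximized at zero, so the sup is the value at the mode.
SUP_PDF = {"probit": normal_pdf(0.0), "logit": logistic_pdf(0.0)}


def is_contraction(alpha, link):
    """Check the (assumed) sufficient condition |alpha| * sup f < 1."""
    return abs(alpha) * SUP_PDF[link] < 1.0


for link, bound in SUP_PDF.items():
    print(link, "sup density:", round(bound, 4),
          "largest admissible |alpha|:", round(1.0 / bound, 4))
```

Under this reading, the logit specification tolerates a larger interaction coefficient than the probit one, because its density has a lower peak.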
Proof of Proposition 3. First, we investigate by using the Blackwell sufficient conditions. I) Since , we have for any two functions with , implying the monotonicity condition. II) For a constant ,
[TABLE]
Since is nondecreasing in and , is strictly increasing in . Thus, we can find a unique satisfying
[TABLE]
for each . For each , let be a unique number satisfying
[TABLE]
Since and the slope of the function is greater than or equal to , we must have and . This upper bound of holds for any . Thus,
[TABLE]
Therefore, if (110) holds, the so-called discounting condition is satisfied. Given I) and II), we have thus verified that is a contraction.
Next, we investigate . Note that since is nondecreasing in , so is , and given (ii) of Assumption 5, the mapped function is also nondecreasing. Therefore, the domain and range spaces of can be taken to be identical. We can also check the Blackwell sufficient conditions for exactly in the same way as for , implying the desired contraction property.
A.4 Proof of Theorem 2 (the Estimators’ Convergence)
Here, we prove Theorem 2 through several lemmas. In Section 3, for ease of exposition, we assumed that the village-fixed effects are known to the econometrician. Here, we explicitly include them in the parameter to be estimated. Note also that identification of preference parameters in presence of s requires identification of the s themselves; hence we need to use one of the methods for doing so, as described in Section 4.4. Here we use the homogeneity assumption ; an alternative proof can be given for the correlated random effects case. To sum up, for this section, we re-define the eventual parameter as (see e.g. Assumption 7), with all other related quantities interpreted analogously. Consistency of the estimators for the case with known is a simpler corollary of Theorem 2.
To analyze and , we define the following conditional moment restriction:
[TABLE]
where is a hypothetical outcome variable based on the limit model (footnote 24: Recall that has been defined through the conditional moment restriction (32) for the observed variables generated from the finite-player game ( is generated from (28) or equivalently (30)). may also be defined as the one satisfying restriction (111), which is correctly specified for the variables (hypothetically) generated from the limit model, .):
[TABLE]
For each , let , where this limit ratio value is supposed to be in (note that ). We also consider the limit versions of and ,
[TABLE]
respectively, where in is defined as a solution to (40) for each , and in is defined as the (probability) limit of (note that the limits of and coincide, which follows from arguments analogous to those in the proof of Lemma 3). The first order condition of may be seen as an unconditional moment restriction based on the conditional one (111).
Note that given the continuity of , and are continuous in . Lemma 3 shows the uniform convergence of to in probability over ; we can also show that of to in probability over (the proof of this result is analogous to that of Lemma 3, and is omitted).
Given the limit objective function, we let
[TABLE]
Lemma 2 shows identification of (i.e., it is a unique maximizer of over ) and the same result as for . As a result, by Theorem 2.1 of Newey and McFadden (1994), given the compactness of the parameter space , we obtain
[TABLE]
Since Lemma 2 also shows that under the correct specification, we have .
By Lemma 4, we have , which, together with Lemma 3, implies that
[TABLE]
This in turn means that (by using Newey and McFadden’s Theorem 2.1 again). These lead to the conclusion of the theorem.
A.4.1 Identification Results: Lemmas 1 -
In this subsection, we investigate identification of and (defined in (113) and (114), respectively). To this end, we impose the following conditions:
Assumption 6
(i) Let and
[TABLE]
*and the (marginal) CDF of is for each , whose functional form is supposed to be known, and is strictly increasing on with its continuous PDF satisfying .
(ii) The random vector includes no constant component. The support of is not included in any proper linear subspace of , where is the dimension of .*
Assumption 6 is quite standard. The condition in (i) on the support of may be relaxed, allowing for some bounded support (instead of ), but it simplifies our subsequent conditions and proofs and thus is maintained.
Assumption 7
(i) Let be the probability limit of . It holds that
[TABLE]
(ii) Denote by a generic element in the parameter space . is a compact subset of such that
[TABLE]
*where is a compact subset of in which lies and is a closed rectangular region of (with some ) in which lies.
(iii) For any ,*
[TABLE]
(iv) Let be an element of . Given this (fixed), for any , it holds that
[TABLE]
where stands for , a unique solution to the fixed point equation, ( with ).
Assumption 7 (i) leads to different ‘constant’ terms for under the homogeneity assumption (), i.e.,
[TABLE]
This is required for identification of in through the Brock-Durlauf type objective function .
Conditions (ii) - (iv) are used for identification of via . The rectangularity of the parameter space for imposed in (ii) is a technical requirement when using Gale and Nikaido’s (1965) result for univalent functions (see their Theorem 4 and our proof of Lemma 1). The restriction on in (116) in (iii) guarantees the contraction property of the fixed point problem (see discussions in Appendix A.3). As for (iv), since and in are fixed points, we can equivalently re-write (117) as
[TABLE]
This is an extension of (115) to the model-based probabilities for all in the parameter space, where we note that (117) implies (115) under (111) since . Note that if , we may suppose (118) without loss of generality. That is, if , we may re-label the indices to secure ””.
The inequality (117) does not impose any substantive restriction. For example, if and the (marginal) distribution of is first-order stochastically dominated by that of , then the fixed point solutions satisfy and thus (117) for any (since is strictly increasing), where no restriction on (except for the maintained one: ) is imposed.
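The ordering of fixed points under stochastic dominance can be illustrated numerically. The sketch below is a simplified stand-in, not the paper's specification: it assumes a scalar covariate `w`, a logistic link, a positive index coefficient `c`, and a non-negative interaction coefficient `alpha` small enough for the belief map to be a contraction. All names are hypothetical. It solves `pi = mean_i F(w_i * c + alpha * pi)` by successive approximation for two samples, one of which dominates the other.

```python
import math
import random


def logistic_cdf(x):
    """Numerically stable standard logistic CDF."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def belief_fixed_point(w_sample, c, alpha, tol=1e-12, max_iter=10_000):
    """Solve pi = mean_i F(w_i * c + alpha * pi) by successive approximation."""
    pi = 0.5
    for _ in range(max_iter):
        new_pi = sum(logistic_cdf(w * c + alpha * pi) for w in w_sample) / len(w_sample)
        if abs(new_pi - pi) < tol:
            return new_pi
        pi = new_pi
    raise RuntimeError("fixed-point iteration did not converge")


rng = random.Random(0)
w_low = [rng.gauss(0.0, 1.0) for _ in range(2000)]
w_high = [w + 0.5 for w in w_low]  # shifted sample: dominates w_low pointwise

c, alpha = 1.0, 2.0  # alpha * sup(logistic density) = 0.5 < 1, so the map contracts

pi_low = belief_fixed_point(w_low, c, alpha)
pi_high = belief_fixed_point(w_high, c, alpha)
print("fixed point, dominated sample: ", pi_low)
print("fixed point, dominating sample:", pi_high)
```

Because the belief map for the dominating sample lies above the other map at every belief, and both maps are nondecreasing when `alpha >= 0`, the fixed points inherit the same ordering.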
Now, we are ready to establish the identification properties of and :
Lemma 1 (Global identification)
*Suppose that Assumption 6 holds.
(a) Further if (i) of Assumption 7 holds, then for any ,*
[TABLE]
*for some with positive probability, if and only if , where and .
(b) Denote by any element in . Further if (ii) - (iv) of Assumption 7 are satisfied, in which (iv) is satisfied with of this , then for ,*
[TABLE]
for some with positive probability, if and only if , where and .
The result of this lemma allows us to establish (global) identification of and based on their limit objective functions, and . Note that this result does not presuppose the correct specification of model-implied conditional choice probabilities as in (111). However, given (111) with , our identification analysis based on the objective functions can be done analogously to that for ML estimators in the standard I.I.D. case (as in Lemma 2.2 and Example 1.2 of Newey and McFadden, 1994, pages 2124-2125), which is due to the form of our objective functions, while they are not full ML functions. We summarize the objective-function-based identification result as follows:
Lemma 2
Suppose that satisfies the conditional expectation restriction (111), and Assumptions 4-7 hold, where (iv) of Assumption 7 holds with in this . Then, is a unique maximizer of in and it is also a unique maximizer of in .
While and (introduced in (113) and (114), respectively) may differ in general, this lemma states that they are identical if we suppose the correct specification, under which we will identify them and always write hereafter.
A.4.2 Uniform Convergence Results: Lemmas 3 -
In this subsection, we establish uniform convergence for the objective functions using the following conditions:
Assumption 8
*(i) For any , the support of is included in , a bounded subset of .
(ii) Let be the conditional probability density of given ’s location and ’s variables (parametrized by ) satisfying*
[TABLE]
where are constants (independent of and ); is the same constant introduced in Assumption 4 (the majorant side is defined as [math] if ).
Assumption 8 (ii) can be derived from a spatial analogue of the so-called strong Doeblin condition used in Markov chain theory (see, e.g., Theorem 1 of Holden, 2000), which can be satisfied by various parametric models. It is a strengthening of the alpha-mixing condition in (i) of Assumption 4.
Lemma 3
*Suppose that C1 - C2, C3-SD, (i) of Assumption 6, (ii) - (iii) of Assumption 7, Assumptions 4 - 8 hold. Then,*
[TABLE]
Lemma 4
*Suppose that C1 - C2, C3-SD, Assumption 5, (i) of Assumption 6, (ii) - (iii) of Assumption 7, Assumptions 4 - 8 hold. Then, for each ,*
[TABLE]
and
[TABLE]
A.4.3 Proofs of Lemmas 1 - 4
Proof of Lemma 1. The proof of the result (a) is standard and is omitted. Here, we focus on (b). For ease of exposition, we let , as in our empirical application and set . The proof for any other can be done in exactly the same way. We let and define analogously. Since is strictly increasing, (120) is equivalent to
[TABLE]
with positive probability. We can immediately see that this (122) implies that . Now, supposing that , we shall derive (122). To this end, we consider the following five cases:
1) If , (122) holds with positive probability by (i) of Assumption 4, regardless of the equality for the other (constant) terms (i.e., is equal to or not).
2) If and , we must have , implying (122).
3) If , , , and , we must at least have by (117) of Assumption 7 and thus , which implies (122).
4) For the case with , , , and , suppose, for contradiction, that for any . Then, and , since , and thus . However, this contradicts (117) of Assumption 7.
5) Finally, we consider the case with , , and . In this case, by re-parametrizing , the fixed point equations (with respect to ),
[TABLE]
can be equivalently re-written as equations with respect to :
[TABLE]
That is, if is a solution to (123), then is a solution to (124); and if solves (124), then solves (123). We can also check that uniqueness of the solution to (123) is equivalent to that of (124). By this re-parametrization, given , (122) is
[TABLE]
which we shall show below. Now, to investigate (124), we define the following vector-valued (-by-) function of and as
[TABLE]
where
[TABLE]
and the dependence of and on is suppressed for notational simplicity. Given (116) of Assumption 7, using the contraction mapping theorem: for any , we can find a unique
[TABLE]
Given this function of , we consider the set of its values:
[TABLE]
Next, we compute the Jacobian matrix of with respect to :
[TABLE]
where the upper-left -by- submatrix is the identity matrix. This matrix has dominant diagonals for any in the sense of Gale and Nikaido (1965, p. 84), that is, letting , whose dependence on and is suppressed for notational simplicity, is said to have dominant diagonals if we can find strictly positive numbers such that
[TABLE]
If we set for , then (127) is reduced to
[TABLE]
and it is possible to find some since
[TABLE]
which is imposed in (117) of Assumption 7. Since has dominant diagonals for each , it is a -matrix for each in the sense of Gale and Nikaido (1965, p.84). Applying Gale and Nikaido’s Theorem 4, we can see that for each (fixed) , is univalent as a function of , i.e., holds only at a unique . Therefore, we can define a function on , i.e., the inverse function of introduced in (126). That is, we have shown that is one-to-one (injective; for ), implying the desired result (125). We have now completed Case 5) and thus the whole proof.
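The dominant-diagonal and P-matrix properties used in this argument can be checked mechanically for a small matrix. The sketch below is illustrative only: the 3-by-3 matrix `jac` is a made-up example with an identity upper-left block, not the Jacobian from the proof, and the weighted row-dominance inequality is one common form of the condition, assumed here because the exact inequality (127) is not reproduced in this text. A P-matrix is taken to mean a matrix whose principal minors are all positive.

```python
from itertools import combinations


def det(m):
    """Determinant by Laplace expansion (adequate for the small matrices here)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def has_dominant_diagonal(m, d):
    """Check d_i * |m_ii| > sum_{j != i} d_j * |m_ij| for every row i."""
    n = len(m)
    return all(
        d[i] * abs(m[i][i]) > sum(d[j] * abs(m[i][j]) for j in range(n) if j != i)
        for i in range(n)
    )


def is_p_matrix(m):
    """Return True if every principal minor of m is strictly positive."""
    n = len(m)
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            sub = [[m[i][j] for j in idx] for i in idx]
            if det(sub) <= 0:
                return False
    return True


# Hypothetical example: identity upper-left block, small off-diagonal terms.
jac = [[1.0, 0.0, -0.3],
       [0.0, 1.0, -0.4],
       [-0.2, -0.1, 1.0]]
weights = [1.0, 1.0, 1.0]

print("dominant diagonal:", has_dominant_diagonal(jac, weights))
print("P-matrix:", is_p_matrix(jac))
```

A matrix with positive diagonal entries and a dominant diagonal in this sense has all principal minors positive, which is the property that the univalence theorem requires on a rectangular domain.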
Proof of Lemma 2. Given the definition of in (112), observe that
[TABLE]
where the first equality follows from the law of iterated expectations and the correct specification assumption and the inequality holds by Jensen’s inequality. By the strict concavity of , this inequality holds with equality if and only if , which is equivalent to by (b) of Lemma 1. That is, we have shown that is the unique maximizer of over .
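The Jensen-inequality step is the usual information inequality for a Bernoulli outcome: the expected log-likelihood is maximized when the model probability equals the true one. The following minimal sketch checks this on a grid; the true probability `p0 = 0.3` and the function name are arbitrary choices for illustration.

```python
import math


def expected_loglik(p0, p):
    """E[Y log p + (1 - Y) log(1 - p)] when Y is Bernoulli with success probability p0."""
    return p0 * math.log(p) + (1.0 - p0) * math.log(1.0 - p)


p0 = 0.3
grid = [k / 1000 for k in range(1, 1000)]  # candidate model probabilities in (0, 1)
best = max(grid, key=lambda p: expected_loglik(p0, p))
print("maximizer on the grid:", best)
```

Strict concavity of the logarithm is what makes the maximizer unique, so any candidate probability that differs from the true one with positive probability yields a strictly lower objective value.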
To establish the same result for , note that is the fixed point, and thus the condition (111) (that determines ) implies
[TABLE]
Therefore,
[TABLE]
meaning that the conditional choice probability model with (instead of ) is also correctly specified at . By the same arguments as in (128), we can see that is also the unique maximizer of over . The proof is completed.
Proof of Lemma 3. By boundedness of the support of and boundedness of the parameter space , is bounded away from [math] and uniformly over , , and (any realization of) , i.e., we can find some (small) constant (independent of and ) such that
[TABLE]
Thus, given the global Lipschitz continuity of on , and that of and (see the global Lipschitz continuity result (138) in the proof of Lemma 5), as well as the uniform boundedness of , we can see that and are also globally Lipschitz continuous in , implying the global Lipschitz continuity of in .
Now, replacing in by , we define the following function:
[TABLE]
Given the uniform convergence of to (Lemma 5), by arguments analogous to those for the global Lipschitz continuity of , we can easily see that
[TABLE]
Again, given the global Lipschitz continuity of relevant functions as discussed above, we can also check the stochastic equicontinuity (SE) of (by using Corollary 2.2 of Newey, 1991) as well as the (global Lipschitz) continuity of .
Since is assumed to be compact and we have verified the (global Lipschitz) continuity of and the SE of , Theorem 2.1 of Newey (1991) implies the uniform convergence:
[TABLE]
if the pointwise convergence holds
[TABLE]
which is to be shown below. And, analogously to the proof of Lemma 7 below, we can obtain
[TABLE]
as its simpler corollary. Then, using this result and arguments quite analogous to the proof of Lemma 4 below, we also have
[TABLE]
implying that
[TABLE]
Then, by (130) and (132), we can obtain the desired conclusion of the lemma. It remains to show the pointwise convergence (131). Note that each summand of is a function of , , and (since and ). Thus, letting
[TABLE]
which is uniformly bounded since (129) holds, we can apply Lemma 6 to obtain
[TABLE]
where is the limit of . This completes the proof.
Proof of Lemma 4. Let
[TABLE]
which is shown to be in Lemma 7. Then, by the definition of in (36), we have
[TABLE]
Recalling also the definition of ( is the CDF of ), these lower and upper bounds can be computed as
[TABLE]
Since is Lipschitz continuous, both the bounds converge to in probability. Further, the absolute difference of the lower and upper bounds is bounded by , implying the uniform convergence of as in (121).
A.4.4 Auxiliary Lemmas and their Proofs
Lemma 5
Suppose that C2, (i) of Assumption 6, (ii) - (iii) of Assumption 7, and (i) of Assumption 8 hold. Then,
[TABLE]
Proof of Lemma 5. Below, we show 1) the pointwise convergence of :
[TABLE]
and 2) the continuity of the limit function and the stochastic equicontinuity of . Then, given the compactness of (by (ii) of Assumption 7), we have (for each ) by Theorem 2.1 of Newey (1991), which implies the desired result (133) since is taken over a finite set . We show 1) and 2) below.
1) To show the pointwise convergence, we compute . To this end, define a functional mapping for each :
[TABLE]
Analogously, we define the following mapping:
[TABLE]
where the (true) CDF in is replaced by the empirical one . Since and are contractions (by (iii) of Assumption 7; see also discussions in Appendix A.3), we can find and , unique fixed points of and , respectively, for each . By the I.I.D.-ness of in C2,
[TABLE]
where the last inequality holds since is the CDF and . Therefore, we have shown that
[TABLE]
where the supremum is taken over any -valued function on .
Noting that and are fixed points, by the triangle inequality, we have
[TABLE]
which, together with (135), implies that
[TABLE]
This implies the desired pointwise convergence (134).
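The triangle-inequality step used here follows a standard pattern for fixed points of contractions, sketched below in generic notation. The symbols are assumptions of this sketch, not the paper's: $T$ and $\hat{T}$ are the population and sample operators, $\pi$ and $\hat{\pi}$ their fixed points, and $\rho \in (0,1)$ a common contraction modulus.

```latex
\begin{align*}
\|\hat{\pi} - \pi\|
  &= \|\hat{T}\hat{\pi} - T\pi\| \\
  &\le \|\hat{T}\hat{\pi} - \hat{T}\pi\| + \|\hat{T}\pi - T\pi\| \\
  &\le \rho\,\|\hat{\pi} - \pi\| + \sup_{g}\,\|\hat{T}g - Tg\| ,
\end{align*}
% Rearranging gives a bound on the distance between the fixed points:
\[
\|\hat{\pi} - \pi\| \;\le\; \frac{1}{1-\rho}\,\sup_{g}\,\|\hat{T}g - Tg\| .
\]
```

Uniform closeness of the two operators therefore carries over to their fixed points, at the cost of the factor $1/(1-\rho)$.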
2) To verify the continuity of , observe that for ,
[TABLE]
Using the triangle inequality, we have the following upper bound of the last term in the curly braces:
[TABLE]
By combining (136) and (137), we obtain
[TABLE]
Since we can find some such that for any (by (ii) of Assumption 7), this inequality leads to
[TABLE]
where is some positive constant, whose existence follows from (i) of Assumption 8. That is, we have shown that is (globally Lipschitz) continuous in . We can also show that
[TABLE]
where is some random variable independent of ; note that (139) can be derived in the same way as (138) with replaced by . This (139) implies the stochastic equicontinuity of by Corollary 2.2 of Newey (1991). The proof of Lemma 5 is completed.
Lemma 6
*Suppose that C1, C2, C3-SD, and Assumption 4 hold. Then, let be a function of , , and that is uniformly bounded (and measurable) with*
[TABLE]
where is some positive constant (independent of ). Then, for each ,
[TABLE]
Proof of Lemma 6. Recall that . Since is alpha-mixing, we apply Billingsley’s inequality (Corollary 1.1 of Bosq, 1998) to
[TABLE]
By the so-called conditional-covariance decomposition formula, we have
[TABLE]
The second term on the RHS of (141) is zero since and the conditional expectations are reduced to
[TABLE]
which follow from the conditional independence relation as in (97) (in the proof of Theorem 5). Thus, by the covariance bound given in (140), we have
[TABLE]
uniformly over any and , where the last equality follows from the same arguments as for (108) (in the proof of Theorem 5). Using these, we can compute
[TABLE]
which completes the proof of Lemma 6.
Lemma 7
Suppose that Assumptions 5 and 8 hold. Then, it holds that
[TABLE]
Proof of Lemma 7. Recall that is a fixed point of the functional mapping defined in (38) and is a fixed point of
[TABLE]
Note that this is a contraction (by Proposition 3) which does not depend on (the dependence of on is only through that of ), and its fixed point is also independent of ; thus, we write (instead of ). By the triangle inequality,
[TABLE]
where the last inequality holds with some (by Proposition 3) that is independent of and any realization of random variables. Thus,
[TABLE]
where the (outer) supremum is taken over any -valued functions. We now show this majorant side is . To this end, observe that
[TABLE]
where the second inequality follows from Assumption 8, and this upper bound is independent of , , , and . Since is the empirical distribution function of the I.I.D. variables , we have
[TABLE]
By the same arguments as those for (108) (in the proof of Theorem 5), we have
[TABLE]
Therefore,
[TABLE]
which is for since . This completes the proof of Lemma 7.
A.5 Welfare Analysis: The case of
Eligibles: Recall eq. (46)
[TABLE]
Now, if
[TABLE]
then each term on the LHS is smaller than the corresponding term on the RHS. If, on the other hand,
[TABLE]
then each term on the LHS is larger than the corresponding term on the RHS. This gives us
[TABLE]
In the intermediate case,
[TABLE]
we have that if , then
[TABLE]
and if , then
[TABLE]
Putting all of this together, we have that
Proposition 4
Suppose that Assumptions 1, 2 and the linear index structure hold and . Then, for each , if , then
[TABLE]
and if , then
[TABLE]
Ineligibles: Recall eq. (45)
[TABLE]
Now if , then each term on the LHS is smaller than the corresponding term on the RHS for each realization of the s. So the probability is [math]. Similarly, for , the probability is 1. Finally, for , the above inequality is equivalent to
[TABLE]
Thus we have that:
Proposition 5
Suppose that Assumptions 1, 2 and the linear index structure hold and . Then, for each ,
[TABLE]
A.6 Income Endogeneity
(Summarized from Bhattacharya, 2018, Sec 3.1): Observed income may be endogenous with respect to individual choice, e.g. when omitted variables, such as unrecorded education level, can both determine individual choice and be correlated with income. Under such endogeneity, the observed choice probabilities would potentially differ from the structural choice probabilities, and one can define welfare distributions either unconditionally, or conditionally on income, analogous to the average treatment effect and the average effect of treatment on the treated, respectively, in the program evaluation literature. In this context, an important and useful insight, not previously noted, is that for a price-rise, the distribution of the income-conditioned EV is not affected by income endogeneity; for a fall in price, the conclusion holds with CV instead of EV.
To see why that is the case, recall the binary choice setting discussed above, and define the conditional-on-income structural choice probability at income as
[TABLE]
where denotes the distribution of the unobserved heterogeneity for individuals whose realized income is , where may or may not equal . Now, given a price rise from to , for a real number , satisfying , the distribution of equivalent variation (analogous to compensating variation for a fall in price as in a subsidy) at , evaluated at income , conditional on realized income being , is given by (see Bhattacharya, 2015)
[TABLE]
Now, , by definition, is the fraction of individuals currently at income who would choose alternative 1 at price , had their income been . Now if prices are exogenous in the sense that , then the observable choice probability conditional on price and income is given by
[TABLE]
Therefore, (154) equals , so no corrections are required owing to endogeneity. This implies that if exogeneity of income is suspect and no obvious instrument or control function is available, then a researcher can still perform meaningful welfare analysis based on the EV distribution at realized income, provided price is exogenous conditional on income and other observed covariates. For a fall in price, as induced by a subsidy, the same conclusion holds for the compensating variation which we have calculated in our application. Furthermore, one can calculate aggregate welfare in the population by integrating over the marginal distribution of income.
A.7 Nonparticipating Households
We note that in our field experiment conducted over eleven villages in West Kenya, a subset of households in each village is participating in the game, and our sample does not cover all village members. This might potentially cause a problem since selected households might interact with non-selected ones but we do not have any data about the latter. However, at the time of the experiment, non-selected households did not have any opportunity to buy an ITN and the outcome variables for such households are always zero, whose conditional expectations are zero as well. Thus, in our specification, even if we allow for interactions among all the village members (who are selected or non-selected by us), it is easy to do the necessary adjustments in the empirics.
To see this point, we interpret the index as representing any of the selected and non-selected households, i.e., where is the number of all households in village (thus, ), and define as a variable to denote the outcome of any village member, i.e., if is selected in the experiment, and otherwise . Corresponding to , let be ’s belief defined as
[TABLE]
which is the average of the conditional expectations over all the households in village . By the definition of , we can easily see that
[TABLE]
which is a scaled version of . Even if ’s behavior is affected by non-selected households, i.e., it is determined by (1) but with being replaced by , its difference from the previous case is only the scaling by . In our empirical setting, this ratio is , and we apply this adjustment throughout the analysis.
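The scaling adjustment can be illustrated with a few lines of arithmetic. In the sketch below, the function names and the example numbers are hypothetical; the only substantive assumption, taken from the text, is that non-selected households have outcome (and hence conditional expectation) zero.

```python
import math


def belief_selected(selected_probs):
    """Average conditional choice probability over the selected households only."""
    return sum(selected_probs) / len(selected_probs)


def belief_all_households(selected_probs, n_all):
    """Average over all n_all village households, with non-selected ones contributing zero."""
    if n_all < len(selected_probs):
        raise ValueError("n_all must be at least the number of selected households")
    return sum(selected_probs) / n_all


# Hypothetical village: 3 selected households out of 12 in total.
probs = [0.2, 0.4, 0.6]
n_all = 12

scaled = (len(probs) / n_all) * belief_selected(probs)
print(belief_all_households(probs, n_all), scaled)
```

The two printed values coincide up to floating-point rounding, which is the scaling relation between the belief over all households and the belief over selected households.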
References for the Appendix
Gale, D. & Nikaido, H. (1965) The Jacobian matrix and global univalence of mappings, Mathematische Annalen 159, 81-93.
Holden, L. (2000) Convergence of Markov chains in the relative supremum norm. Journal of Applied Probability 37, 1074-1083.
Jenish, N. & Prucha, I.R. (2012) On spatial processes and asymptotic inference under near-epoch dependence. Journal of Econometrics 170, 178-190.
Lee, L.-F. (2004) Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models, Econometrica 72, 1899-1925.
Newey, W.K. (1991) Uniform convergence in probability and stochastic equicontinuity, Econometrica 59, 1161-1167.
Newey, W.K. & McFadden, D. (1994) Large sample estimation and hypothesis testing, Handbook of Econometrics, Vol. IV (Ed. R.F. Engle and D.L. McFadden), Ch. 36, pages 2111-2245, Elsevier.
Stokey, N.L. & Lucas, R.E. Jr. (1989) Recursive Methods in Economic Dynamics, Harvard University Press.
Varin, C., Reid, N., & Firth, D. (2011) An overview of composite likelihood methods, Statistica Sinica 21, 5-42.
References
- [1] Andrews, D.W. (2005) Cross-section regression with common shocks. Econometrica 73, 1551-1585.
- [2] Bhattacharya, D. (2008) A permutation-based estimator for monotone index models. Econometric Theory 24, 795-807.
- [3] Bhattacharya, D. (2015) Nonparametric welfare analysis for discrete choice. Econometrica 83, 617-649.
- [4] Bhattacharya, D. (2018) Empirical welfare analysis for discrete choice: Some general results. Quantitative Economics 9, 571-615.
- [5] Blundell, R. & Powell, J. (2004) Endogeneity in nonparametric and semiparametric regression models, in Advances in Economics and Econometrics, Cambridge University Press, Cambridge, U.K.
- [6] Brock, W.A. & Durlauf, S.N. (2001a) Discrete choice with social interactions. Review of Economic Studies 68, 235-260.
- [7] Brock, W.A. & Durlauf, S.N. (2001b) Interactions-based models. Handbook of Econometrics (Vol. 5, pp. 3297-3380). Elsevier.
- [8] Brock, W.A. & Durlauf, S.N. (2007) Identification of binary choice models with social interactions. Journal of Econometrics 140, 52-75.
