A Decomposition Analysis of Diffusion Over a Large Network
Kyungchul Song

TL;DR
This paper introduces a decomposition method to accurately measure diffusion over large networks, accounting for confounding covariates, and provides an inference procedure validated through Monte Carlo simulations.
Contribution
It develops a novel decomposition analysis and asymptotic inference method for diffusion measurement that controls for omitted covariates in network data.
Findings
Decomposition method effectively isolates true diffusion effects.
Inference procedure performs well in small samples.
Application clarifies the role of covariates in diffusion estimates.
Abstract
Diffusion over a network refers to the phenomenon of a change of state of a cross-sectional unit in one period leading to a change of state of its neighbors in the network in the next period. One may estimate or test for diffusion by estimating a cross-sectionally aggregated correlation between neighbors over time from data. However, the estimated diffusion can be misleading if the diffusion is confounded by omitted covariates. This paper focuses on the measure of diffusion proposed by He and Song (2022), provides a method of decomposition analysis to measure the role of the covariates on the estimated diffusion, and develops an asymptotic inference procedure for the decomposition analysis in such a situation. This paper also presents results from a Monte Carlo study on the small sample performance of the inference procedure.
| B-A Graph | Contact Network | | Observed Graph | |
|---|---|---|---|---|
| max. deg. | 14 | 33 | 27 | 95 |
| ave. deg. | 1.0667 | 1.2633 | 2.9373 | 4.3488 |
| cluster | 0.0275 | 0.0129 | 0.1110 | 0.0991 |

| | | | | | | |
|---|---|---|---|---|---|---|
| ADM | 0 | 0 | 0 | 0.0374 | 0.0362 | 0.0336 |
| | 0 | 0 | 0 | 0.0394 | 0.0373 | 0.0333 |
| | 0.2222 | 0.1625 | 0.0486 | 0.2387 | 0.1765 | 0.0549 |
| | 0.3282 | 0.2817 | 0.0603 | 0.3684 | 0.3151 | 0.0701 |
| Cov. Prob. at 99% | 0.9738 | 0.9777 | 0.9566 | 0.9798 | 0.9771 | 0.9573 |
| | 0.9957 | 0.9891 | 0.9867 | 0.9963 | 0.9903 | 0.9849 |
| Cov. Prob. at 95% | 0.9455 | 0.9308 | 0.9035 | 0.9435 | 0.9286 | 0.9038 |
| | 0.9834 | 0.9570 | 0.9476 | 0.9842 | 0.9584 | 0.9431 |
| Cov. Prob. at 90% | 0.9001 | 0.8764 | 0.8424 | 0.8991 | 0.8783 | 0.8422 |
| | 0.9670 | 0.9150 | 0.8994 | 0.9696 | 0.9149 | 0.8892 |
| Median CI Length | 0.3099 | 0.2440 | 0.0915 | 0.3287 | 0.2335 | 0.1001 |
| | 0.3836 | 0.2102 | 0.0672 | 0.4270 | 0.2281 | 0.0753 |
Vancouver School of Economics, University of British Columbia
Key words. Diffusion over a Network; Dependency Graphs; Decomposition Analysis; Cross-Sectional Dependence
JEL Classification: C12, C21, C31
I thank Mahdi Ebrahimi Kahou for his valuable comments at the beginning of this research, and Yige Duan for excellent assistance in this research, including numerous helpful comments on this work. I also thank the Co-Editor and two anonymous referees for criticisms and suggestions. I acknowledge that this research was supported by the Social Sciences and Humanities Research Council of Canada. Corresponding address: Kyungchul Song, Vancouver School of Economics, University of British Columbia, 6000 Iona Drive, Vancouver, BC, Canada, V6T 1L4. Email address: [email protected].
1. Introduction
Diffusion of people’s or firms’ choices over a social or an industrial network has drawn attention in economics, sociology, and marketing. Examples include diffusion of technology or product recommendations over social or industrial networks. (See, e.g., Conley and Udry (2010), Banerjee, Chandrasekhar, Duflo, and Jackson (2013), Leskovec, Adamic, and Huberman (2007), and de Matos, Ferreira, and Krackhardt (2014), to name but a few.) [Footnote 1: In this paper, we focus only on the diffusion of observed state-switches such as the purchase of a product or the adoption of a technology. Hence we do not consider in this paper the diffusion of information or news over a network where the spread of the information or news does not manifest itself through certain observable choices.]
Disentangling the role of covariates from the true causal effects has been a primary concern in almost every study of causal inference. For example, the propensity score method in program evaluations attempts to measure the effect of a social program after “eliminating the confounding effect” of covariates. (See Rosenbaum and Rubin (1983). See also Imbens and Wooldridge (2009) for a literature review on program evaluations.) In such situations, the role of a covariate is determined by its influence on the program participation of the individual to which the covariate belongs. However, in studies of social interactions or social networks, what matters for causal inference is the relation between covariates and outcomes not only for the sample unit that the covariates belong to, but also for its neighboring units. Such a relation arises when the network is formed based on homophily on the covariates. For example, suppose a network is formed among students roughly based on their parents’ income, so that a student from a high-income family is more likely to be a friend of another student from a high-income family than of one from a low-income family. When one observes the purchase of an expensive smartphone of a particular brand by students over two periods, the correlation of purchases between friends over time does not necessarily indicate diffusion of purchases over the network; it can merely reflect the fact that the purchases mostly come from students from high-income families.
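The smartphone example can be illustrated with a minimal simulation sketch. Everything in it is a hypothetical construction for illustration (the purchase probabilities in `buy_prob`, the homophily strength `p_same`, and the function names are assumptions, not quantities from this paper). Purchases depend only on a student's own income group, so there is no diffusion by construction, yet the raw covariance between a friend's earlier purchase and a student's later purchase is positive and vanishes once one conditions on income.

```python
import random

def simulate_friend_pairs(n_pairs=20000, p_same=0.9, seed=0):
    """Draw (income_i, purchase_j_period0, purchase_i_period1) for friend pairs."""
    rng = random.Random(seed)
    buy_prob = {0: 0.1, 1: 0.6}  # hypothetical purchase probabilities by income group
    rows = []
    for _ in range(n_pairs):
        inc_i = int(rng.random() < 0.5)
        # Homophily: friend j shares i's income group with probability p_same.
        inc_j = inc_i if rng.random() < p_same else 1 - inc_i
        a_j0 = int(rng.random() < buy_prob[inc_j])  # friend's purchase in period 0
        y_i1 = int(rng.random() < buy_prob[inc_i])  # own purchase in period 1, no peer effect
        rows.append((inc_i, a_j0, y_i1))
    return rows

def covariance(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

rows = simulate_friend_pairs()
# Covariance between friends' purchases over time, ignoring income.
raw_cov = covariance([a for _, a, _ in rows], [y for _, _, y in rows])
# The same covariance computed within each income group of student i.
within_cov = {
    g: covariance([a for inc, a, _ in rows if inc == g],
                  [y for inc, _, y in rows if inc == g])
    for g in (0, 1)
}
print(raw_cov, within_cov)
```

In this design the raw covariance is driven entirely by the dependence of income across friends, which is the mechanism the paragraph above describes.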
Spurious diffusion caused by covariates has received attention in the literature. For example, Aral, Muchnik, and Sundararajan (2009) attempt to distinguish influence-based contagion from homophily-based diffusion. They find that peer influence is generally overestimated when the homophily effect is ignored. Shalizi and Thomas (2011) point out challenges arising from the confounding of social contagion, homophily, and the influence of individual traits. It is not hard to see that failing to condition on covariates that play a crucial role both in network formation through homophily and in individuals’ decisions would lead to bias in the measurement of diffusion.
This paper’s goal is to develop a method for a decomposition analysis that can be used to gauge the significance of covariates in measuring diffusion. The main idea is analogous to the idea of using a Hausman test to check the omitted variable bias in a linear regression model. Suppose that the parameter of interest is a coefficient in the linear regression model, and one would like to see whether there is any impact of omitting a subset of regressors on the estimated parameter of interest. For this, one can compare the two estimated parameters, one with all the regressors included and the other with a subset of regressors omitted, to see the role of the omitted regressors.
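The regression analogy can be made concrete with a short sketch. The data-generating process below (coefficients 1.0 and 2.0, the correlation between the two regressors, and all variable names) is a hypothetical choice for illustration only. The sketch compares the coefficient on `x1` from the regression that includes `x2` with the coefficient from the regression that omits it; the gap between the two is the counterpart of what this paper measures for diffusion.

```python
import random

def demean(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

rng = random.Random(1)
n = 5000
x2 = [rng.gauss(0, 1) for _ in range(n)]             # regressor that may be omitted
x1 = [0.7 * z + rng.gauss(0, 1) for z in x2]         # regressor of interest, correlated with x2
y = [1.0 * a + 2.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

x1d, x2d, yd = demean(x1), demean(x2), demean(y)

# Short regression: y on x1 only.
beta_short = dot(x1d, yd) / dot(x1d, x1d)

# Long regression: y on x1 and x2, solving the 2x2 normal equations by Cramer's rule.
s11, s12, s22 = dot(x1d, x1d), dot(x1d, x2d), dot(x2d, x2d)
s1y, s2y = dot(x1d, yd), dot(x2d, yd)
beta_long = (s1y * s22 - s12 * s2y) / (s11 * s22 - s12 ** 2)

gap = beta_short - beta_long  # effect of omitting x2 on the coefficient of x1
print(beta_long, beta_short, gap)
```

A gap that is far from zero relative to its sampling variability indicates that the omitted regressor matters for the parameter of interest.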
In the same spirit, in this paper we define relational diffusion as follows: [Footnote 2: I thank Peter Phillips for suggesting this terminology. Relational diffusion captures the cross-sectional dependence of outcomes that is either due to their diffusion caused by covariates which are related to each other causally or due to the non-causal cross-sectional dependence of covariates which influence the outcomes.]
[TABLE]
that is, represents the difference between the identified diffusion with omitted covariates and the true diffusion. [Footnote 3: As will be clear later, the subscript denotes the index set of omitted covariates, i.e., is such that , is omitted, where is the -th entry of the covariate vector .] The relational diffusion, , gauges the impact of omitting a covariate upon the measurement of diffusion. This will reveal whether a specific covariate is a significant source of relational diffusion. However, it is not immediately clear how to disentangle the role of covariates in this way. Here, unlike the omitted variable bias in a linear regression model, the magnitude of relational diffusion is related to the cross-sectional dependence structure of covariates. (For example, students from high-income families are friends with each other.) One idea would be to compare two conditional covariances, both between observed outcomes and previous-period outcomes of their neighbors, where one is conditioned on the full set of covariates and the other on the set of covariates with the covariate of interest omitted. The main difficulty with this approach is that inference requires knowledge of the cross-sectional dependence structure among the covariates of different cross-sectional units, but this dependence structure is rarely known in practice. In many applications, there is no reason to believe that this dependence structure coincides with the network over which the diffusion arises.
In order to overcome this difficulty, we adopt the approach of Kuersteiner and Prucha (2013) and use conditional probabilities for inference, where we condition on the entire cross-section of covariates so that our inference is robust to the unknown cross-sectional dependence structure of covariates. For concreteness, we focus on the measure of diffusion called ADM (Average Diffusion at the Margin), which was recently introduced by He and Song (2022) and shown to be identified by a spatio-temporal dependence measure. For the analysis of relational diffusion, we decompose the spatio-temporal dependence measure constructed with some covariates omitted into the ADM and the gap (denoted by ). If the omission of the covariates causes no relational diffusion, we must have . Thus the role of covariates is determined by whether is zero or not. This paper develops asymptotic inference on for each index set of omitted covariates, and shows that it is asymptotically valid under regularity conditions. We also provide a multiple testing procedure that selects the covariates such that with the asymptotic control of the Familywise Error Rate (FWER). (See Section 9.1 of Lehmann and Romano (2005) for the definition of FWER.) In this framework of decomposition analysis, all the quantities are defined conditional on the covariates, and hence the unknown cross-sectional dependence structure of covariates does not affect the asymptotic validity of inference.
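To fix ideas about FWER control, the following sketch implements Holm's step-down procedure, a standard generic method that controls the FWER at a given level. It is shown only as an illustration of what such control means; it is not claimed to be the multiple testing procedure developed in this paper, and the p-values in the example are hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    """Return the indices of hypotheses rejected by Holm's step-down procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])  # indices by increasing p-value
    rejected = set()
    for rank, k in enumerate(order):
        # Compare the (rank+1)-th smallest p-value with alpha / (m - rank).
        if p_values[k] <= alpha / (m - rank):
            rejected.add(k)
        else:
            break  # stop at the first non-rejection
    return rejected

# Hypothetical p-values, one per candidate set of omitted covariates.
selected = holm_reject([0.001, 0.04, 0.03, 0.2], alpha=0.05)
print(selected)
```

Here each index would correspond to one index set of omitted covariates, and a rejection would flag that set as a source of relational diffusion.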
This paper provides results from a small scale Monte Carlo simulation study. The study investigates the finite sample performance of asymptotic confidence intervals using networks generated according to the preferential attachment random graph generation model of Barabási and Albert. (See Jackson (2008), Section 5.2.) The results show a reasonably stable behavior of finite sample coverage probabilities. The simulation studies also show that the more aligned the cross-sectional dependence structure of covariates is to the contact network over which diffusion arises, the larger the relational diffusion becomes.
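A preferential attachment graph of the Barabási and Albert type can be generated with a few lines of code. The sketch below is a generic textbook construction under assumed settings (the function name, the number of links per arrival `m`, and the network size are illustrative and are not taken from the simulation design of this paper): each arriving node links to `m` existing nodes chosen with probability proportional to their current degree.

```python
import random
from collections import Counter

def preferential_attachment_edges(n, m=1, seed=0):
    """Return edges (new, old) of a preferential attachment graph on n nodes."""
    rng = random.Random(seed)
    edges = []
    pool = []  # each node appears in the pool once per incident edge
    for new in range(m, n):
        if not pool:
            chosen = set(range(m))            # the first arrival links to the m seed nodes
        else:
            chosen = set()
            while len(chosen) < m:
                chosen.add(rng.choice(pool))  # drawn with probability proportional to degree
        for old in chosen:
            edges.append((new, old))
            pool.extend((new, old))
    return edges

edges = preferential_attachment_edges(n=500, m=1)
degree = Counter(node for edge in edges for node in edge)
max_degree = max(degree.values())
avg_degree = 2 * len(edges) / 500
print(max_degree, avg_degree)
```

Summary statistics such as the maximum and average degree can then be compared across designs to judge how dense the generated networks are.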
The literatures in epidemiology, sociology, and economics have studied diffusion of various phenomena such as disease, information, and technology. (See Chapter 17 of Newman (2010) for a review of the models and the literature.) Recent contributions in economics that study diffusion over a network include Akbarpour, Malladi, and Saberi (2020), Banerjee, Chandrasekhar, Duflo, and Jackson (2019), Beaman, BenYishay, Magruder, and Mobarak (2020), and Sadler (2020). This paper’s study of diffusion as a causal parameter is closely related to the recent literature on causal inference with network interference. See Aronow and Samii (2015), van der Laan (2014), and Leung (2020). (We refer the readers to He and Song (2022) for a more extensive literature review in this area.) This paper’s causal inference framework basically follows He and Song (2022), but departs from that paper by developing a formal way of quantifying the role of covariates in causing the spuriousness of diffusion. This requires a substantial modification of their procedure.
The paper is organized as follows. The next section explains the causal framework for analysis of diffusion, and introduces a spatio-temporal dependence measure for each set of covariates, and provides a decomposition of the measure into a component due to the covariates and a residual. The section then concludes by establishing identification of diffusion and explaining the role of cross-sectional dependence of covariates in creating relational diffusion. Section 3 focuses on inference on diffusion decomposition. The section offers asymptotic inference on the component that is due to the covariates and provides conditions for its asymptotic validity. Section 4 presents and discusses results from a Monte Carlo simulation study. Section 5 concludes. Mathematical proofs are collected in the appendix.
2. Diffusion Over a Network and Identification of Causal Effects
2.1. Diffusion Over a Contact Network
Let us consider a generic model of diffusion of binary actions over a large network of people as follows. There are two states, 0 and 1, and everybody starts with the default state of 0. For example, the diffusion may be that of a certain farming technology over a network of farmers, where the state of 0 represents the non-adoption of the technology, and 1 represents its adoption. Each person’s binary action at time records a switch of the state at time from state 0 to state 1. We assume that the switched state is irreversible in the sense that the switch of the state can happen only once. Hence if at some time , we have for all . This is the case especially when the switch of the state is defined to be the switch of a state for the first time. The binary actions spread over a network over time.
To formalize this process, suppose that there is a directed network called a contact network over a set of people, where each neighborhood of a person represents the set of people whose influence the person is directly exposed to. More specifically, we denote the contact network by (with subscript “” mnemonic for “contact”). The edge set consists of edges , where the presence of an edge in means that person is exposed to the direct influence from . [Footnote 4: The notion of a directed edge from to is taken from the notation in Newman (2010) where the graph is represented by an adjacency matrix and its -th entry is 1 if and only if there is an edge from to .] We denote the in-neighborhood of person by [Footnote 5: The in-neighborhood in a directed graph refers to a set of neighbors whose edges with the person are from the neighbors to the person .]
[TABLE]
which represents the set of people whose influence person is directly exposed to. The contact network describes whose actions in one period potentially affect whose actions in the next period. Thus, each person ’s binary action is a function of for a set of neighbors in , and her own state vector : for ,
[TABLE]
for some map . Here represents the switch of the state of person (i.e., an “action” by person ) at time . Suppose that person has switched the state at some time , so that . Since this switched state is irreversible, no further switch of the state is allowed for this person after time . Hence this person should have .
The diffusion process in (2.4) is a generalized version of a threshold model of diffusion studied in the literature. (See Granovetter (1978). See also a recent contribution by Acemoglu, Ozdaglar, and Yildiz (2011) for an example.) A special case of this model is a linear threshold model where the map is given by
[TABLE]
where is a weight that individual gives to .
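A minimal sketch of a linear threshold diffusion with irreversible switches is given below. The functional form is an assumption made for illustration, since the display above is not legible in this version: a person who has not yet switched does so when the weighted sum of her in-neighbors' previous-period actions reaches her threshold. The names `in_nbrs`, `weights`, and `thresholds`, and the three-person chain, are hypothetical.

```python
def linear_threshold_step(in_nbrs, weights, prev_actions, state, thresholds):
    """One period: i switches if weighted exposure meets her threshold and she has not switched."""
    actions = {}
    for i, nbrs in in_nbrs.items():
        exposure = sum(weights[(i, j)] * prev_actions[j] for j in nbrs)
        # Irreversibility: a person already in state 1 cannot switch again.
        actions[i] = int(state[i] == 0 and exposure >= thresholds[i])
    return actions

def simulate(in_nbrs, weights, init_actions, thresholds, periods):
    state = dict(init_actions)        # state after the initial period
    history = [dict(init_actions)]    # history[t][i] is person i's action in period t
    for _ in range(periods):
        actions = linear_threshold_step(in_nbrs, weights, history[-1], state, thresholds)
        for i, a in actions.items():
            state[i] += a
        history.append(actions)
    return history, state

# Chain 0 -> 1 -> 2: person 1 is exposed to person 0, and person 2 to person 1.
in_nbrs = {0: [], 1: [0], 2: [1]}
weights = {(1, 0): 1.0, (2, 1): 1.0}
thresholds = {0: 0.5, 1: 0.5, 2: 0.5}
history, state = simulate(in_nbrs, weights, {0: 1, 1: 0, 2: 0}, thresholds, periods=2)
print(history, state)
```

In this toy chain the switch of person 0 in the initial period reaches person 1 after one period and person 2 after two periods, and nobody switches more than once.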
2.2. The Researcher’s Observation
The researcher observes each person’s state at time , which is denoted by , and her state at time , which is denoted by . These observed binary states are related to the state-switches as follows:
[TABLE]
Recall that . By the irreversibility of state-switches, each person can switch the state at most once, which implies that . Hence if and only if person is in state 1 at time . When , this means that the person has never switched the state including the initial period. The researcher does not observe the diffusion process in real time. The researcher observes the states of people at two time periods and .
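The relation between the observed states and the underlying state-switches can be sketched as follows, assuming (as the text describes) that the observed state at the later date equals the sum of a person's switches up to that date, which is at most one by irreversibility. The switch paths and the function name are hypothetical.

```python
def observed_state(switches):
    """State at the end of the window: 1 if the person switched in any period, else 0."""
    total = sum(switches)
    if total > 1:
        raise ValueError("irreversibility violated: more than one switch")
    return total

# Hypothetical switch paths over periods 0, 1, 2, 3 for three people.
switch_paths = {"a": [0, 0, 1, 0], "b": [1, 0, 0, 0], "c": [0, 0, 0, 0]}
initial = {i: path[0] for i, path in switch_paths.items()}               # observed at the first date
final = {i: observed_state(path) for i, path in switch_paths.items()}    # observed at the second date
print(initial, final)
```

The researcher sees only `initial` and `final` in this sketch, not the period-by-period paths.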
Our setting accommodates information diffusion where represents the indicator of a person who receives information first in the network, and , , the indicator of certain binary action (such as purchase of a good) by person in time . However, we require that both and as defined in (2.6) are observed at some time for each person . We exclude the situation where there is information diffusion and we do not know who the initial receivers of the information are.
Let us introduce a graph that represents the causal connections between observed actions ’s and ’s. We can trace the actions at a given time back to the initial actions at time [math]. To see this, first let denote the set of people such that each is connected to along the contact network , i.e., there exist such that and
[TABLE]
The sets , , define a network, say, , where if and only if . The set represents all the people whose initial actions potentially have influenced person ’s decision at time indirectly through the influences of neighbors in the contact network. [Footnote 6: Suppose that is the adjacency matrix of the contact network such that its -th entry is given by , for each , i.e., there is an edge from to in if and only if the -th entry of is one. Then for , we have if and only if
Recall that the -th entry of , denoted by here, counts the number of the walks of length from to . Hence for , we have if and only if there is a walk of length less than or equal to from to in the contact network.]
We call the graph the causal graph for , where and .
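The construction of the causal graph from walks in the contact network can be sketched with adjacency matrix powers. The indexing convention used here is an assumption for illustration (entry `[i][j]` equals 1 when there is an edge from `i` to `j`), as are the function names: two people are linked in the resulting graph when the contact network contains a walk of length at most `t` between them.

```python
def mat_mult(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def causal_adjacency(adj, t):
    """Entry [i][j] is 1 if the contact network has a walk of length <= t from i to j."""
    n = len(adj)
    power = [row[:] for row in adj]   # adj^1; its entries count walks of length 1
    total = [row[:] for row in adj]   # running sum adj + adj^2 + ... + adj^s
    for _ in range(t - 1):
        power = mat_mult(power, adj)  # entries count walks of the next length
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return [[int(total[i][j] > 0) for j in range(n)] for i in range(n)]

# Chain 0 -> 1 -> 2 -> 3.
adj = [[0] * 4 for _ in range(4)]
adj[0][1] = adj[1][2] = adj[2][3] = 1
two_step = causal_adjacency(adj, 2)
three_step = causal_adjacency(adj, 3)
print(two_step, three_step)
```

With two periods, person 0 can reach person 2 but not person 3; with three periods, person 3 is reached as well.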
The researcher, however, does not observe the causal graph (or the contact network). Instead, she observes a graph . With regard to the relation between and , we make the following assumption:
Assumption 2.1.
The observed graph contains as a subgraph.
This assumption does not require that the observed graph “approximates” the causal graph in any sense. Neither is it required to contain the directional information in the causal relations in and . In fact, the assumption is satisfied if contains an undirected supergraph of as a subgraph. This is convenient, because the observed graph may not capture the direction of causality accurately in practice. The essence of Assumption 2.1 is that it requires the observed graph to capture the cross-sectional dependence among ’s. This assumption is substantially weaker than the assumption for the networks, for example, used in linear-in-means models (e.g., Manski (1993) and Bramoullé, Djebbari, and Fortin (2009)). In these models, it is not enough to assume that the observed network contains the true network as a subgraph. (See de Paula, Rasul, and Souza (2020) and Lewbel, Qu, and Tang (2021) for approaches that do not require network data at all.) [Footnote 7: It is possible to relax this assumption into what He and Song (2022) called Dependency Causal Graphs. However, we do not pursue this more general framework here.]
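In a simulation, where the causal graph is known to the analyst, the containment requirement of Assumption 2.1 can be checked directly. The sketch below is a hypothetical helper (its name, the edge-list representation, and the example graphs are assumptions); with `directed=False` it ignores edge directions, which mirrors the remark that an undirected supergraph suffices. With real data the causal graph is unobserved, so such a check is not available.

```python
def contains_as_subgraph(observed_edges, causal_edges, directed=False):
    """Check that every causal edge appears among the observed edges."""
    if directed:
        observed = set(observed_edges)
        return all(edge in observed for edge in causal_edges)
    # Ignore direction: compare unordered pairs.
    observed = {frozenset(edge) for edge in observed_edges}
    return all(frozenset(edge) in observed for edge in causal_edges)

causal_edges = [(0, 1), (1, 2)]             # hypothetical causal graph
observed_edges = [(1, 0), (2, 1), (2, 3)]   # hypothetical observed graph with an extra link
ok_undirected = contains_as_subgraph(observed_edges, causal_edges)
ok_directed = contains_as_subgraph(observed_edges, causal_edges, directed=True)
print(ok_undirected, ok_directed)
```

The observed graph in this example records the wrong directions and an extra link, yet it still covers every causal link once directions are ignored.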
Let be the -field generated by and the adjacency matrices of and , where is the collection of covariate vectors, . Throughout the paper, we assume that the covariates, and the graphs and are stochastic. We also allow the graphs and to be large connected graphs, where every pair of people is connected directly or indirectly. However, for asymptotic inference, we require the graphs to be not too dense. We make this assumption precise later.
2.3. Average Diffusion at the Margin (ADM)
We introduce the causal parameter of interest following the potential outcome approach in program evaluations. By recursively applying the equation in (2.4), we can rewrite as a function of and unobserved heterogeneities:
[TABLE]
where the vector consists of components with and , and is determined as the compositions of maps with and .
For , , and , we introduce a potential outcome which is the same as except that in on the right hand side of (2.10) is replaced by . (If , then is simply taken to be .) This is the state of person in period , when the initial action of person is counterfactually fixed to be . Our focus is on the Average Diffusion at the Margin (ADM) at :
[TABLE]
The ADM was introduced by He and Song (2022). It measures the expected increase in the number of switchers when one additional randomly chosen individual switches her state in the initial period, while other people choose their initial actions according to the randomness of the event . The impact of a randomly selected person changing the initial action from 0 to 1 on the number of the total switchers until time is measured after integrating out the conditional distribution of other people’s initial actions , , given . Hence, the “average” in the ADM is two-fold. The first “average” refers to the expectation over the conditional distribution of , , given , and then the second average is over the random selection of .
Suppose that there is no diffusion in the sense that the map in (2.4) does not depend on , that is,
[TABLE]
Then, , for all and . Hence in this case, .
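The verbal description of the ADM can be mimicked in a toy Monte Carlo sketch. This is an illustration of the idea only and not the formal definition above: the contagion rule (an unswitched person switches when any in-neighbor switched in the previous period), the independent initial actions with probability `p_init`, the exclusion of the selected person from the count, and all names are assumptions. For a randomly selected person, the sketch compares the number of other people in state 1 at the end when that person's initial action is fixed to 1 and to 0, holding the other initial actions fixed.

```python
import random

def run_contagion(in_nbrs, init_actions, periods):
    """Final states when an unswitched person switches once any in-neighbor switched last period."""
    state = dict(init_actions)
    prev = dict(init_actions)
    for _ in range(periods):
        new = {i: int(state[i] == 0 and any(prev[j] for j in nbrs))
               for i, nbrs in in_nbrs.items()}
        for i, a in new.items():
            state[i] = max(state[i], a)
        prev = new
    return state

def toy_adm(in_nbrs, p_init, periods, draws=2000, seed=0):
    rng = random.Random(seed)
    people = list(in_nbrs)
    total = 0
    for _ in range(draws):
        j = rng.choice(people)                                  # randomly selected person
        init = {i: int(rng.random() < p_init) for i in people}  # other people's initial actions
        counts = []
        for a in (1, 0):                                        # counterfactual initial action of j
            final = run_contagion(in_nbrs, {**init, j: a}, periods)
            counts.append(sum(final[i] for i in people if i != j))
        total += counts[0] - counts[1]
    return total / draws

chain = {0: [], 1: [0], 2: [1]}      # contact chain 0 -> 1 -> 2
isolated = {0: [], 1: [], 2: []}     # nobody is exposed to anybody: no diffusion
adm_chain = toy_adm(chain, p_init=0.0, periods=2)
adm_none = toy_adm(isolated, p_init=0.3, periods=2)
print(adm_chain, adm_none)
```

When nobody is exposed to anybody, the two counterfactual scenarios coincide for everyone other than the selected person and the toy measure is exactly zero, in line with the no-diffusion case discussed above.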
When the conditional probability of an initial switch, , is very small, this does not necessarily make the ADM small, because the ADM compares the expected number of switches between two counterfactual scenarios (one with a randomly chosen being an initial switcher and the other not), and the two scenarios use the same conditional distribution given . However, if is very small, it may affect the quality of the asymptotic inference that we introduce later.
2.4. Identification of the ADM
As mentioned before, we assume that the researcher observes the initial actions, each person’s states by time , covariates and observed graph . That is, the researcher observes for each .
For the initial actions and unobserved heterogeneity , we make the following assumption that describes the conditional cross-sectional independence given .
Assumption 2.2.
’s are conditionally independent across ’s given .
The assumption requires that the cross-sectional dependence among ’s comes solely from the cross-sectional dependence of ’s or characteristics of networks and . For example, this condition is satisfied if, at each period, the action is determined by the neighbor’s actions in the previous period, and idiosyncratic unobserved heterogeneities that are cross-sectionally independent once one conditions on the whole covariate vector and the graphs and . The covariate can include network characteristics of such as average degrees of agent or of her neighbors. It can also include an average of the characteristics of the neighbors.
We also assume an analogue of an unconfoundedness condition in program evaluations as follows.
Assumption 2.3.
For all , , is conditionally independent of given .
This condition is satisfied, for example, if the initial actions are determined solely by and some other random events that are independent of all other components. See He and Song (2022) for a detailed discussion on this assumption.
For the purpose of the decomposition analysis in this paper, we generalize the notion of the above unconfoundedness condition to accommodate the situation where one omits some covariates. First, let us introduce notation for subvectors of covariates. Recall that for each , and let be the -th entry of . Let . For each , let , , , and . Let us introduce the following notion of unconfoundedness condition.
Definition 2.1.
For each , we say that -unconfoundedness holds, if for all , , is conditionally independent of given .
The -unconfoundedness condition is stronger than the unconfoundedness condition in Assumption 2.3. In fact, the -unconfoundedness satisfies a monotonicity property: if , the -unconfoundedness implies the -unconfoundedness. [Footnote 8: This monotonicity does not hold if one considers instead an alternative, weaker notion of -unconfoundedness: for all , , is conditionally independent of given . The failure of monotonicity in sets follows from the results in Phillips (1988).] Hence the larger the set is, the stronger the -unconfoundedness condition becomes. In particular, the -unconfoundedness corresponds to the unconfoundedness in Assumption 2.3 used by He and Song (2022). At the other extreme, the -unconfoundedness corresponds to the randomized control trial where the covariates are entirely irrelevant in the treatment assignment (i.e., the variables here).
The rest of the section is devoted to presenting the result that the ADM is identified using only , and , under the -unconfoundedness condition. In other words, if the -unconfoundedness condition holds, then one can identify the ADM with covariates omitted. Hence omitting does not cause any relational diffusion. Later we develop a decomposition method to quantify the magnitude of relational diffusion, which can be used to test whether the -unconfoundedness holds or not.
To facilitate the identification analysis, let us make the following assumption on the initial actions , and covariates.
Assumption 2.4.
(i) There exist a known distribution function and unknown parameter such that for all ,
[TABLE]
is non-constant and has density bounded away from zero.
(ii) For all , the support of is not contained in any proper linear subspace of , and for any proper subset , there exists such that for all ,
[TABLE]
where for a symmetric matrix denotes the minimum eigenvalue of .
(iii) There exists such that for all .
Assumption 2.4(i) requires that the initial action be conditionally independent of given for each . As in the literature of program evaluations (e.g., Imbens and Wooldridge (2009)), one can view as the parametrized propensity score of person , i.e., the propensity of the person to switch the state at time [math]. As we explained before, this assumption is not as strong as it appears in our context because one can include other people’s covariates as part of , such as the average of the characteristics of the neighbors in the observed graph . While it is possible to extend our framework to other forms of parametric or semiparametric specifications, the specification (2.15) is most commonly used in practice, and simplifies the proposal of this paper in terms of both exposition and implementation. Assumption 2.4(ii) is typical in the literature of index models, often invoked for identification of . (See, e.g., Theorem 2.1 of Horowitz (2009).) Assumption 2.4(iii) is analogous to the overlap condition used in the literature of program evaluations, which requires that the probability of the initial switch of actions is bounded away from zero and one. (See, e.g., Imbens and Wooldridge (2009).) This condition can be violated when has unbounded support or the dimension of is large. In our context, the assumption is not plausible especially when the diffusion starts with only a very small number of “seed people”. The analysis in Khan and Tamer (2010), who focused on i.i.d. observations, can potentially be extended to this case. However, a full development in this direction is outside of the scope of this paper.
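A parametric propensity score of this index form can be fitted by maximum likelihood, and the overlap condition can be inspected on the fitted values. The sketch below assumes a logistic distribution function and a single bounded covariate with an intercept; these choices, the true parameter values, the cutoff `c`, and the function names are all hypothetical and serve only to illustrate the mechanics.

```python
import math
import random

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(xs, ys, iters=25):
    """Newton-Raphson MLE for P(A = 1 | x) = logistic(b0 + b1 * x)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = logistic(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p                 # score with respect to b0
            g1 += (y - p) * x           # score with respect to b1
            h00 += w
            h01 += w * x
            h11 += w * x * x            # entries of the negative Hessian
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

rng = random.Random(2)
xs = [rng.uniform(-1.0, 1.0) for _ in range(4000)]              # covariate with bounded support
ys = [int(rng.random() < logistic(-0.5 + 1.0 * x)) for x in xs]  # initial actions
b0_hat, b1_hat = fit_logit(xs, ys)

# Overlap check: fitted propensities should stay away from 0 and 1.
fitted = [logistic(b0_hat + b1_hat * x) for x in xs]
c = 0.05
overlap_ok = min(fitted) > c and max(fitted) < 1.0 - c
print(b0_hat, b1_hat, overlap_ok)
```

With a covariate of unbounded support or with very few initial switchers, the fitted propensities would approach 0 or 1 and the check would fail, which is the situation flagged in the discussion of Assumption 2.4(iii).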
He and Song (2022) introduced a spatio-temporal dependence measure of as follows:
[TABLE]
and showed that
[TABLE]
under Assumptions 2.1, 2.2, and 2.3. They developed asymptotic inference for under the parametric propensity score assumption in Assumption 2.4(i). In this paper, we analyze the consequence of omitting covariates in the construction of and develop ways to measure the impact of the omission.
Let us introduce an analogue of with omitted. For each , we let
[TABLE]
where
[TABLE]
and
[TABLE]
It is not hard to see that under the -unconfoundedness condition, where is equal to except that the entries with indices in are eliminated. We have defined using instead of so that it is well defined regardless of whether the -unconfoundedness condition holds or not. Then, the analogue of the measure with covariates omitted can be written as follows:
[TABLE]
The quantity captures the covariation between and the “residuals” from projecting the local weighted average of the period [math] actions over in-neighbors on the covariates . The quantity is different from , due to the subtraction by in the conditional covariance. When , i.e., no covariate is omitted from the vector , is reduced to .
The theorem below shows that under the -unconfoundedness condition, ADM is identified as for all .
Theorem 2.1.
Suppose that Assumptions 2.2-2.4 hold, and the -unconfoundedness is satisfied for some . Then, for all ,
[TABLE]
Suppose that there is no diffusion (i.e., (2.14)) and the -unconfoundedness is satisfied for some . Then, for all . In practice, the -unconfoundedness condition can be too strong. If the condition fails, the equation (2.18) is not guaranteed to hold. In other words, the estimated ADM with omitted can be significantly away from zero, even when the true ADM is zero. The omission of creates relational diffusion in this case. By measuring the discrepancy between the ADM and , one can check whether omitting the covariates causes relational diffusion, and quantify its magnitude. In the next subsection, we elaborate on this idea.
2.5. Relational Diffusion and Decomposition Analysis
Suppose that the researcher omits from the covariate vectors, and identifies the ADM by . When the -unconfoundedness fails, the omitted covariates may create what seems like a diffusion phenomenon even when there is no diffusion in reality. To see this more explicitly, let us define
[TABLE]
Then, we can write
[TABLE]
with
[TABLE]
and . Hence constitutes the remainder term in the decomposition as follows:
[TABLE]
By Theorem 2.1, we have under the -unconfoundedness condition. However, when the -unconfoundedness condition fails, the estimated version of can be non-zero, even when ADM is zero. This relational diffusion can be measured by .
Omitted variable bias in a linear regression model arises when omitted variables are correlated with other regressors. This correlation is the correlation within the same sample unit. In contrast, relational diffusion arises from the cross-sectional dependence of covariates. (This is illustrated in Figure 1.) Our Monte Carlo simulation results show that the cross-sectional dependence of covariates can play a significant role in determining the magnitude of relational diffusion.
3. Inference on the Relational Diffusion
3.1. Estimation of
Let us consider estimating the magnitude of the relational diffusion, . We use a sample analogue of as an estimator. First, define
[TABLE]
where is estimated using MLE, i.e.,
[TABLE]
and
[TABLE]
Similarly we obtain after removing from the index in the above maximization.
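The two-step estimation above (an MLE with the full index, then a quasi-MLE with some covariates removed) can be illustrated with a minimal sketch. It assumes a logistic link, whose density is log-concave, purely for convenience; the paper's own choice of the distribution function is not reproduced here. The function name `fit_binary_mle` and all simulated values are hypothetical.

```python
import numpy as np

def fit_binary_mle(x, y, n_iter=50, tol=1e-10):
    """Newton iterations for a binary-choice MLE with a logistic link
    (the link is an assumption made for this sketch)."""
    gamma = np.zeros(x.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(x @ gamma)))          # fitted probabilities
        score = x.T @ (y - p)                           # gradient of log-likelihood
        hessian = (x * (p * (1.0 - p))[:, None]).T @ x  # negative Hessian
        step = np.linalg.solve(hessian, score)
        gamma = gamma + step
        if np.max(np.abs(step)) < tol:
            break
    return gamma

# Hypothetical data: an intercept plus two covariates.
rng = np.random.default_rng(0)
n = 4000
x = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
gamma_true = np.array([-0.5, 1.0, 0.8])
prob = 1.0 / (1.0 + np.exp(-(x @ gamma_true)))
y = (rng.uniform(size=n) <= prob).astype(float)

gamma_full = fit_binary_mle(x, y)           # MLE with all covariates
gamma_short = fit_binary_mle(x[:, :-1], y)  # quasi-MLE omitting the last covariate
```

The second fit is misspecified by construction, which is why the text treats it as a quasi-MLE rather than an MLE.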
Then, we construct an estimator of as follows:
[TABLE]
where
[TABLE]
3.2. Asymptotic Inference on
We first establish the asymptotic linear representation of . Let us introduce some notation to simplify the expression of the representation. Let
[TABLE]
where denotes the first order derivative of and is the density of that appears in Assumption 2.4. Define
[TABLE]
where , and
[TABLE]
with , ,
[TABLE]
(Note that is analogous to the hessian matrix in the misspecified MLE.) Later we show that under regularity conditions,
[TABLE]
where
[TABLE]
and (with defined in (2.20))
[TABLE]
with ,
[TABLE]
However, consistent estimation of is not feasible in our context. To see this, we rewrite
[TABLE]
In order to estimate this quantity consistently, we should be able to consistently estimate . However, this latter term involves and is heterogeneous across ’s. Furthermore, we cannot simply model this as a parametric function given , because involved in potentially depends on in a complex form due to the latent contact network in the diffusion process. Instead, we adopt a conservative inference procedure by using the linear projection of the sample version of onto the range space of in the Euclidean space . (See (3.9) and (3.10) below.)
First we define a sample analogue of . Let
[TABLE]
and define
[TABLE]
where is the density of that appears in Assumption 2.4, and , and
[TABLE]
with , ,
[TABLE]
Then the sample analogue of is given by
[TABLE]
with
[TABLE]
To construct a confidence interval, we take the square of the standard error to be
[TABLE]
where with ,
[TABLE]
Then, the -level confidence interval for is given by
[TABLE]
where is the percentile of .
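As a rough illustration of how such an interval is assembled from a point estimate, a standard error, and a normal percentile, consider the sketch below. It assumes a symmetric two-sided interval and a standard error already expressed on the scale of the estimator; the paper's exact scaling and standard error formula are not reproduced here, and the function name and numbers are hypothetical.

```python
from statistics import NormalDist

def normal_confidence_interval(estimate, std_error, alpha=0.05):
    """Symmetric interval estimate +/- z * std_error, where z is the
    (1 - alpha/2) percentile of N(0, 1). Two-sidedness is assumed."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * std_error
    return estimate - half_width, estimate + half_width

# Hypothetical point estimate and standard error, 99% level.
lower, upper = normal_confidence_interval(0.05, 0.01, alpha=0.01)
```

Because the standard error here is built from a linear projection rather than a consistent variance estimator, the resulting interval is conservative by design.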
3.3. Asymptotic Theory
For the asymptotic validity of the confidence interval , we use the following set of assumptions.
**Assumption 3.1 (Nondegeneracy).**
There exists a small such that the following is satisfied for all and all ,
[TABLE]
Assumption 3.1 requires the nondegeneracy of the distribution of the test statistics. This condition requires that the randomness of (conditional on ) does not disappear as . Since it is unlikely in practice that the finite sample conditional distribution (given ) of
[TABLE]
is degenerate, it appears to be reasonable to use Assumption 3.1 in deriving its asymptotic approximation.
We require conditions for the observed graph as follows.
**Assumption 3.2.**
There exists such that
[TABLE]
as .
Assumption 3.2 requires that the observed network is not too dense. See He and Song (2022) for conditions for a generic network formation model such that Assumption 3.2 is satisfied.
The next set of conditions are regularity conditions used to deal with the estimation error of the MLE and the quasi-MLE . Define
[TABLE]
The quantities and are the “hessians” of population MLE and quasi-MLE objective functions conditional on . The following conditions are similar to conditions used in the literature of MLE or MLE under misspecification.
**Assumption 3.3.**
For each , the following conditions are satisfied.
(i) The parameter space for is compact, and and lie in the interior of .
(ii) There exists such that for all ,
[TABLE]
with probability one.
(iii) The density of is log-concave.
(iv) is three times continuously differentiable with bounded derivatives, and for any compact set , there exists a constant that depends only on such that
[TABLE]
The assumption below puts a condition on the covariate vector . As our object of interest is defined in terms of conditional probability given , we do not require any condition on the cross-sectional dependence structure of ’s.
**Assumption 3.4.**
There exists constant such that for all ,
[TABLE]
The bounded support condition on has been used in the literature. (See, e.g. Hirano, Imbens, and Ridder (2003).) The following theorem establishes that the confidence interval defined in (3.11) is asymptotically valid.
**Theorem 3.1.**
Suppose that Assumptions 2.2 - 3.4 hold. Then for each ,
[TABLE]
The central part of the asymptotic validity result in Theorem 3.1 comes from the asymptotic normality result in (3.6). To see how this asymptotic normality arises, first, note that we have the following asymptotic linear representation (see Theorem 6.1 in the appendix):
[TABLE]
where is as defined in (3.7). Let us define the graph and . Under Assumptions 2.1 and 2.2, the quantities can be shown to have graph as a conditional dependency graph given . (A triangular array , , is said to have a graph as a conditional dependency graph given , if for any two subsets and of such that no two nodes and are adjacent in , and are conditionally independent given .) Then we can apply the central limit theorem to the right hand side of (3.15), as long as the observed graph is not too dense, using Theorem 2.4 of Penrose (2003) or Corollary 3.1 of Lee and Song (2019). The required condition for the observed graph is fulfilled by Assumption 3.2.
3.4. A Step Down Procedure for Detecting the Sources of Relational Diffusion
One might be interested in detecting which set of covariates causes relational diffusion. In this section, we develop a multiple testing procedure that detects the set of such covariates with asymptotic control of the familywise error rate (FWER). First, let us introduce an individual hypothesis for each covariate index :
[TABLE]
Define
[TABLE]
where the subscript is placed as a reminder that this quantity depends on the conditional distribution of given . Then, we would like to find a data-dependent random set such that
[TABLE]
We declare the set to be the set of covariates which causes relational diffusion, i.e., . The probability on the left hand side of (3.18) is the FWER, which is the probability that there is at least one covariate , with , which is falsely declared to be causing a relational diffusion.
Let us consider the following step-down procedure inspired by Romano and Shaikh (2010). For each subset , let , , be i.i.d. random vectors in , drawn from , and let be an matrix whose entries are given by
[TABLE]
and and are entries of and corresponding to the covariate index . Then, we construct to be the percentile of , where
[TABLE]
and denotes the -entry of the vector whose elements are equal to the absolute value of the elements of .
Setting , we recursively define
[TABLE]
and we stop when , and take .
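The recursion above follows the usual step-down pattern: compute a critical value for the currently active set of hypotheses from simulated Gaussian maxima, drop the hypotheses whose statistics exceed it, and repeat until nothing more is dropped. The sketch below illustrates that pattern only. The inputs `t_stats` (studentized statistics) and `corr` (a positive definite correlation matrix) are hypothetical stand-ins for the paper's statistics and estimated matrix, which are not reproduced here.

```python
import numpy as np

def critical_value(subset, corr, alpha, n_draws, rng):
    """(1 - alpha) quantile of max over s in subset of |Z_s|, Z ~ N(0, corr)."""
    idx = np.array(sorted(subset))
    chol = np.linalg.cholesky(corr[np.ix_(idx, idx)])
    draws = rng.standard_normal((n_draws, len(idx))) @ chol.T
    return np.quantile(np.abs(draws).max(axis=1), 1.0 - alpha)

def step_down(t_stats, corr, alpha=0.05, n_draws=5000, seed=0):
    """Return the indices of rejected hypotheses."""
    rng = np.random.default_rng(seed)
    active = set(range(len(t_stats)))  # hypotheses not yet rejected
    while active:
        c = critical_value(active, corr, alpha, n_draws, rng)
        keep = {s for s in active if abs(t_stats[s]) <= c}
        if keep == active:             # no new rejection: stop
            break
        active = keep
    return set(range(len(t_stats))) - active

# Hypothetical statistics for three covariates.
rejected = step_down(np.array([0.3, 6.0, -0.5]), np.eye(3))
```

Recomputing the critical value on the shrinking active set is what gives the step-down procedure more power than a single-step maximum test, while the monotonicity of the critical value in the set keeps the FWER controlled.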
Let us present our result that shows asymptotic control of FWER. Let be the -dimensional vector whose entries are given by , . Define
[TABLE]
We introduce a condition under which the conditional distribution of given is not degenerate uniformly over and over .
**Assumption 3.5.**
There exists such that for all .
The following theorem shows that this set controls the FWER asymptotically.
**Theorem 3.2.**
Suppose that Assumptions 2.2 - 3.5 hold. Then,
[TABLE]
4. Monte Carlo Simulations
4.1. Data Generating Process
Let us first explain the data generating process we use for our Monte Carlo simulation study. First, we generate the contact network . For this, we choose the adjacency matrix of the contact network to be a block diagonal matrix, where each block matrix is generated by the Barabási-Albert model, initialized with an Erdős-Rényi random graph on 20 households per village. We treat each block matrix as a village and each node as a household. In total, we have villages and each village has or households. Thus, the total number of the households is either or .
We generate the observed graph as follows. The adjacency matrix of the observed graph is set to be a block diagonal matrix constructed as follows. For each block matrix in the adjacency matrix of the contact network , we form each block matrix by taking each of its entries to be 1 if and only if the corresponding entry of the matrix is nonzero. The graph statistics are presented in Table 1. We fix the realized contact network and the observed graphs, and generate outcomes using the same networks across Monte Carlo simulations. As we are not considering the randomness of the networks in our simulation study, what matters for our purpose is the shape of the realized networks in finite samples, rather than the stochastic property of the random graph models that are used to obtain the realizations.
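The network construction described above can be sketched as follows, using NumPy only. The preferential-attachment routine is a simplified stand-in for the Barabási-Albert model (it starts from a single node rather than an Erdős-Rényi seed graph), the village sizes are hypothetical, and the rule used to form the observed graph (symmetrizing the contact network) is an assumption, since the matrix it is derived from is not reproduced here.

```python
import numpy as np

def preferential_attachment(n, rng):
    """Directed adjacency matrix: each new node sends one edge to an
    existing node chosen with probability proportional to (degree + 1)."""
    adj = np.zeros((n, n), dtype=int)
    degree = np.zeros(n)
    for new in range(1, n):
        weights = degree[:new] + 1.0
        target = rng.choice(new, p=weights / weights.sum())
        adj[new, target] = 1
        degree[new] += 1
        degree[target] += 1
    return adj

def block_diagonal(blocks):
    """Place square blocks along the diagonal (one block per village)."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n), dtype=int)
    start = 0
    for b in blocks:
        k = b.shape[0]
        out[start:start + k, start:start + k] = b
        start += k
    return out

rng = np.random.default_rng(0)
n_villages, n_households = 4, 50  # hypothetical sizes
contact = block_diagonal(
    [preferential_attachment(n_households, rng) for _ in range(n_villages)]
)
# Assumption: the observed graph links i and j whenever either directed
# contact edge exists (an entry of contact + contact.T is nonzero).
observed = ((contact + contact.T) != 0).astype(int)

max_degree = observed.sum(axis=1).max()
avg_degree = observed.sum(axis=1).mean()
```

Fixing the random seed mirrors the design choice in the text: the networks are generated once and then held fixed across Monte Carlo replications.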
We generate the binary actions as follows. For each , we specify
[TABLE]
where ’s are i.i.d. and follow the uniform distribution on , and is the distribution function of . We set and .
The covariates ’s constitute an matrix , where is a vector of ones and is an matrix which is generated as
[TABLE]
where is also an matrix with i.i.d. entries from the uniform distribution on , is an identity matrix, is the adjacency matrix of the contact network , and is that of an independently generated Erdős–Rényi graph with the same scale and average degree as ’s. (We calculate the average degree of by first adding up the in-degrees and out-degrees of each node, and then taking the average of the total degrees and dividing by two. In this way, the two adjacency matrices will be approximately equally dense, so that varying will not substantially affect the dispersion of ’s or the extensiveness of the cross-sectional dependence.) The scalar captures to what extent the cross-sectional dependence of ’s is aligned with the contact network . As gets closer to 1, the cross-sectional dependence structure of ’s is more aligned with the contact network . When , and are correlated if and only if and are adjacent in . When , the cross-sectional dependence of ’s is determined by an independently generated Erdős–Rényi graph. We choose and see how the choice affects relational diffusion.
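To make the role of the alignment scalar concrete, the sketch below mixes two adjacency matrices with a weight `c` and applies the result to i.i.d. uniform draws. The mixing form `I + c * contact + (1 - c) * other` and the unit interval for the uniform draws are assumptions made for illustration; the paper's generating equation is not reproduced here, and all names are hypothetical.

```python
import numpy as np

def dependent_covariates(contact, other, c, p, rng):
    """Covariates whose cross-sectional dependence follows `contact`
    when c = 1 and `other` when c = 0 (assumed mixing form)."""
    n = contact.shape[0]
    xi = rng.uniform(0.0, 1.0, size=(n, p))  # i.i.d. base draws
    mix = np.eye(n) + c * contact + (1.0 - c) * other
    return mix @ xi, xi

# Tiny hypothetical graphs: nodes 0-1 linked in the contact network,
# nodes 2-3 linked in the unrelated graph.
n, p = 6, 2
contact = np.zeros((n, n)); contact[0, 1] = contact[1, 0] = 1
other = np.zeros((n, n)); other[2, 3] = other[3, 2] = 1

x_aligned, xi_a = dependent_covariates(contact, other, 1.0, p, np.random.default_rng(1))
x_unrelated, xi_u = dependent_covariates(contact, other, 0.0, p, np.random.default_rng(1))
```

With `c = 1`, only units linked in the contact network share base draws; with `c = 0`, the sharing follows the unrelated graph instead.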
In the simulation, we consider a variant of the linear threshold diffusion model in (2.5) as follows: for
[TABLE]
where ’s are i.i.d. and have the distribution function of , the covariates are the same for the same cross-sectional unit across the short period, and
[TABLE]
where is the in-neighborhood of in the contact network we have generated before. In addition, we choose and set . For the simulations, we have set , and the Monte Carlo simulation number to be 10,000.
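A threshold-type diffusion step of the kind described above can be sketched as follows. The functional form (a coefficient `delta` on the average period-0 action of in-neighbors, plus a linear covariate index, compared with a standard normal shock) is an assumption for illustration and is not the paper's exact specification; all names and values are hypothetical.

```python
import numpy as np

def neighbor_average(contact, y0):
    """Average period-0 action over each unit's in-neighbors (0 if none).
    Convention assumed here: contact[i, j] = 1 means j is an in-neighbor of i."""
    counts = contact.sum(axis=1)
    totals = contact @ y0
    return np.divide(totals, counts, out=np.zeros(len(y0)), where=counts > 0)

def simulate_period_one(contact, x, y0, delta, beta, rng):
    """Period-1 actions: switch on when the index exceeds the shock."""
    eps = rng.standard_normal(len(y0))  # assumed shock distribution
    index = delta * neighbor_average(contact, y0) + x @ beta
    return (index >= eps).astype(int)

contact = np.array([[0, 1, 1], [0, 0, 0], [1, 0, 0]])
y0 = np.array([1, 0, 1])
x = np.array([[1.0, 0.2], [1.0, -0.4], [1.0, 0.9]])
beta = np.array([0.1, 0.5])

avg = neighbor_average(contact, y0)
y1 = simulate_period_one(contact, x, y0, 0.5, beta, np.random.default_rng(3))
```

Setting `delta = 0` switches off diffusion in this sketch: period-1 actions then depend on the covariates and shocks only, which is the benchmark case under which any estimated diffusion is purely relational.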
As for the specification of the diffusion model, it is important to note that while we assume that the researcher knows the specification for (4.1), she does not know that ’s are generated as in (4.4). In other words, we allow her to be entirely agnostic about the specification of , except that it is generated from a generalized diffusion model of the form in (2.4), and hence the estimation and inference on relational diffusion proposed in this paper do not rely on any information about this particular specification in (4.4).
As for the omitted covariates, we considered , so that we omitted the last entry of the -dimensional covariate vector . The true values of ADM and are presented in Table 2. We computed the true values by simulations using 100,000 simulation draws. Recall that when , the cross-sectional dependence of covariates is shaped by the contact network, and when , it is entirely unrelated to the contact network. When , there is no diffusion. However, we see that is not zero, exhibiting relational diffusion. The relational diffusion is larger when than when . This confirms that the magnitude of relational diffusion is related to how similar the cross-sectional dependence structure of the covariates is to the contact network.
4.2. Estimation and Results
For the Monte Carlo simulations, we have estimated as in (3.3). Recall that is chosen to be the distribution function of . The results of the finite sample coverage probabilities for the confidence intervals are shown in Table 3. When we use 50 households per village, the confidence intervals exhibit slight undercoverage. However, the coverage probability improves when the number of households per village is increased to 200.
Interestingly, the effect of an increase in the number of households per village depends on , i.e., on whether the cross-sectional dependence structure of the covariates is similar to the contact network or not. When it is similar to the contact network (), the increase in the number of households increases the length of the confidence intervals. On the other hand, when it is very different from the contact network (), the increase leads to shorter confidence intervals. Thus, when the cross-sectional dependence structure of the covariates is aligned with the contact network, the increase in the sample size appears to magnify the standard error in the confidence interval.
Part of this effect should also be coming from the increased neighborhood sizes as the number of households increases. For example, note that as we increase the number of households from 50 to 200, the maximum degree of the observed graph increases from 27 to 95, and its average degree increases from 2.9373 to 4.3488 (see Table 1). Hence as the number of households grows, the cross-sectional dependence also becomes more extensive.
5. Conclusion
In this paper, we develop a method of quantifying the role of the covariates contributing to relational diffusion. This paper’s proposal can be useful in practice especially when there is a concern about potential bias in the estimated diffusion due to missing covariates. In this situation, one may want to quantify the role of covariates in the estimated diffusion and see whether the role is statistically significant. This paper provides a statistical method that is potentially useful in such a situation.
There are multiple possible extensions of the paper’s proposal. First, it would be interesting to consider a situation with multiple networks and to measure relational diffusion along each network. Note that in the context of linear spatial models, Drukker, Egger, and Prucha (2022) studied situations with multiple networks and provided asymptotic inference. Second, it could be interesting to investigate whether permutation-based inference is available for the diffusion decomposition. Conditional on and , observations are all heterogeneously distributed. Hence the standard nonparametric bootstrap does not work. (See Kojevnikov (2021) for a bootstrap method for network dependent processes.) However, there could be a permutation-based approach that exhibits better finite sample performance than asymptotic inference. This was shown in the Monte Carlo study of Song (2018) in estimating the graph concordance. It would be interesting to see if such a phenomenon extends to the decomposition analysis studied in this paper.
6. Appendix: Mathematical Proofs
**Proof of Theorem 2.1:** First, we show that . Since , we write
[TABLE]
Hence taking conditional expectations given , and using Assumption 2.3,
[TABLE]
where . By rearranging terms, we find that the left hand side is equal to
[TABLE]
proving that .
If the -unconfoundedness holds, we have
[TABLE]
(See, e.g., Lemma 4.2(ii) of Dawid (1979).) Since is not constant on the support of , and the support of is not contained in a proper linear subspace of by Assumption 2.4, we have , where is the vector consisting of entries in with indexes in . Hence . Since both and maximize over uniquely, we must have . Therefore, . This means that . Finally, the -unconfoundedness implies the -unconfoundedness for all , yielding the desired result.
The rest of the proofs are devoted to proving Theorems 3.1 and 3.2. Throughout the auxiliary results below, we assume that the conditions of Theorem 3.2 hold. (In fact, Assumption 3.5 is used only for the proof of Theorem 3.2.)
**Lemma 6.1.**
For each , , where is as defined in (3.7), and
[TABLE]
**Proof:** The result follows because involves a sum over and this sum is bounded by for some constant that does not depend on .
The following lemma gives an asymptotic linear representation of the estimators and .
**Lemma 6.2.**
[TABLE]
Furthermore,
[TABLE]
**Proof:** For both statements (6.1) and (6.2), the proof proceeds in the same way as in the proofs of Lemmas C.5 and C.6 of He and Song (2022).
We are prepared to present the asymptotic linear representation of .
**Theorem 6.1.**
[TABLE]
where is as defined in (3.7).
**Proof:** First, let
[TABLE]
Also we define for ,
[TABLE]
Let us write
[TABLE]
where
[TABLE]
where .
First, let us analyze . We write this as , where
[TABLE]
The term in is due to (6.2), Assumption 3.2, and the assumption that by Assumption 2.4(iii). Using the first order Taylor expansion around , and using Lemma 6.2, we obtain that
[TABLE]
(Recall the definitions of and in (3.4) and (3.5).) Similarly, as for , we obtain that
[TABLE]
Again, using Lemma 6.2, we conclude that
[TABLE]
Hence we find that
[TABLE]
Let us turn to . We write this as , where
[TABLE]
Similarly as before, the term in is due to (6.2) and the assumption that by Assumption 2.4(iii). Using the same arguments as before, we find that
[TABLE]
(Recall that because .) Combining these results with (6.3), we obtain the desired result.
Let be the -dimensional vector whose entries are given by , .
**Lemma 6.3.**
[TABLE]
where is as defined in (3.21).
**Proof:** By Theorem 6.1, we first write
[TABLE]
Take such that . Recall the definition of after Theorem 3.1. By Assumptions 2.1 and 2.2, has as a conditional dependency graph given , which is a special case of conditional neighborhood dependency introduced in Lee and Song (2019). Let and we apply their Corollary 3.1 and Assumption 3.1 to deduce that
[TABLE]
for some constant that does not depend on , where
[TABLE]
Thus, the desired result follows from this and Assumption 3.2 and the Cramér-Wold device.
Let denote the population version of which is defined as follows:
[TABLE]
**Lemma 6.4.**
[TABLE]
**Proof:** Inspecting the terms in , we find that the estimation error of comes from the estimation errors of and . It is not hard to see from Lemma 6.2 that
[TABLE]
Furthermore,
[TABLE]
and
[TABLE]
Collecting these rate results, we find that
[TABLE]
Thus from Assumption 3.2, we obtain the first statement of the lemma.
The second statement immediately follows because
[TABLE]
(See the proof of Lemma B.13 of He and Song (2022).)
Define
[TABLE]
where and are the vectors having entries , and , , respectively, and similarly with . Let be the matrix whose -th entry for is given by
[TABLE]
where denotes the -th entry of , and . Let the -th diagonal entry of be denoted by .
**Lemma 6.5.**
.
**Proof:** For , define
[TABLE]
where denotes the -th entry of . Then using Assumption 3.2 and Lemma 6.4, and following the same argument as in the proofs of Lemmas B.14 and B.15 of He and Song (2022), we find that
[TABLE]
Since is the -th element of , we obtain the desired result.
**Lemma 6.6.**
[TABLE]
where denotes the -field generated by , and
[TABLE]
**Proof:** First, for each and , we define
[TABLE]
By the same arguments in the proof of Lemma B.9 of He and Song (2022), we can see that is positive semidefinite for all . As in (3.14) of Kojevnikov and Song (2022), we find that for any ,
[TABLE]
for some constant that does not depend on , where is the constant in Assumption 3.5. Hence by Assumption 3.5, the last probability in the above display vanishes as . This gives the desired result.
Define
[TABLE]
**Lemma 6.7.**
For any ,
[TABLE]
**Proof:** Let be the percentile of the conditional distribution of
[TABLE]
given .
First, we show that for all ,
[TABLE]
To see this, note that by Assumption 3.2 and Lemma 6.5, for each ,
[TABLE]
Since the conditional density of given is bounded uniformly over , by Assumption 3.5, the term is uniform over . Hence
[TABLE]
Since is positive semidefinite, by Theorem 1 of Jensen (1984), we find that for each ,
[TABLE]
Take any . On the event that
[TABLE]
we have
[TABLE]
by (6.15). Hence
[TABLE]
by Lemma 6.6. Since the choice of was arbitrary, we obtain the desired result of (6.12). Hence
[TABLE]
**Proof of Theorem 3.1:** Using the same arguments as in the proofs of Lemmas 6.3 and 6.5, we find that
[TABLE]
and
[TABLE]
The desired result follows from these two results.
**Proof of Theorem 3.2:** Note that is increasing in . Hence, the desired result follows from Lemma 6.7 and Theorem 2.1 of Romano and Shaikh (2010).
References
- Acemoglu, D., A. Ozdaglar, and E. Yildiz (2011): “Diffusion of Innovations in Social Networks,” IEEE Conference on Decision and Control and European Control Conference, 12/2011, pp. 2329–2334.
- Akbarpour, M., S. Malladi, and A. Saberi (2020): “Just a Few Seeds More: Value of Network Information for Diffusion,” Working Paper.
- Aral, S., L. Muchnik, and A. Sundararajan (2009): “Distinguishing Influence-Based Contagion from Homophily-Driven Diffusion in Dynamic Networks,” Proceedings of the National Academy of Sciences of the United States of America, 106(51), 21544–21549.
- Aronow, P., and C. Samii (2015): “Estimating Average Causal Effects under Interference between Units,” Working Paper.
- Banerjee, A., A. G. Chandrasekhar, E. Duflo, and M. O. Jackson (2013): “The diffusion of microfinance,” Science, 341(6144), 1236498.
- Banerjee, A., A. G. Chandrasekhar, E. Duflo, and M. O. Jackson (2019): “Using Gossips to Spread Information: Theory and Evidence from Two Randomized Controlled Trials,” Review of Economic Studies, 86, 2453–2490.
- Beaman, L., A. Ben Yishay, J. Magruder, and A. M. Mobarak (2020): “Can Network Theory-based Targeting Increase Technology Adoption?,” Working Paper.
