Identification and Estimation of a Partially Linear Regression Model using Network Data
Eric Auerbach

TL;DR
This paper introduces a novel nonparametric method for estimating a partially linear regression model with network data, leveraging the squared adjacency matrix to identify the unknown function of a latent network driver.
Contribution
It proposes a new matching-based estimation approach that avoids specifying a parametric network formation model, using the squared adjacency matrix to capture all identifiable information.
Findings
Consistent estimators for the regression parameters are developed.
The method effectively captures latent network effects without parametric assumptions.
Application to network peer effects demonstrates practical utility.
Abstract
I study a regression model in which one covariate is an unknown function of a latent driver of link formation in a network. Rather than specify and fit a parametric network formation model, I introduce a new method based on matching pairs of agents with similar columns of the squared adjacency matrix, the ijth entry of which contains the number of other agents linked to both agents i and j. The intuition behind this approach is that for a large class of network formation models the columns of the squared adjacency matrix characterize all of the identifiable information about individual linking behavior. In this paper, I describe the model, formalize this intuition, and provide consistent estimators for the parameters of the regression model. Auerbach (2021) considers inference and an application to network peer effects.
Eric Auerbach, Department of Economics, Northwestern University. E-mail: [email protected]. I thank my advisors, James Powell and Bryan Graham, for their advice and support. I also thank Jonathan Auerbach, Ivan Canay, David Card, Christina Chung, Aluma Dembo, Joel Horowitz, Michael Jansson, Patrick Kline, Sheisha Kulkarni, Chuck Manski, Konrad Menzel, Carl Nadler, Stephen Nei, Aureo de Paula, Demian Pouzo, Mikkel Soelvsten, Katalin Springel, Max Tabord-Meehan and participants at the UC Berkeley Econometrics Seminar for helpful feedback.
1 Introduction
Most economic outcomes are not determined in isolation. Rather, agents are influenced by the behaviors and characteristics of other agents. For example, a high school student’s academic performance might depend on the attitudes and expectations of that student’s friends and family (see generally Sacerdote 2011, Bramoullé et al. 2019).
Incorporating this social influence into the right-hand side of an economic model may be desirable when the researcher wants to understand its impact on the outcome of interest or when it confounds the impact of another explanatory variable such as the causal effect of some nonrandomized treatment. For instance, the researcher may want to learn the causal effect of a tutoring program on academic performance in which program participation and counterfactual academic performance are both partially determined by family expectations. However, in many cases the relevant social influence is not observed by the researcher. That is, the researcher does not have access to data on the family expectations that confound the causal effect of the tutoring program and thus cannot control for this variable using conventional methods.
An increasingly popular solution to this problem is to collect network data and suppose that the unobserved social influence is revealed by linking behavior in the network. For instance, the researcher might observe pairs of students who identify as friends and believe that students with similar reported friendships have similar family expectations. It is not immediately clear, however, how one might actually use network data to account for this unobserved social influence in practice, since the number of ways in which agents can be linked in a network is typically large relative to the sample size.
This paper proposes a new way to incorporate network data into an econometric model. I specify a joint regression and network formation model, establish sufficient conditions for the parameters of the regression model to be identified, and provide consistent estimators. Large sample approximations for inference and an application to network peer effects, building on work by Bramoullé et al. (2009), de Giorgi et al. (2010), Goldsmith-Pinkham and Imbens (2013), Hsieh and Lee (2014), Johnsson and Moon (2015), Arduini et al. (2015), and others, are provided by Auerbach (2021).
A limitation of the framework is that the large sample approximations suppose a sequence of networks that is asymptotically dense in that the fraction of linked agent-pairs does not vanish with the sample size. The regime can fail to characterize potentially relevant features of networks in which relatively few agent pairs are linked (see Mele 2017, for a discussion). Potential extensions to sparse asymptotic regimes are left to future work.
2 Framework
2.1 Model
Let $i$ represent an arbitrary agent from a large population. Associated with agent $i$ are an outcome $y_i$, an observed vector of explanatory variables $x_i$, and an unobserved index of social characteristics $w_i$. The three are related by the model
$$y_i = x_i\beta + \lambda(w_i) + \varepsilon_i \qquad (1)$$
where $\beta$ is an unknown slope parameter, $\lambda$ is an unknown measurable function, and $\varepsilon_i$ is an idiosyncratic error.
The researcher draws a sample of $n$ agents uniformly at random from the population. This sample is described by the sequence of independent and identically distributed random variables $\{y_i, x_i, w_i\}_{i=1}^{n}$, although only $\{y_i, x_i\}_{i=1}^{n}$ is observed as data. The researcher also observes $D$, an $n \times n$ stochastic binary adjacency matrix corresponding to an unlabeled, unweighted, and undirected random network between the $n$ agents. The existence of a link between agents $i$ and $j$ is determined by the model
$$D_{ij} = \mathbb{1}\{\eta_{ij} \le f(w_i, w_j)\}\,\mathbb{1}\{i \ne j\} \qquad (2)$$
in which is a symmetric measurable function satisfying the continuity condition that for every and is a symmetric matrix of unobserved scalar disturbances with independent upper diagonal entries that are mutually independent of . This continuity condition is weaker than the typical assumption that is a continuous function. It is used because it allows for a variety of models where is “almost” but not quite a continuous function. For example, in the blockmodel described in Section 2.2.1 below, is a piecewise continuous function.
The regression model (1) represents a pared-down version of various linear models popular in the network economics literature. For instance in the network peer effects literature, could be student ’s GPA, could indicate whether participates in a tutoring program, could index student ’s participation in various social cliques, and could represent the influence of student ’s peers’ expected GPA, program participation, or other characteristics on student ’s GPA. That is, supposing ,
[TABLE]
for some (see relatedly Manski 1993). To demonstrate the proposed methodology, this paper conflates these different possible social effects into one social influence term, . This may be sufficient to identify the impact of the tutoring program holding social influence constant, predict a student’s GPA under some counterfactual social influence, or test for the existence of any social influence. Auerbach (2021) discusses how one can also separately identify different social effects.
The parameters of interest are $\beta$ and $\lambda(w_i)$, the realized social influence for agent $i$. The function $\lambda$ is not a parameter of interest because it is not separately identified from $w_i$ (see Section 2.2.1 below). It is without loss to normalize the distribution of $w_i$ to be standard uniform.
Network formation (2) is represented by conditionally independent Bernoulli trials. The model is a nonparametric version of a class of dyadic regression models popular in the network formation literature. Section 6 of Graham (2019) or Section 3 of de Paula (2020) contains many examples. It is often given a discrete choice interpretation in which represents the marginal transferable utility agents and receive from forming a link, which precludes strategic interactions between agents. The distribution of is not separately identified from and so is also normalized to be standard uniform.
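To make the conditionally independent Bernoulli structure concrete, here is a minimal simulation sketch. It assumes standard uniform social characteristics and disturbances, as in the normalizations above. The function names and the particular link function `homophily` are hypothetical and chosen only for illustration.

```python
import random

def simulate_network(n, f, seed=0):
    """Draw social characteristics w_i ~ U[0,1] and a symmetric 0/1 adjacency
    matrix in which agents i != j link when a U[0,1] disturbance eta_ij
    falls at or below the link probability f(w_i, w_j)."""
    rng = random.Random(seed)
    w = [rng.random() for _ in range(n)]
    D = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            eta = rng.random()            # one disturbance per unordered pair
            link = 1 if eta <= f(w[i], w[j]) else 0
            D[i][j] = link                # undirected network: D is symmetric
            D[j][i] = link
    return w, D

def homophily(u, v):
    # Hypothetical link function (an assumption of this sketch): agents with
    # similar social characteristics are more likely to link.
    return 1.0 - (u - v) ** 2

n = 50
w, D = simulate_network(n, homophily, seed=1)
density = sum(map(sum, D)) / (n * (n - 1))   # fraction of linked ordered pairs
```

Conditional on the social characteristics, each pair links independently, so no agent's links respond strategically to links formed elsewhere in the network.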
Under (2), the observed network is almost surely dense or empty in the limit. That is, for a fixed and as tends to infinity, will either be bounded away from zero or exactly zero with probability approaching one. The framework can potentially accommodate network sparsity by allowing to vary with the sample size (see Appendix Section A.1), but a formal study of such an asymptotic regime is left to future work.
The following Assumption 1 collects key aspects of the model for reference.
Assumption 1: The random sequence is independent and identically distributed with entries mutually independent of , a symmetric random matrix with independent and identically distributed entries above the diagonal. The outcomes and are given by equations (1) and (2) respectively. The variables and have finite eighth moments, and have standard uniform marginals, , for every , , , and .
2.2 Identification
2.2.1 Non-identification of the social characteristics
If were observed, (1) would correspond to the partially linear regression of Robinson (1988) and the identification problem would be well-understood. If were unobserved but identified, one might replace with an empirical analogue as in Ahn and Powell (1993). Identification strategies along these lines are considered by Arduini et al. (2015) and Johnsson and Moon (2015).
However, it is not generally possible to learn in the setting of this paper. The main difficulty is that many assignments of agents to social characteristics generate the same distribution of network links. Specifically, for any measure-preserving invertible (that is for any measurable , and have the same measure), and generate the same distribution of links, where and may be very different. For example, if and explain the distribution of , then so too does and where .
Furthermore, even if the researcher is willing to posit a specific , the social characteristics may still not be identified. For example, in a simplified version of the blockmodel of Holland et al. (1983), there exists an dimensional matrix such that Intuitively, is divided into partitions (with agent assigned to partition ) and the probability two agents link only depends on their partition assignments. In this case, the probability that agents link is invariant to changes in the social characteristics that do not change the agents’ partition assignments, and so while the underlying partition assignments might be learned from , the social characteristics that determine the partition assignments generally cannot. Notice that in this case is not a continuous function, but satisfies the continuity condition of Assumption 1.
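The blockmodel argument can be illustrated numerically. The sketch below assumes that the unit interval is cut into equal-width partitions and that the link probability is read off a symmetric matrix of block probabilities. The matrix `THETA`, the equal-width rule in `block`, and all function names are hypothetical choices made for this example.

```python
import math

# Hypothetical symmetric matrix of block-level link probabilities.
THETA = [[0.8, 0.2, 0.1],
         [0.2, 0.7, 0.3],
         [0.1, 0.3, 0.9]]
N_BLOCKS = len(THETA)

def block(w):
    """Partition assignment (1-based) of a social characteristic w in (0, 1],
    assuming equal-width partitions."""
    return max(1, math.ceil(N_BLOCKS * w))

def f_block(u, v):
    """Link probability depends on u and v only through their partitions."""
    return THETA[block(u) - 1][block(v) - 1]

GRID = [(k + 0.5) / 100 for k in range(100)]

def link_function(w):
    """Link probabilities of an agent with characteristic w, on a grid."""
    return [f_block(w, t) for t in GRID]

# 0.40 and 0.60 both fall in partition 2; 0.90 falls in partition 3.
same_partition_equal = link_function(0.40) == link_function(0.60)
diff_partition_equal = link_function(0.40) == link_function(0.90)
```

Two agents in the same partition have identical linking probabilities, so no amount of network data can tell their social characteristics apart.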
Another example in which the social characteristics are not identified is the homophily model . Intuitively, agents are more likely to form a link if their social characteristics are similar. In this case, both and generate the same distribution of links.
An example in which the social characteristics are identified is the nonlinear additive model where is a strictly monotonic function such as the logistic function (see Graham 2019, Section 6.3). Intuitively, agents with larger values of are more likely to form links. In this case, is identified from because where and is an independent copy of .
2.2.2 Agent link function
Since is not generally identified, I propose an alternative description about how is linked in the network that is identified. I call this alternative an agent link function and propose using link functions instead of social characteristics to identify and .
Agent $i$’s link function is the projection of $f$ onto $w_i$. That is, $f_{w_i}(\cdot) := f(w_i, \cdot)$. It is the collection of probabilities that agent $i$ links to agents with each social characteristic in $[0,1]$. I consider link functions to be elements of $L^2([0,1])$, the usual inner product space of square integrable functions on the unit interval. I sometimes use $\|f_{w_i} - f_{w_j}\|_2 := \left(\int \left(f(w_i,\tau) - f(w_j,\tau)\right)^2 d\tau\right)^{1/2}$ to refer to the pseudometric on $[0,1]$ induced by $L^2$-differences in link functions. I call this pseudometric network distance.
Conditional expectations with respect to implicitly refer to the random variable . For example, and
where and are independent copies of . The conditional expectations on the right-hand side are well-defined for any because of the continuity condition on in Assumption 1. Whenever the conditional expectation on the left-hand side is used, the relevant limit is assumed to exist.
Under (2), the link function is the totality of information that contains about . It describes the law of the th row of and so is identified. Furthermore, is only identified when is invertible in . For example, in the nonlinear additive model from Section 2.2.1, is identified because implies that dominates . In this example, agents with different social characteristics necessarily have different probabilities of forming links to other agents in the population. In the blockmodel, is not identified because if then even if . In this example, agents with different social characteristics but the same partition assignment have the same probability of forming links to other agents in the population.
The large-sample limits of many popular agent-level network statistics are determined by the agent’s link function. Examples include degree , average peers’ characteristics , and clustering (supposing ). This observation will partly inform Assumption 3 below.
2.2.3 Identification of the regression model
If were observed or identified, the standard approach would be to first identify using covariation between and unrelated to and then to identify using residual variation in . This identification strategy requires variation in not explained by . Let .
Assumption 2: where is the smallest eigenvalue.
Assumption 2 is strong but standard. It is violated when the covariates include population analogues of network statistics such as agent degree or average peers’ characteristics (or any other function of ). In such cases, alternative assumptions are required for identification.
Since is neither observed nor identified, the standard approach cannot be implemented. A contribution of this paper is to propose using instead of for identification. The substitution relies on an additional assumption that is determined by .
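Assumption 3: For every $\epsilon > 0$ there exists a $\delta > 0$ such that $E\left[(\lambda(w_i) - \lambda(w_j))^2 \mid \|f_{w_i} - f_{w_j}\|_2 \le \delta\right] \le \epsilon$.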
Assumption 3 is strong and new. In words, it says that agents with similar link functions have similar social influence. Since, under (2), is the totality of information that contains about , Assumption 3 supposes that this information is sufficient to discern . It does not restrict the function .
One justification for the assumption could be that does not directly impact . Instead, only influences by altering ’s linking behavior . For example, if indexes student ’s participation in various social cliques, then the assumption follows if this index only directly affects which other students and teachers interacts with, and it is these interactions that ultimately determine ’s participation in the tutoring program and GPA.
The assumption is also satisfied when the social influence is the population analogue of one of the network statistics described in Section 2.2.2. This is the case for the network peer effects example of Section 2.1 where , because is a continuous functional of .
However, the assumption may be implausible when the network is sparse (the link function is close to zero) because every agent-pair may have network distance close to zero. As a result, under network sparsity, Assumption 3 may imply that social influence behaves like a constant. See Appendix Section A.1 for a discussion.
Proposition 1 states that Assumptions 1-3 are sufficient for and to be identified.
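Proposition 1: Under Assumptions 1-3,
- (i) $\beta = \operatorname{argmin}_{b} E\left[(y_i - y_j - (x_i - x_j)b)^2 \mid \|f_{w_i} - f_{w_j}\|_2 = 0\right]$, and
- (ii) $\lambda(w_i) = E\left[y_i - x_i\beta \mid f_{w_i}\right]$.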
I close with two examples in which and are identified (Assumptions 1-3 hold) but is not. The first example is the case where links are determined by a blockmodel and social influence is determined by the agent partition assignments . In this case, and are identified even though is not. The second example is the case where links are determined by a homophily model and social influence is an affine function of the agent social characteristics . In this case and are identified even though and are not separately identified.
2.3 Estimation
Estimation of and is complicated by the fact that is unobserved and difficult to approximate directly. A contribution of this paper is to demonstrate that estimation is still possible using columns of the squared adjacency matrix. To explain the procedure, I introduce the codegree function.
2.3.1 Agent codegree function
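Let $p$ map $(w_i, w_j)$ to the conditional probability that $i$ and $j$ have a link in common, i.e. $p(w_i, w_j) := \int f(w_i,\tau) f(w_j,\tau)\, d\tau$. Agent $i$’s codegree function is the projection of $p$ onto $w_i$. That is, $p_{w_i}(\cdot) := p(w_i, \cdot)$. Codegree functions are also taken to be elements of $L^2([0,1])$. I sometimes use $\|p_{w_i} - p_{w_j}\|_2$ to refer to the pseudometric on $[0,1]$ induced by $L^2$-differences in codegree functions,
$$\|p_{w_i} - p_{w_j}\|_2 = \left(\int \left(\int f(\tau,s)\left(f(w_i,s) - f(w_j,s)\right) ds\right)^2 d\tau\right)^{1/2}.$$
I call this pseudometric codegree distance. Conditional expectations with respect to codegree functions are defined exactly as they are for link functions.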
In contrast to link functions, the population analogues of most network statistics (including those in Section 2.2.2) cannot naturally be written as functionals of codegree functions. The use of codegree functions is instead motivated by Lemma 1 below.
2.3.2 Estimators
I propose using codegree functions instead of link functions to construct estimators for and . The proposal relies on two results. The first result is that agents with similar codegree functions have similar link functions. The second result is that codegree distance can be consistently estimated using the columns of the squared adjacency matrix.
The first result is given by Lemma 1 and is related to arguments from the link prediction literature (see in particular Lovász and Szegedy 2010, Rohe et al. 2011, Zhang et al. 2015).
Lemma 1: If then for every
[TABLE]
If also for every then for every and there exists a such that
[TABLE]
I defer a discussion of Lemma 1 to Section 2.3.3 and emphasize here instead its implication that the parameters of interest can be expressed as functionals of the agent codegree functions. That is, under Assumptions 1-3, uniquely minimizes
over and .
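The second result is that $\|p_{w_i} - p_{w_j}\|_2$ can be consistently estimated by the root average squared difference in the $i$th and $j$th columns of the squared adjacency matrix,
$$\hat\delta_{ij} := \left(\frac{1}{n}\sum_{t=1}^{n}\left(\frac{1}{n}\sum_{s=1}^{n} D_{ts}\left(D_{is} - D_{js}\right)\right)^2\right)^{1/2}.$$
Intuitively, the empirical codegree $\frac{1}{n}\sum_{s=1}^{n} D_{ts}D_{is}$ counts the fraction of agents that are linked to both agents $i$ and $t$, $\left\{\frac{1}{n}\sum_{s=1}^{n} D_{ts}D_{is}\right\}_{t=1}^{n}$ is the collection of empirical codegrees between agent $i$ and the other agents in the sample, and $\hat\delta_{ij}$ gives the root average squared difference in $i$’s and $j$’s collection of empirical codegrees. That $\hat\delta_{ij}$ converges uniformly to $\|p_{w_i} - p_{w_j}\|_2$ over the distinct pairs of agents is shown in Appendix Section A.4 as Lemma B1.
A consequence of these two results and Assumptions 1-3 is that when the $i$th and $j$th columns of the squared adjacency matrix are similar then $\lambda(w_i)$ and $\lambda(w_j)$ are approximately equal. Under additional regularity conditions provided in Section 2.3.4, $\beta$ is consistently estimated by the pairwise difference estimator
$$\hat\beta := \left(\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} (x_i - x_j)'(x_i - x_j)\, K\!\left(\frac{\hat\delta_{ij}^2}{h_n}\right)\right)^{-1}\left(\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} (x_i - x_j)'(y_i - y_j)\, K\!\left(\frac{\hat\delta_{ij}^2}{h_n}\right)\right)$$
and $\lambda(w_i)$ is consistently estimated by the Nadaraya-Watson-type estimator
$$\hat\lambda(w_i) := \left(\sum_{t=1}^{n} K\!\left(\frac{\hat\delta_{it}^2}{h_n}\right)\right)^{-1}\left(\sum_{t=1}^{n} \left(y_t - x_t\hat\beta\right) K\!\left(\frac{\hat\delta_{it}^2}{h_n}\right)\right)$$
where $K$ is a kernel function and $h_n$ is a bandwidth parameter depending on the sample size. Since codegree functions are not finite-dimensional, the regularity conditions I provide for consistency differ from what is typically assumed. Conditions sufficient for the estimators to be asymptotically normal, consistent estimators for their variances, and more are provided by Auerbach (2021).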
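The verbal description above translates directly into a computation on the squared adjacency matrix. The following is a minimal sketch with hypothetical function names. It normalizes each codegree count by the sample size and takes the root average squared difference between two columns, which is an assumption about the exact scaling.

```python
import math

def squared_adjacency(D):
    """Entry [t][i] counts the agents linked to both t and i."""
    n = len(D)
    return [[sum(D[t][s] * D[s][i] for s in range(n)) for i in range(n)]
            for t in range(n)]

def empirical_codegree_distance(D):
    """Root average squared difference between columns i and j of the
    squared adjacency matrix, with codegrees expressed as fractions of n."""
    n = len(D)
    C = squared_adjacency(D)
    delta = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            ss = sum(((C[t][i] - C[t][j]) / n) ** 2 for t in range(n))
            delta[i][j] = delta[j][i] = math.sqrt(ss / n)
    return delta

# Star network: agent 0 is linked to agents 1, 2, and 3.
star = [[0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]
delta_star = empirical_codegree_distance(star)
```

In the star network the three peripheral agents have identical columns of the squared adjacency matrix, so their pairwise distances are zero, while each is a positive distance from the center.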
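A minimal sketch of the two estimators follows, under simplifying assumptions that are not from the paper: the covariate is scalar, the matrix of estimated codegree distances `delta` is already available, the kernel is evaluated at the squared distance divided by the bandwidth, and the kernel itself is a triweight-type function picked only because it is nonnegative and smooth with bounded support. All names are hypothetical.

```python
def triweight(u):
    # Assumed kernel choice for illustration: nonnegative, smooth, zero for u >= 1.
    return (1.0 - u * u) ** 3 if 0.0 <= u < 1.0 else 0.0

def beta_hat(y, x, delta, h, kernel=triweight):
    """Pairwise difference estimator for a scalar covariate: regress outcome
    differences on covariate differences, weighting pairs by codegree similarity."""
    n = len(y)
    num = den = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            k = kernel(delta[i][j] ** 2 / h)
            dx = x[i] - x[j]
            num += dx * (y[i] - y[j]) * k
            den += dx * dx * k
    return num / den

def lambda_hat(y, x, delta, h, b, kernel=triweight):
    """Nadaraya-Watson-type estimator: kernel-weighted average of the
    residuals y_t - x_t * b over agents t close to agent i."""
    n = len(y)
    out = []
    for i in range(n):
        wts = [kernel(delta[i][t] ** 2 / h) for t in range(n)]
        out.append(sum(wt * (y[t] - x[t] * b) for t, wt in enumerate(wts)) / sum(wts))
    return out

# Toy data: two groups with social influence 5 and -1, and slope 2.
group = [0, 0, 0, 1, 1, 1]
x = [0.0, 1.0, 2.0, 0.0, 1.0, 3.0]
y = [2.0 * xi + (5.0 if g == 0 else -1.0) for xi, g in zip(x, group)]
delta = [[0.0 if gi == gj else 1.0 for gj in group] for gi in group]

b = beta_hat(y, x, delta, h=0.5)
lam = lambda_hat(y, x, delta, h=0.5, b=b)
```

In the toy data only within-group pairs receive positive weight, so differencing removes the group-level social influence and the slope is recovered from within-group variation alone.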
2.3.3 Discussion of Lemma 1
The proof of Lemma 1 can be found in Appendix Section A.2. The first part, that , is almost an immediate consequence of Jensen’s inequality. The second part is related to Theorem 13.27 of Lovász (2012), the logic of which demonstrates that implies when is a continuous function. The proof is short.
[TABLE]
Intuitively, if agents and have identical codegree functions then the difference in their link functions must be uncorrelated with every other link function in the population, as indexed by . In particular, the difference is uncorrelated with and , the link functions of agents and . However, this is only the case when and are perfectly correlated.
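The steps of this argument can be written out as follows. This is a sketch in assumed notation, not necessarily that of the original display: $f_u := f(u,\cdot)$ denotes the link function of an agent with social characteristic $u$, and $p_u(\tau) := \int f(\tau,s) f(u,s)\,ds$ denotes the corresponding codegree function. Evaluating the first implication at $\tau = u$ and $\tau = v$ is where continuity of $f$ is used.

```latex
\begin{align*}
\|p_u - p_v\|_2 = 0
&\implies \int f(\tau,s)\bigl(f(u,s) - f(v,s)\bigr)\,ds = 0
  \quad \text{for every } \tau \in [0,1] \\
&\implies \int f(u,s)\bigl(f(u,s) - f(v,s)\bigr)\,ds = 0
  \ \text{ and } \
  \int f(v,s)\bigl(f(u,s) - f(v,s)\bigr)\,ds = 0 \\
&\implies \int \bigl(f(u,s) - f(v,s)\bigr)^2\,ds = 0
  \quad \text{(subtract the second equality from the first)} \\
&\implies \|f_u - f_v\|_2 = 0 .
\end{align*}
```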
Lovász’s theorem demonstrates that agent-pairs with identical codegree functions have identical link functions. The estimation strategy proposed in this paper, however, requires a stronger result that agent-pairs with similar but not necessarily identical codegree functions have similar link functions. This is the statement of Lemma 1.
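The first part of Lemma 1, that codegree distance is no larger than network distance, can be checked numerically for a given link function. The sketch below uses a midpoint rule on the unit interval and a hypothetical link function with values in $[0,1]$. The grid size and all names are assumptions made for this example.

```python
import math

def f(u, v):
    # Hypothetical link function with values in [0, 1] (illustration only).
    return 1.0 - (u - v) ** 2

M = 200
GRID = [(k + 0.5) / M for k in range(M)]   # midpoint rule on [0, 1]

def network_dist(u, v):
    """L2 distance between the link functions f(u, .) and f(v, .)."""
    return math.sqrt(sum((f(u, t) - f(v, t)) ** 2 for t in GRID) / M)

def codegree(u, v):
    """Probability that agents with characteristics u and v share a link."""
    return sum(f(u, s) * f(v, s) for s in GRID) / M

def codegree_dist(u, v):
    """L2 distance between the codegree functions of u and v."""
    return math.sqrt(sum((codegree(u, t) - codegree(v, t)) ** 2 for t in GRID) / M)

pairs = [(0.1, 0.2), (0.3, 0.9), (0.5, 0.55)]
checks = [(codegree_dist(u, v), network_dist(u, v)) for u, v in pairs]
```

Each entry of `checks` pairs a codegree distance with the corresponding network distance. The discretized inequality holds by the Cauchy-Schwarz inequality because the link function is bounded by one.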
Auerbach (2021) derives rates of convergence for the estimators under a stronger version of Lemma 1. I include the result here as it may be of independent interest.
Lemma A1: Suppose satisfies and the -Hölder-continuity condition that there exists such that for every . Then for every
[TABLE]
Lemma A1 bounds the cost of using codegree distance as a substitute for network distance in the estimation of and . Its proof can be found in Appendix Section A.2. When , the result requires an agent-pair to have a codegree distance less than to guarantee that their network distance is less than . The rate of convergence of the estimators based on codegree distance may be slower than the infeasible estimators based on network distance.
2.3.4 Consistency
The following regularity conditions are imposed. Let .
Assumption 4: , , and as for some . is nonnegative, twice continuously differentiable, and has support .
The restrictions on are standard. The first two restrictions on are also standard. The third restriction on , that , is new. It ensures that the sums used to estimate and diverge with . If were a continuously distributed -dimensional random vector then, under certain conditions, would be on the order of . The number of agents with codegree function similar to that of agent would be on the order of , and could be chosen so that . Such an assumption, which requires knowledge of the dimension of , is standard. Since is an unknown function, cannot necessarily be approximated by a polynomial of of known order and so is explicitly assumed instead. One can verify it in practice (in the same sense that one can choose to satisfy the first two conditions) by computing and choosing so that it is large relative to .
Proposition 2 states that Assumptions 1-4 are sufficient for and to be consistent.
Proposition 2: Under Assumptions 1-4, and as .
The proof of Proposition 2 is complicated by the fact that codegree functions are not finite dimensional. There is no adequate notion of a density for the distribution of codegree functions, which plays a key role in the standard theory (see generally Ferraty and Vieu 2006). Furthermore, even when the functions and are relatively smooth, the bias of may still be large relative to its variance. To make reliable inferences about using , I recommend a bias correction. See Auerbach (2021) for details.
3 Conclusion
This paper proposes a new way to incorporate network data into econometric modeling. An unobserved covariate called social influence is determined by an agent’s link function, which describes the collection of probabilities that the agent is linked to other agents in the population. Estimation is based on matching pairs of agents with similar columns of the squared adjacency matrix.
Understanding how to incorporate different kinds of network data into econometric modeling is an important avenue for future research. A contribution of this paper is to demonstrate that in some cases identification and estimation are possible without strong parametric assumptions about how the network is generated or exactly which features of the network determine the outcome of interest.
Appendix A Appendix
A.1 Network sparsity
The network formation model (2) implies that is almost surely dense or empty in the limit. That is, for a fixed and as tends to infinity, the fraction of realized links in the network converges to which is either positive or zero.
Many networks of interest to economists are sparse, however, in the sense that relatively few agent-pairs in the population interact. The framework of this paper can potentially accommodate sparsity by allowing some parameters of the model to vary with the sample size, for instance
[TABLE]
where and now depend on (see Graham, 2019, Section 3.8). The fraction of realized links in the network converges to which can be arbitrarily small as grows large. Allowing the agent link functions to depend on does not alter the results of Section 2 in that, mutatis mutandis, Assumptions 1-4 still imply Propositions 1 and 2.
But while the results of Section 2 may hold under network sparsity, Assumption 3 is potentially violated. This is because the premise of that assumption is that agents with similar link functions have similar social influence. If link probabilities are shrinking to zero, then the agent link functions are shrinking to the constant zero function, and so the assumption implies that relatively small deviations in the agent link functions are sufficient to distinguish agents with different social influences. When this is implausible, alternative assumptions about link formation or better quality data on agent interactions may be necessary.
A.2 Lemma 1
Proof of Lemma 1: Assume . The first claim that for every is almost an immediate consequence of Jensen’s inequality
[TABLE]
where the first inequality is due to Jensen and the second is because .
Now assume for every and . To demonstrate the second claim that for every and there exists a such that , I show that for every and there exists a such that . The claim then follows
[TABLE]
Fix and . Define and for any and . Let be an arbitrary element of . Then
[TABLE]
where the second inequality is due to the triangle inequality and the third inequality is because and implies that and .
Define where for every by assumption. Since the choice of , , and was arbitrary, it follows that
[TABLE]
for any and . The claim follows.
Proof of Lemma A1: Assume . The lower bound follows from the first part of Lemma 1. Now assume the existence of an such that for every and . The second claim that for every
[TABLE]
follows from the second part of Lemma 1 by replacing with . Specifically, . As a result,
[TABLE]
for any and , and so
[TABLE]
for any and . The claim follows.
A.3 Proof of Proposition 1
Proof of Proposition 1: Let and . I first demonstrate claim (ii) that . This claim follows
[TABLE]
where because by Assumption 1 and because
[TABLE]
where is an independent copy of , the first equality is due to the definition of , the first inequality is due to Jensen, and the last inequality is due to Assumption 3.
I now demonstrate claim (i) that uniquely minimizes over . This claim follows from expanding the square
[TABLE]
The first summand is uniquely minimized at by Assumption 2 (see below), the second summand does not depend on , and the third summand is equal to zero for any since by Assumption 1 and by Assumption 3 following the same logic as in the proof of claim (ii)
[TABLE]
where the first inequality is due to Cauchy-Schwarz and the last inequality is due to Assumption 3.
To see that Assumption 2 implies that is uniquely minimized at , write , , and
[TABLE]
where both equalities are due to the fact that . The first summand is positive semidefinite. The second summand is positive definite by Assumption 2. It follows that is positive definite and so is nonnegative for all and zero only when .
A.4 Proof of Proposition 2
The proof of Proposition 2 relies on the following Lemma B1.
Lemma B1: Suppose Assumptions 1 and 4. Then
[TABLE]
where is the constant from Assumption 4.
Proof of Lemma B1: Let , , , , , and . Then for any fixed
[TABLE]
where in the second equality and in the third equality are demonstrated below, the first inequality is due to the triangle inequality and the union bound, the second inequality is due to Jensen and the fact that , and the third inequality is due to the triangle inequality.
The third equality, that follows from the fact that by Bernstein’s inequality and the union bound. Specifically, Bernstein’s inequality implies that for any
[TABLE]
and the union bound gives
[TABLE]
which is and so since and by Assumption 4.
The second equality, that , also follows from Bernstein’s inequality and the union bound because
[TABLE]
which is since and by Assumption 4. This completes the proof.
Proof of Proposition 2: I start with the first result that . Let , , , , and write
[TABLE]
I first show that where by Assumption 4. Nearly identical arguments yield , and so the claim follows since Assumption 2 implies that the eigenvalues of are bounded away from zero (see below).
Let . By the mean value theorem and smoothness condition on the kernel function in Assumption 4, where are the mean values implied by that theorem. By Lemma B1, and so because is absolutely bounded, has finite second moments, and by Assumption 4. It follows that .
Let . is a second order U-statistic with kernel depending on , in the sense of Ahn and Powell (1993). Their Lemma A.3 (i) implies that since by Assumption 4. So .
Let . A nearly identical argument gives . Furthermore,
[TABLE]
where the first equality is because and the last equality is by Assumptions 3-4, Lemma 1, and because .
The result follows because the eigenvalues of are bounded away from zero (and so is bounded). To see this, let denote the smallest eigenvalue of , , and . Then
[TABLE]
where the first and second inequalities are due to Jensen, the second equality is because , and the last inequality is by Assumption 2 and the fact that is positive semidefinite.
I now demonstrate that . Let and shorthand , and respectively. Write
[TABLE]
I first consider the denominator . Following previous arguments, Lemma B1 and the smoothness conditions on the kernel function in Assumption 4 imply that
[TABLE]
while Hoeffding’s inequality and the union bound give
[TABLE]
It follows from the triangle inequality that
[TABLE]
since the restrictions on the bandwidth in Assumption 4 imply that .
A nearly identical argument applied to the numerators and gives
[TABLE]
and
[TABLE]
since and have finite eighth moments, , and by Assumption 1.
It follows that
[TABLE]
where, following previous arguments, the term by Assumptions 3-4 and Lemma 1, the term because and , and by the restrictions on the bandwidth and kernel function in Assumption 4.
References
- Ahn, H. and J. L. Powell (1993). Semiparametric estimation of censored selection models with a nonparametric selection mechanism. Journal of Econometrics 58 (1-2), 3–29.
- Arduini, T., E. Patacchini, and E. Rainone (2015). Parametric and semiparametric IV estimation of network models with selectivity. Technical report, Einaudi Institute for Economics and Finance (EIEF).
- Auerbach, E. (2021). Identification and estimation of a partially linear regression model using network data: Inference and an application to network peer effects. arXiv preprint arXiv:2105.10002.
- Bramoullé, Y., H. Djebbari, and B. Fortin (2009). Identification of peer effects through social networks. Journal of Econometrics 150 (1), 41–55.
- Bramoullé, Y., H. Djebbari, and B. Fortin (2019). Peer effects in networks: A survey.
- de Giorgi, G., M. Pellizzari, and S. Redaelli (2010). Identification of social interactions through partially overlapping peer groups. American Economic Journal: Applied Economics 2 (2), 241–75.
- de Paula, Á. (2020). Econometric models of network formation.
- Ferraty, F. and P. Vieu (2006). Nonparametric functional data analysis: theory and practice. Springer Science & Business Media.
