TL;DR
This paper introduces the dynamic-$\mathbb{S}^{1}$ model, a simple latent space framework that explains key properties of proximity networks and their impact on spreading processes, with analytical insights and real-world relevance.
Contribution
The paper presents the dynamic-$\mathbb{S}^{1}$ model, linking network properties to latent space geometry and network temperature, enabling mathematical analysis of proximity network dynamics.
Findings
The contact, inter-contact, and weight distributions are power laws in the thermodynamic limit, with exponents within the ranges observed in real data.
Network temperature influences degree distributions and component formation.
Spreading processes behave similarly in real and modeled networks.
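| Network | Agents | Time slots | Avg. interacting agents per slot | Avg. degree per slot | Avg. time-aggregated degree |
|---|---|---|---|---|---|
| Hospital | 75 | 17376 | 2.9 | 0.05 | 30 |
| Primary school | 242 | 5846 | 30 | 0.18 | 69 |
| High school | 327 | 18179 | 17 | 0.06 | 36 |
| Conference | 113 | 10618 | 3.3 | 0.03 | 39 |
| Friends & Family | 131 | 57961 | 52 | 1.1 | 97 |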

| Modeled network | Agents | Time slots | Avg. interacting agents per slot | Avg. degree per slot | Avg. time-aggregated degree | Temperature |
|---|---|---|---|---|---|---|
| Hospital | 75 | 17376 | 2.5 | 0.04 | 30 | 0.84 |
| Primary school | 242 | 5846 | 33 | 0.17 | 69 | 0.72 |
| High school | 327 | 18179 | 18 | 0.06 | 35 | 0.61 |
| Conference | 113 | 10618 | 2.9 | 0.03 | 30 | 0.85 |
| Friends & Family | 131 | 57961 | 67 | 1.1 | 96 | 0.53 |

| Network | Contact dist. (geometric) | Inter-contact dist. (geometric) | Inter-contact dist. (log-normal) |
|---|---|---|---|
| HP (model) | |||
| HP (real) | |||
| PS (model) | |||
| PS (real) | |||
| HS (model) | |||
| HS (real) | |||
| CF (model) | |||
| CF (real) | |||
| F & F (model) | |||
| F & F (real) | |||
Latent geometry and dynamics of proximity networks
Fragkiskos Papadopoulos
Department of Electrical Engineering, Computer Engineering and Informatics, Cyprus University of Technology, 33 Saripolou Street, 3036 Limassol, Cyprus
Marco Antonio Rodríguez Flores
Department of Electrical Engineering, Computer Engineering and Informatics, Cyprus University of Technology, 33 Saripolou Street, 3036 Limassol, Cyprus
Abstract
Proximity networks are time-varying graphs representing the closeness among humans moving in a physical space. Their properties have been extensively studied in the past decade as they critically affect the behavior of spreading phenomena and the performance of routing algorithms. Yet, the mechanisms responsible for their observed characteristics remain elusive. Here, we show that many of the observed properties of proximity networks emerge naturally and simultaneously in a simple latent space network model, called dynamic-$\mathbb{S}^{1}$. The dynamic-$\mathbb{S}^{1}$ does not model node mobility directly, but captures the connectivity in each snapshot: each snapshot in the model is a realization of the $\mathbb{S}^{1}$ model of traditional complex networks, which is isomorphic to hyperbolic geometric graphs. By forgoing the motion component, the model facilitates mathematical analysis, allowing us to derive the contact, inter-contact and weight distributions. We show that these distributions are power laws in the thermodynamic limit with exponents lying within the ranges observed in real systems. Interestingly, we find that network temperature plays a central role in network dynamics, dictating the exponents of these distributions, the time-aggregated agent degrees, and the formation of unique and recurrent components. Further, we show that paradigmatic epidemic and rumor spreading processes perform similarly in real and modeled networks. The dynamic-$\mathbb{S}^{1}$ or extensions of it may apply to other types of time-varying networks and constitute the basis of maximum likelihood estimation methods that infer the node coordinates and their evolution in the latent spaces of real systems.
I Introduction
Understanding the time-varying proximity patterns among humans in a physical space is important in various contexts. These include the analysis and containment of spreading phenomena, like respiratory transmitted diseases, the design of routing algorithms for mobile networks, and the understanding of social relationships and influence Barrat and Cattuto (2015); Holme (2016); Holme and Litvak (2017); Hui et al. (2005); Chaintreau et al. (2007); Karagiannis et al. (2010); Dong et al. (2011); Aharony et al. (2011). To this end, proximity networks have been captured in different environments Chaintreau et al. (2007); Vanhems et al. (2013); Stehlé et al. (2011); Mastrandrea et al. (2015); Génois et al. (2015); Isella et al. (2011); Dong et al. (2011); Aharony et al. (2011). Each snapshot in these networks corresponds to an observation interval, which typically spans a few seconds to several minutes depending on the devices used to collect the data. The agents (nodes) in each snapshot are individuals and an edge between two agents means that they are within proximity range.
At the finest granularity level an edge between two agents represents a close-range face-to-face proximity (up to  m, detected using wearable sensors). Such networks have been captured over periods of a few days or weeks in different closed settings, such as hospitals, schools, scientific conferences and workplaces Vanhems et al. (2013); Stehlé et al. (2011); Mastrandrea et al. (2015); Isella et al. (2011); Génois et al. (2015). The main motivation for obtaining these data has emerged in epidemiological studies of infectious diseases. Other proximity networks have been captured for longer periods of time (months) and over larger areas, such as university campuses, using Bluetooth sensing or WiFi tracking Chaintreau et al. (2007); Dong et al. (2011); Aharony et al. (2011). These methods yield information only on proximity at a range, e.g., up to  m using Bluetooth devices and up to  m or more using WiFi tracking Dong et al. (2011); Aharony et al. (2011); Henderson et al. (2008). Thus, proximity in these networks does not imply face-to-face interaction. The collection of these data has been motivated by research in mobile networking Hui et al. (2005); Chaintreau et al. (2007); Karagiannis et al. (2010) and social studies Dong et al. (2011); Aharony et al. (2011).
Irrespective of the context, measurement period, and measurement method, different proximity networks have been shown to exhibit similar statistical properties Barrat and Cattuto (2015); Starnini et al. (2017); Chaintreau et al. (2007); Karagiannis et al. (2010). The most widely studied properties are the aggregated distributions of contact and inter-contact durations, obtained by considering the samples from all pairs of nodes together. The former is the distribution of time that a pair of nodes spends in contact, i.e., remains within proximity range, while the latter is the distribution of time separating two contacts between the same pair of nodes. These metrics are important in determining the capacity and delay of a network, and the dynamics of spreading processes Conti and Giordano (2014); Vazquez et al. (2007); Smieszek (2009); Machens et al. (2013); Gauvin et al. (2013). It has been found that both of these distributions are broad in real data and compatible with power laws, , with or without exponential cutoffs Hui et al. (2005); Chaintreau et al. (2007); Karagiannis et al. (2010); Starnini et al. (2017). Studies have reported exponents for contact durations SPc ; Scherrer et al. (2008) and for inter-contact durations Hui et al. (2005); Chaintreau et al. (2007); Takaguchi et al. (2011); Fournet and Barrat (2014). Further, it has been shown that aggregated power laws can emerge from pairwise distributions that are either power laws, exponentials or log-normals, with the latter two better fitting most pairwise inter-contact durations in real data Conan et al. (2007); Passarella and Conti (2013); Gao et al. (2009). Another property of interest is the distribution of the total duration of contacts between two agents throughout the observation period, called weight distribution Starnini et al. (2017); Gauvin et al. (2013); Vestergaard et al. (2014). The aggregated weight distribution is also roughly compatible with power laws Starnini et al. (2017), while an exponent has been reported for this distribution in the contact network of high school students Fournet and Barrat (2014).
These and other distinctive features of real proximity networks can be well reproduced by minimal models of mobile interacting agents Starnini et al. (2013, 2017); Flores and Papadopoulos (2018). Minimal models, i.e., models that reproduce many of the observed properties under minimal assumptions, are crucial for generating realistic synthetic networks and understanding the mechanisms that are responsible for the observed behaviors. In particular, the recently developed Force-Directed Motion (FDM) model Flores and Papadopoulos (2018) utilizes the idea of a latent metric space where the agents reside, and where the distance between two agents abstracts their similarity. Attractive forces that decrease exponentially with the similarity distance direct the agents' motion towards other agents in the physical space, and determine the duration of their interactions. One can also consider the effective distance between two agents, , where and are the agents' expected degrees per snapshot, abstracting their popularity Papadopoulos et al. (2012). In this case, dissimilar agents can still be attracted by strong forces if their popularities are high. The FDM casts the problem of modeling proximity networks as an $N$-body problem akin to molecular dynamics Schlick (2010). However, mathematically proving the properties of networks generated by the FDM is not straightforward, and the model has so far been studied only in simulations.
The FDM has been inspired by the $\mathbb{S}^{1}$ model of traditional (non-mobile) complex networks Krioukov et al. (2010); Serrano et al. (2008). In the $\mathbb{S}^{1}$, nodes are also separated by effective distances $\chi_{ij}$, and are connected with the Fermi-Dirac connection probability $p(\chi_{ij}) = 1/(1+\chi_{ij}^{1/T})$, where $T$ is the network temperature, controlling clustering Dorogovtsev (2010) in the network. The $\mathbb{S}^{1}$ is isomorphic to hyperbolic geometric graphs Krioukov et al. (2010). It can generate network snapshots that possess many of the common structural properties of real networks, including heterogeneous or homogeneous degree distributions, strong clustering, and the small-world property Serrano et al. (2008); Krioukov et al. (2010); Papadopoulos et al. (2012). Fig. 1 shows the probability that two agents are connected in a snapshot of FDM-simulated networks as a function of their effective distance. Interestingly, we see that this probability qualitatively resembles the Fermi-Dirac connection probability in the $\mathbb{S}^{1}$ model, even though this form of connection probability is not enforced in the FDM. Specifically, we see in Fig. 1 that the connection probability in the FDM has a smooth step-like form, where connection probabilities at small distances are orders of magnitude larger than connection probabilities at large distances.
Motivated by the observation in Fig. 1, here we consider a simple latent space model for human proximity networks, where each snapshot is a realization of the $\mathbb{S}^{1}$ model. We call this model dynamic-$\mathbb{S}^{1}$ and show that it simultaneously reproduces many of the observed properties of real systems. The dynamic-$\mathbb{S}^{1}$ does not model node mobility directly, but captures the connectivity in each snapshot. By forgoing the motion component it facilitates mathematical analysis, allowing us to derive the contact, inter-contact and weight distributions. We show that these distributions are power laws in the thermodynamic limit, with exponents $2+T$, $2-T$ and $1+T$, respectively, where $T$ is the temperature in the Fermi-Dirac connection probability. These exponents are within the ranges observed in real systems. We also show that temperature controls the agents' time-aggregated degrees and the formation of unique and recurrent components Flores and Papadopoulos (2018). Additionally, we consider paradigmatic epidemic and rumor spreading processes Keeling and Rohani (2008); Daley and Kendall (1965) and find that they perform remarkably similarly in real and modeled networks.
The rest of the paper is organized as follows. In Sec. II we review the $\mathbb{S}^{1}$ model. In Sec. III we introduce the dynamic-$\mathbb{S}^{1}$. In Sec. IV we juxtapose the properties of modeled and real networks. In Sec. V we compare the performance of epidemic and rumor spreading processes running on them. In Sec. VI we mathematically analyze the main properties of the model. In Sec. VII we elucidate the crucial role of temperature in the formation of components. Finally, in Sec. VIII we conclude the paper with future work directions.
II $\mathbb{S}^{1}$ model
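In the $\mathbb{S}^{1}$ model Krioukov et al. (2010) each node has latent (or hidden) variables $\kappa, \theta$. The latent variable $\kappa$ is proportional to the node's expected degree in the resulting network. The latent variable $\theta$ is the angular similarity coordinate of the node on a circle of radius $R = N/2\pi$, where $N$ is the total number of nodes. To construct a network with the $\mathbb{S}^{1}$ model that has size $N$, average node degree $\bar{k}$, and temperature $T \in (0, 1)$, we perform the following steps:
- (1) coordinate assignment: for each node $i = 1, 2, \ldots, N$, sample its angular coordinate $\theta_i$ uniformly at random from $[0, 2\pi]$, and its degree variable $\kappa_i$ from a probability density function (PDF) $\rho(\kappa)$;
- (2) creation of edges: connect every pair of nodes $i, j$ with the Fermi-Dirac connection probability
$$p(\chi_{ij}) = \frac{1}{1 + \chi_{ij}^{1/T}}. \tag{1}$$
In the last expression, $\chi_{ij}$ is the effective distance between nodes $i$ and $j$,
$$\chi_{ij} = \frac{R \Delta\theta_{ij}}{\mu \kappa_i \kappa_j}, \tag{2}$$
where $\Delta\theta_{ij} = \pi - |\pi - |\theta_i - \theta_j||$. Parameter $\mu$ in (2) is derived from the condition that the expected degree in the network is indeed $\bar{k}$, yielding
$$\mu = \frac{\bar{k} \sin(T\pi)}{2 \bar{\kappa}^{2} T \pi}, \tag{3}$$
where $\bar{\kappa} = \int \kappa \rho(\kappa)\, \mathrm{d}\kappa$. The expected degree of a node with latent variable $\kappa$ is Krioukov et al. (2010)
$$\bar{k}(\kappa) = \frac{\bar{k}}{\bar{\kappa}} \kappa. \tag{4}$$
For sparse networks ($N \gg \bar{k}$) the resulting degree distribution $P(k)$ has a similar functional form as $\rho(\kappa)$ Boguñá and Pastor-Satorras (2003). For instance, a power law degree distribution with exponent $\gamma > 2$ is obtained if $\rho(\kappa) \propto \kappa^{-\gamma}$, while a Poisson degree distribution with mean $\bar{k}$ is obtained if $\rho(\kappa) = \delta(\kappa - \bar{k})$, where $\delta(x)$ is the Dirac delta function Boguñá and Pastor-Satorras (2003); Serrano et al. (2008). Smaller values of the temperature $T$ favor connections at smaller effective distances and increase the average clustering Dorogovtsev (2010) in the network, which is maximized at $T = 0$, and nearly linearly decreases to zero with $T \in [0, 1)$. At $T \to 0$ the connection probability in (1) becomes the step function $p(\chi_{ij}) \to 1$ if $\chi_{ij} \leq 1$, and $p(\chi_{ij}) \to 0$ if $\chi_{ij} > 1$.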
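The ingredients of the construction above can be sketched in a few lines of Python. This is a minimal sketch, assuming the standard form of the model from the cited literature: radius $N/2\pi$, effective distance proportional to the angular distance divided by the product of the two degree variables, and the Fermi-Dirac probability $1/(1+\chi^{1/T})$. The function names are hypothetical and are not taken from the authors' code.

```python
import math

def mu_parameter(avg_degree, kappa_bar, temperature):
    """Assumed normalisation that makes the expected average degree equal avg_degree."""
    return avg_degree * math.sin(temperature * math.pi) / (
        2.0 * kappa_bar ** 2 * temperature * math.pi)

def effective_distance(n, theta_i, theta_j, kappa_i, kappa_j, mu):
    """Effective distance: rescaled angular distance on a circle of radius n / (2*pi)."""
    radius = n / (2.0 * math.pi)
    dtheta = math.pi - abs(math.pi - abs(theta_i - theta_j))  # angular distance in [0, pi]
    return radius * dtheta / (mu * kappa_i * kappa_j)

def connection_probability(chi, temperature):
    """Fermi-Dirac connection probability 1 / (1 + chi**(1/T)), computed in log space."""
    if chi <= 0.0:
        return 1.0
    x = math.log(chi) / temperature
    if x > 700.0:          # exp(x) would overflow; the probability is numerically zero
        return 0.0
    return 1.0 / (1.0 + math.exp(x))
```

Evaluating `connection_probability` at a very small temperature shows the step-like behavior described in the text: the value is close to 1 for effective distances below 1 and close to 0 above 1.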
III Dynamic-$\mathbb{S}^{1}$
The dynamic-$\mathbb{S}^{1}$ models a sequence of network snapshots, $G_t$, $t = 1, \ldots, \tau$, where $\tau$ is the total number of time slots. Each snapshot is a realization of the $\mathbb{S}^{1}$ model. Therefore, there are $N$ agents that are assigned latent variables $\kappa, \theta$ as in the $\mathbb{S}^{1}$ model, which remain fixed in all time slots. The temperature $T$ is also fixed, while each snapshot $G_t$ is allowed to have a different average degree $\bar{k}_t$. Thus, the model parameters are $N$, $\tau$, $\rho(\kappa)$, $T$, and $\bar{k}_t$, $t = 1, \ldots, \tau$. The snapshots are generated according to the following simple rules:
- (1) at each time step $t = 1, \ldots, \tau$, snapshot $G_t$ starts with $N$ disconnected nodes, while $\bar{k}$ in Eq. (3) is set equal to $\bar{k}_t$;
- (2) each pair of nodes $i, j$ connects with probability given by Eq. (1);
- (3) at time $t+1$, all the edges in snapshot $G_t$ are deleted and the process starts over again to generate snapshot $G_{t+1}$.
We note that the snapshots are conditionally independent given the agents' latent variables $\kappa, \theta$, but not independent. In other words, even though each snapshot is constructed anew, there are correlations among the snapshots that are induced by the nodes' effective distances $\chi_{ij}$. In particular, nodes at smaller effective distances have higher chances of being connected in each snapshot, as dictated by the connection probability in (1). Fig. 2 provides a visualization of snapshots generated by the model, where we see that agents at smaller similarity distances tend to stay connected in consecutive time slots and form recurrent components. We make the code implementing the model available at mod . Next, we compare the properties of synthetic networks generated by the model and real networks.
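The three generation rules can be sketched as a Python generator. This is an illustrative sketch rather than the authors' released code: it assumes the standard forms of the effective distance and of the normalisation parameter used in the literature on this model, and the name `dynamic_s1` and its signature are hypothetical. Degree variables are assumed to be strictly positive.

```python
import math
import random

def dynamic_s1(kappas, thetas, avg_degrees, temperature, seed=0):
    """Yield one edge set per time slot; latent variables stay fixed across slots."""
    rng = random.Random(seed)
    n = len(kappas)
    kappa_bar = sum(kappas) / n
    radius = n / (2.0 * math.pi)
    # Time-independent part of each pair's effective distance.
    base = {}
    for i in range(n):
        for j in range(i + 1, n):
            dtheta = math.pi - abs(math.pi - abs(thetas[i] - thetas[j]))
            base[(i, j)] = radius * dtheta / (kappas[i] * kappas[j])
    for k_t in avg_degrees:
        if k_t <= 0.0:                       # empty snapshot when the target degree is zero
            yield set()
            continue
        # Rule (1): start from disconnected nodes and set the slot's target degree.
        mu_t = k_t * math.sin(temperature * math.pi) / (
            2.0 * kappa_bar ** 2 * temperature * math.pi)
        edges = set()
        # Rule (2): independent Fermi-Dirac trial for every pair.
        for pair, b in base.items():
            chi = b / mu_t
            x = math.log(chi) / temperature if chi > 0.0 else -math.inf
            p = 0.0 if x > 700.0 else 1.0 / (1.0 + math.exp(x))
            if rng.random() < p:
                edges.add(pair)
        # Rule (3): nothing is carried over; the next slot starts from scratch.
        yield edges
```

Because the per-pair base distances are reused in every slot, pairs that are close in the latent space reconnect often, which is the source of the correlations between snapshots noted above.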
IV Modeled vs. real networks
IV.1 Overview of real networks
We consider four face-to-face interaction networks from SocioPatterns Soc , which correspond to: (i) a hospital ward in Lyon Vanhems et al. (2013); (ii) a primary school in Lyon Stehlé et al. (2011); (iii) a high school in Marseilles Mastrandrea et al. (2015); and (iv) a scientific conference in Turin Isella et al. (2011). These networks were captured over a period of , , and days, respectively. Each of their snapshots corresponds to a time slot of sec. We also consider the Bluetooth-based proximity network of the members of a residential community adjacent to a research university in North America, taken from the Friends and Family dataset Aharony et al. (2011). The snapshots here correspond to slots of min, spanning the period October 2010 to May 2011. In all cases we number the slots and assign node IDs sequentially, and . Table 1 gives an overview of the data.
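We define the average degree per slot of agent $i$ as
$$\bar{k}_i = \frac{1}{\tau} \sum_{t=1}^{\tau} k_{i,t}, \tag{5}$$
where $k_{i,t}$ is the agent's degree in slot $t$, while the average agent (snapshot) degree in slot $t$ is
$$\bar{k}_t = \frac{1}{N} \sum_{i=1}^{N} k_{i,t}. \tag{6}$$
Fig. 3 shows the distribution of $\bar{k}_i$ and $\bar{k}_t$ in the considered networks. The average agent degree per slot is
$$\bar{k} = \frac{1}{N} \sum_{i=1}^{N} \bar{k}_i = \frac{1}{\tau} \sum_{t=1}^{\tau} \bar{k}_t. \tag{7}$$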
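As a small illustration of these averages, the sketch below computes them from a list of snapshots, each given as a collection of undirected edges. It assumes that the per-agent average is the agent's degree summed over slots and divided by the number of slots, and that the per-slot average is the mean degree over all agents in that slot; the function name `degree_averages` is hypothetical.

```python
def degree_averages(snapshots, n):
    """Return (per-agent average degree, per-slot average degree, overall average)."""
    tau = len(snapshots)
    degree_sum = [0] * n                  # total degree of each agent over all slots
    per_slot = []
    for edges in snapshots:
        for i, j in edges:
            degree_sum[i] += 1
            degree_sum[j] += 1
        per_slot.append(2.0 * len(edges) / n)   # every edge contributes two degree units
    per_agent = [d / tau for d in degree_sum]
    overall = sum(per_agent) / n
    return per_agent, per_slot, overall
```

The overall average can be obtained either by averaging over agents or by averaging over slots; both routes give the same number, which is a quick sanity check on parsed data.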
IV.2 Modeled networks
For each real network we construct its synthetic counterpart using the dynamic-$\mathbb{S}^{1}$. Each counterpart has the same number of nodes and duration as the corresponding real network, while the latent variable $\kappa_i$ of each agent $i$ is set equal to the agent's average degree per slot in the real network,
$$\kappa_i = \bar{k}_i. \tag{8}$$
Thus, the distribution of $\kappa$ is the corresponding empirical distribution in Fig. 3 (left). The target average degree in each snapshot $G_t$, $\bar{k}_t$, is set equal to the average degree in the corresponding real snapshot at slot $t$. Finally, the temperature $T$ is set such that the resulting average time-aggregated degree is similar to the one in the real network; we analyze the dependence of the average time-aggregated degree on $T$ in Sec. VI.4.
In the counterparts, the expected degree of agent in slot is [Eq. (4)]
[TABLE]
while agent's expected degree per slot is . The counterparts aim at capturing the variability in the number of interacting agents per slot since the probability that an agent interacts with at least one other agent in slot is
[TABLE]
while .
IV.3 Properties of modeled vs. real networks
Table 2 gives an overview of the counterparts. We see that their characteristics are overall very similar to the ones of the real networks (Table 1). Further, Fig. 4 shows that the counterparts indeed capture the variability in the number of interacting agents per slot.
In Figs. 5 and 6 we compare a range of other properties between real and modeled networks, considered also in Starnini et al. (2013, 2016); Flores and Papadopoulos (2018). These properties are:
- (a) The aggregated contact distribution, i.e., the distribution of the number of slots that a pair of nodes remains connected.
- (b) The aggregated inter-contact distribution, i.e., the distribution of the number of slots that a pair of nodes remains disconnected.
- (c) The aggregated weight distribution, which is the distribution of the edge weights in the time-aggregated network. In this network two nodes are connected if they were connected in at least one slot, while the weight of an edge is the total number of slots that the two endpoints of the edge were connected.
- (d) The strength distribution, which is the distribution of the node strengths in the time-aggregated network. The strength of a node is the sum of the weights of all edges attached to the node.
- (e) The distribution of component sizes, which is the distribution of the number of nodes in the connected components formed throughout the observation period .
- (f) The distribution of the shortest time-respecting path lengths across all pairs of nodes. As an example, consider three nodes , and , where and connect at slot and and connect at slot . The time-respecting path between and is and has length . The shortest time-respecting path between and is the shortest such path throughout the observation period.
- (g) The average total duration of a group as a function of its size. A group is a set of nodes forming a connected component. The total duration of a group is the total number of slots where the exact same set of nodes formed a connected component. For each group size we compute the average of this duration among groups with that specific size.
- (h) Finally, we consider the average number of recurrent components where an agent participates as a function of its total number of interactions (strength) throughout the observation period. A connected component formed in a slot is called recurrent if a connected component with exactly the same nodes was formed in a previous slot Flores and Papadopoulos (2018). We consider recurrent components consisting of at least three nodes.
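Properties (a) to (c) can be read off a pair's connection timeline, i.e., a sequence with one 0/1 flag per slot. The sketch below is illustrative only: the helper name `pair_statistics` is hypothetical, and the treatment of runs that touch the start or end of the observation period (all contact runs are kept, while only gaps enclosed by two contacts count as inter-contact durations) is an assumption, not something stated in the text.

```python
from itertools import groupby

def pair_statistics(timeline):
    """Return (contact durations, inter-contact durations, weight) for one pair."""
    # Collapse the timeline into runs of identical flags: [(flag, run_length), ...].
    runs = [(flag, sum(1 for _ in group)) for flag, group in groupby(timeline)]
    contacts = [length for flag, length in runs if flag]
    # A gap is an inter-contact duration only if a contact precedes and follows it.
    inter_contacts = [length for idx, (flag, length) in enumerate(runs)
                      if not flag and 0 < idx < len(runs) - 1]
    weight = sum(1 for flag in timeline if flag)   # total number of connected slots
    return contacts, inter_contacts, weight
```

Pooling the first two lists over all pairs gives the aggregated contact and inter-contact samples, and collecting the nonzero weights over all pairs gives the aggregated weight sample.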
Figs. 5 and 6 show that the dynamic-$\mathbb{S}^{1}$ reproduces all the above properties remarkably well. The main exception is the longer paths in the conference [Fig. 5(f)], which cannot be captured by the model. We also note that the average time-aggregated degree in the conference's counterpart could not exceed 30 (vs. 39 in the real network; see Tables 1 and 2). Thus, the dynamic-$\mathbb{S}^{1}$ does not totally capture the characteristics of this network. Interestingly, this was also the case with the FDM Flores and Papadopoulos (2018). Finally, we note that the ability of the model to capture the properties of the considered networks is not due to mere calibration of expected node degrees. In Appendix A, we show that the configuration model Chung and Lu (2002); Park and Newman (2004) with the same calibration of expected node degrees, Eqs. (8, 9), cannot reproduce the abundance of recurrent components, nor the broad contact, inter-contact and weight distributions observed in the real systems. Further, in Sec. VI we derive these distributions in the dynamic-$\mathbb{S}^{1}$ and show that they do not depend on the distribution of the degree variables $\kappa$. Below, we also investigate the pairwise contact and inter-contact distributions in modeled and real networks.
IV.4 Pairwise contact and inter-contact distributions
If the expected snapshot degrees, , are independent and identically distributed, the pairwise contact and inter-contact distributions in the dynamic-$\mathbb{S}^{1}$ are geometric at $\tau \to \infty$ (for finite $\tau$ they are truncated geometric). Indeed, in this case the probability for two nodes with latent variables and angular distance to remain connected for slots, is
[TABLE]
where is the connection probability in Eq. (1), while is the effective distance between the two nodes, which depends on the average snapshot degree [Eqs. (2, 3)], whose PDF is denoted by . Similarly, the probability that the two nodes remain disconnected for slots, is
[TABLE]
In general, these distributions are not geometric in the model as they depend on the stochastic process that describes the time evolution of the expected snapshot degrees.
Previous studies have reported that a significant portion of pairwise inter-contact durations in real data can be fitted with exponential distributions Conan et al. (2007); Gao et al. (2009). Since the geometric distribution is the discrete analogue of the exponential distribution, these studies are in line with Eq. (12). Given these results, we check below how well the geometric distribution captures the pairwise contact and inter-contact distributions in the considered real systems and their modeled counterparts.
For each pair of nodes we consider the sets of its contact and inter-contact durations in each of the activity cycles shown in Fig. 4. We consider sets with at least three distinct duration values. For each set we estimate the parameter of the geometric distribution, i.e., the success probability , where is the mean of the durations in the set. Then, we draw the same number of samples as the number of durations in the set from a geometric distribution with parameter . Subsequently, we use the two-sample Kolmogorov-Smirnov (KS) goodness of fit test Massey (1951); Arnold and Emerson (2011) to test the hypothesis that the values in the set and the sampled values have the same distribution. We recall that such a statistical test can only reject or fail to reject a given hypothesis for a given significance level . This level corresponds to the probability of incorrectly rejecting the hypothesis, while if the test fails to reject the hypothesis, we only know that this is true to a confidence level . We use , and find for each activity cycle the percentage of pairs for which the test failed to reject the hypothesis. Table 3 shows the average of this percentage across the activity cycles in each network, averaged across ten repetitions of the above procedure. The results for each counterpart are also averaged across ten different temporal network realizations.
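The fitting procedure for a single set of durations can be sketched as follows. This is a minimal standard-library sketch, not the implementation used for Table 3: it assumes the geometric distribution is supported on 1, 2, ... so that the success probability is estimated as the reciprocal of the mean duration, and it replaces the exact test with the usual large-sample critical value for the two-sample KS statistic. The significance level is left as an argument because its value is not recoverable from the text; all function names are hypothetical.

```python
import math
import random
from bisect import bisect_right

def sample_geometric(p, size, rng):
    """Draw `size` geometric variates on {1, 2, ...} by inverse transform."""
    if p >= 1.0:
        return [1] * size
    return [max(1, math.ceil(math.log(1.0 - rng.random()) / math.log(1.0 - p)))
            for _ in range(size)]

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    return max(abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
               for v in points)

def geometric_fit_not_rejected(durations, alpha, rng):
    """True if the KS test fails to reject a geometric fit at level alpha."""
    p_hat = 1.0 / (sum(durations) / len(durations))      # assumed estimator: 1 / mean
    synthetic = sample_geometric(p_hat, len(durations), rng)
    d = ks_statistic(durations, synthetic)
    n = m = len(durations)
    # Large-sample critical value (an approximation; conservative for discrete data).
    critical = math.sqrt(-0.5 * math.log(alpha / 2.0)) * math.sqrt((n + m) / (n * m))
    return d <= critical
```

Repeating the call with fresh random draws and averaging the outcomes over pairs mirrors the repetition step described above.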
We see in Table 3 that the geometric distribution fits a high percentage of contact durations in both modeled and real networks. It also fits a high percentage of inter-contact durations in modeled networks, and a significant percentage of inter-contact durations in the real systems, which however is not as high as in the modeled networks. These results suggest that the model captures the variability of the contact durations in the real systems. However, it does not totally capture the variability of the inter-contact durations.
To verify the last statement we also consider a log-normal distribution for the inter-contact durations, which offers a more versatile model to capture the variability in the distributions Conan et al. (2007). We recall that the PDF of the log-normal is , while its skewness is . For each pair of nodes, the parameters and are the mean and variance of the logarithms of its inter-contact durations. We see in Table 3 that the log-normal better fits the inter-contact durations, especially in the real systems, as also observed in Conan et al. (2007). Further, Fig. 7 shows that the inter-contact distributions in the real networks are indeed more skewed on average than in their counterparts. Nevertheless, the aggregated inter-contact distributions are very similar in real and synthetic systems [Figs. 5(b), 6(b)]. In the next section we also see that paradigmatic dynamical processes perform similarly in the two.
V Dynamical processes on modeled vs. real networks
We consider the susceptible-infected-susceptible (SIS) epidemic spreading model Keeling and Rohani (2008) and the DK (Daley and Kendall) model for rumor spreading Daley and Kendall (1965). In the SIS each agent can be in one of two states, susceptible (S) or infected (I). At any time slot an infected agent recovers with probability and becomes susceptible again, whereas infected agents infect the susceptible agents with whom they interact with probability . Thus, the transition of states is S $\to$ I $\to$ S. In the DK model each agent can be in one of three states, ignorant (I), spreader (S) or stifler (R). An ignorant agent that interacts with a spreader receives the rumor with probability and becomes a spreader, while a spreader that interacts with another spreader or a stifler becomes a stifler with probability and no longer communicates the rumor. The transition of states is I $\to$ S $\to$ R.
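The SIS dynamics on a sequence of snapshots can be sketched as below. This standalone sketch does not use or reproduce the Network Diffusion Library; it assumes a synchronous update in which infections and recoveries within a slot are both decided from the state at the start of the slot, and the names `sis_prevalence`, `infection_prob`, and `recovery_prob` are hypothetical.

```python
import random

def sis_prevalence(snapshots, n, infection_prob, recovery_prob, initial_infected, seed=0):
    """Average fraction of infected agents per slot for SIS on temporal snapshots."""
    rng = random.Random(seed)
    infected = set(initial_infected)
    prevalence = []
    for edges in snapshots:
        newly_infected = set()
        for i, j in sorted(edges):
            # Each contact between an infected and a susceptible agent may transmit.
            for src, dst in ((i, j), (j, i)):
                if src in infected and dst not in infected and rng.random() < infection_prob:
                    newly_infected.add(dst)
        # Agents infected at the start of the slot recover independently.
        recovered = {a for a in sorted(infected) if rng.random() < recovery_prob}
        infected = (infected - recovered) | newly_infected
        prevalence.append(len(infected) / n)
    return sum(prevalence) / len(prevalence)
```

Running the same function on real snapshots and on their modeled counterparts, over a grid of infection probabilities, gives the kind of comparison discussed in this section.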
To simulate the SIS process on temporal networks we use the dynamic SIS implementation of the Network Diffusion Library Rossetti et al. (2018). We have also modified this library to implement the DK model. For the SIS process we consider the average percentage of infected agents per slot (prevalence), while for the DK process we consider the percentage of stiflers at the final slot (size of the rumor). Fig. 8 shows that the two processes perform remarkably similarly in real and modeled networks. The only exception is in the performance of the SIS in the conference and its counterpart at low infection probabilities [Fig. 8(d)]. A similar behavior has been observed in the FDM Flores and Papadopoulos (2018), and it may be due to the fact that the models do not totally capture the characteristics of this network, as noted in Sec. IV.3.
VI Mathematical analysis
Here we perform a detailed mathematical analysis of the main properties of the dynamic-$\mathbb{S}^{1}$. To facilitate the analysis, we assume that the expected snapshot degree is the same in all time slots, $\bar{k}_t = \bar{k}$, $\forall t$. This assumption renders the connection probability between two nodes [Eq. (1)] the same in all slots. However, we illustrate that the analytical results match closely the simulation results from the modeled counterparts of real systems, where this assumption does not hold.
We show that for sparse snapshots, $N \gg \bar{k}$, and large durations $\tau$, the aggregated contact, inter-contact and weight distributions can be approximated by power laws with exponents $2+T$, $2-T$ and $1+T$, respectively, where $T$ is the temperature in the connection probability. Technically, we consider these distributions in the thermodynamic limit, $N \to \infty$, and show that they are power laws with the aforementioned exponents at $\tau \to \infty$. Interestingly, these results do not depend on the distribution of the latent degree variables $\kappa$. Further, we analyze the expected degree in the time-aggregated network, and show that in finite networks the expected strength of a node grows super-linearly with its time-aggregated degree, as empirically observed in prior studies Starnini et al. (2013, 2017). We begin with the contact distribution.
VI.1 Aggregated contact distribution
The probability to observe a sequence of exactly consecutive slots where two nodes with latent variables and angular distance are connected, is the percentage of time where we observe a slot where these two nodes are not connected, followed by slots where they are connected, followed by a slot where they are not connected (for brevity we ignore the cases where the first/last of the slots that two nodes can be connected starts/ends at the beginning/end of the observation period). For each duration , there are possibilities where this duration can be realized. For instance, if the two nodes can be disconnected in slot , connected in slots , and disconnected in slot , where . Therefore, the percentage of observation time where a duration of slots can be realized is . Since the two nodes are connected in each slot with probability with in Eq. (2), we have
[TABLE]
Removing the condition on , which is uniform on , yields
[TABLE]
where is the Gauss hypergeometric function Olver et al. (2010). At , the integral in (14) simplifies for and , to
[TABLE]
where $\Gamma$ is the complete gamma function (if $n$ is a positive integer then $\Gamma(n) = (n-1)!$). From (14, 15), we have
[TABLE]
Removing the condition on and , gives
[TABLE]
The aggregated contact distribution, , is the probability that two nodes are connected for exactly consecutive slots given that ,
[TABLE]
[TABLE]
where
[TABLE]
The approximation in (19) uses the facts and , which hold for . We see from (19) that for , is approximately a power law with exponent $2+T$. At , we have a pure power law
[TABLE]
Fig. 9 shows that (20) provides an excellent approximation to simulation results.
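The exponent can be checked numerically with a short sketch. It rests on assumptions that are not spelled out in the surviving text: with the Fermi-Dirac form $p = 1/(1+\chi^{1/T})$ and an effective distance proportional to the angular distance, changing the integration variable from the angular distance to $p$ in the thermodynamic limit turns the integral of $p^{t}(1-p)^{2}$ into the Beta function $B(t-T, 2+T)$, up to factors that do not depend on $t$. The function names are hypothetical.

```python
import math

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b), via log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def contact_log_slope(t, temperature):
    """Slope of log B(t - T, 2 + T) versus log t, measured between t and 2t."""
    lo = log_beta(t - temperature, 2.0 + temperature)
    hi = log_beta(2 * t - temperature, 2.0 + temperature)
    return (hi - lo) / math.log(2.0)
```

For large `t`, the value returned by `contact_log_slope(t, T)` approaches the negative of the contact exponent stated in the text.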
From (19), the expected contact duration in the thermodynamic limit is
[TABLE]
At , the last relation simplifies to
[TABLE]
Next, we derive the aggregated inter-contact distribution following the same steps.
VI.2 Aggregated inter-contact distribution
Consider the probability to observe a slot where two nodes with latent variables $\kappa$, $\kappa'$ and angular distance $\Delta\theta$ are connected, followed by $t$ slots where they are not connected, followed by a slot where they are again connected. We have
[TABLE]
Removing the condition on $\Delta\theta$ yields
[TABLE]
In the thermodynamic limit, $N \to \infty$, the integral in (24) simplifies to
[TABLE]
From (24, 25), and after removing the condition on $\kappa$ and $\kappa'$, we have
[TABLE]
The aggregated inter-contact distribution is the probability that two nodes are disconnected for exactly $t$ consecutive slots, given that $t \geq 1$,
[TABLE]
[TABLE]
where
[TABLE]
The approximation in (28) holds for large $t$. For $t \ll \tau$, the inter-contact distribution is approximately a power law with exponent $2-T$. At $\tau \to \infty$, we have a pure power law
[TABLE]
Fig. 10 juxtaposes (29) against simulation results.
From (28), the expected inter-contact duration in the thermodynamic limit is
[TABLE]
The above relation increases approximately exponentially with $T$, and diverges at $\tau \to \infty$,
[TABLE]
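The divergence of the expected inter-contact duration can be illustrated numerically. The sketch below rests on assumptions not displayed in this excerpt: with $p(\chi) = 1/(1+\chi^{1/T})$, integrating $p^2 (1-p)^t$ over $\chi \in (0, \infty)$ gives $T\,B(2-T,\,t+T)$, so the inter-contact weights are taken proportional to $\Gamma(t+T)/\Gamma(t+2)$. The finite-observation factor is ignored and the distribution is simply truncated at `tau`. The function name is hypothetical.

```python
import math


def mean_intercontact(tau, T):
    """Mean of the distribution proportional to Gamma(t+T)/Gamma(t+2), t = 1..tau."""
    durations = range(1, tau + 1)
    weights = [math.exp(math.lgamma(t + T) - math.lgamma(t + 2)) for t in durations]
    norm = sum(weights)
    return sum(t * w for t, w in zip(durations, weights)) / norm


if __name__ == "__main__":
    T = 0.5
    for tau in (10**2, 10**3, 10**4):
        print(tau, mean_intercontact(tau, T))
```

Under these assumptions the mean keeps growing with `tau`, roughly like `tau**T`, which is the same as growing exponentially in `T` at fixed `tau`.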
We proceed with the weight distribution.
VI.3 Aggregated weight distribution
The probability that two nodes with latent variables $\kappa$, $\kappa'$ and angular distance $\Delta\theta$ are connected in $t$ slots is given by the binomial distribution
[TABLE]
Removing the condition on $\Delta\theta$ yields
[TABLE]
where
[TABLE]
To reach (VI.3), we perform a change of integration variable and express the binomial coefficient in terms of gamma functions, $\binom{\tau}{t} = \Gamma(\tau+1)/[\Gamma(t+1)\Gamma(\tau-t+1)]$.
In the thermodynamic limit, $N \to \infty$, the second term inside the brackets in (VI.3) vanishes. Removing the condition on $\kappa$ and $\kappa'$, we have
[TABLE]
For , we can write
[TABLE]
The aggregated weight distribution is the probability that two nodes are connected in $t$ slots, given that $t \geq 1$,
[TABLE]
[TABLE]
where
[TABLE]
The approximation in (39) holds for large $t$. We see from (39) that for $t \ll \tau$ the weight distribution is approximately a power law with exponent $1+T$. At $\tau \to \infty$, we have a pure power law
[TABLE]
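The weight distribution of a single pair can be evaluated directly by averaging the binomial probability over the angular distance. The sketch below is a hedged illustration: it assumes the standard $\mathbb{S}^{1}$ parametrization $p(\chi) = 1/(1+\chi^{1/T})$, $\chi = R\Delta\theta/(\mu\kappa\kappa')$, $R = N/(2\pi)$ and $\mu = \sin(T\pi)/(2\bar{k}T\pi)$, none of which is displayed in this excerpt, and it uses a plain midpoint rule. All function and parameter names are hypothetical.

```python
import math


def log_binom_pmf(t, tau, x):
    """log Binomial(t; tau, p) with p = 1 / (1 + x), written in terms of x > 0."""
    log_p = -math.log1p(x)
    log_q = math.log(x) - math.log1p(x)
    log_coef = (math.lgamma(tau + 1) - math.lgamma(t + 1)
                - math.lgamma(tau - t + 1))
    return log_coef + t * log_p + (tau - t) * log_q


def pair_weight_prob(t, tau, N, kbar, T, kappa1, kappa2, n=20000):
    """Probability of weight t for one pair, averaged over uniform dtheta in [0, pi]."""
    mu = math.sin(T * math.pi) / (2.0 * kbar * T * math.pi)  # assumed form of mu
    R = N / (2.0 * math.pi)                                  # assumed circle radius
    h = math.pi / n
    total = 0.0
    for i in range(n):
        dtheta = (i + 0.5) * h                # midpoint of the i-th subinterval
        chi = R * dtheta / (mu * kappa1 * kappa2)
        total += math.exp(log_binom_pmf(t, tau, chi ** (1.0 / T)))
    return total * h / math.pi
```

For example, with `N=1000`, `kbar=5`, `T=0.5`, `tau=1000` and both latent degrees equal to 5, the log-log slope between `t=20` and `t=40` can be compared with `-(1+T)`.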
From (38), the expected weight in the thermodynamic limit is
[TABLE]
The above relation decreases approximately exponentially with $T$, and diverges at $\tau \to \infty$,
[TABLE]
We next turn our attention to the expected degree in the time-aggregated network.
VI.4 Time-aggregated degree and finite size effects
The probability that two agents with latent variables $\kappa$, $\kappa'$ do not interact is obtained by setting $t = 0$ in (VI.3),
[TABLE]
where the quantity entering the expression is defined in (34). Removing the condition on $\kappa$ and $\kappa'$ gives the probability that two agents do not interact
[TABLE]
The expected time-aggregated degree is
[TABLE]
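The relation between the no-interaction probability and the expected time-aggregated degree can be sketched numerically. The example below is illustrative only: it assumes every node has the same latent degree $\kappa = \bar{k}$ and the standard $\mathbb{S}^{1}$ parametrization $p(\chi) = 1/(1+\chi^{1/T})$, $\chi = R\Delta\theta/(\mu\kappa\kappa')$, $R = N/(2\pi)$, $\mu = \sin(T\pi)/(2\bar{k}T\pi)$, which is not displayed in this excerpt. The function name is hypothetical.

```python
import math


def expected_aggregated_degree(N, kbar, T, tau, n=50000):
    """(N - 1) * [1 - P(no contact in tau slots)] for homogeneous kappa = kbar."""
    mu = math.sin(T * math.pi) / (2.0 * kbar * T * math.pi)  # assumed form of mu
    R = N / (2.0 * math.pi)                                  # assumed circle radius
    h = math.pi / n
    no_contact = 0.0
    for i in range(n):
        dtheta = (i + 0.5) * h                # midpoint rule over [0, pi]
        x = (R * dtheta / (mu * kbar * kbar)) ** (1.0 / T)
        # (1 - p)^tau with p = 1 / (1 + x), i.e. 1 - p = x / (1 + x)
        no_contact += math.exp(tau * (math.log(x) - math.log1p(x)))
    no_contact *= h / math.pi
    return (N - 1) * (1.0 - no_contact)
```

With `tau=1` the function returns the expected snapshot degree, which should be close to `kbar`; larger `tau` gives larger aggregated degrees, bounded by `N - 1`.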
At , is given by (36). Substituting in (36) with its expression in (3), gives
[TABLE]
which increases exponentially with $T$ and linearly with $\bar{k}$. Fig. 11 juxtaposes simulation results against (44, 45) and the limit in (46). We see an excellent agreement between (44, 45) and simulations, while (46) is a good approximation only at sufficiently low temperatures.
Similarly, the expected time-aggregated degree of a node with latent variable $\kappa$ is
[TABLE]
Fig. 12 juxtaposes simulation results against (47) and (48). We again see an excellent agreement between the exact prediction (47) and simulations, while (48) is a good approximation only for sufficiently small . Therefore, one in general needs to use exact expressions [(44, 45), (47)] to accurately compute expected time-aggregated degrees. The thermodynamic limit approximations [(46), (48)] are accurate only at sufficiently low temperatures.
We also note that the normalization factor of the weight distribution in (38) can be rewritten as
[TABLE]
where the expected time-aggregated degree is given in (46). Fig. 13 juxtaposes (38) against simulation results, where in view of Fig. 11, we use in (49) the actual value of the time-aggregated degree in the simulations instead of its limit in (46). We see again a very good agreement between theory and simulations.
VI.5 Strength-degree correlations
We now analyze the strength-degree correlations in the time-aggregated network and justify previous empirical observations reporting a super-linear dependence between an individual's expected strength and its time-aggregated degree Starnini et al. (2013, 2017).
The expected weight between two nodes with latent variables $\kappa$, $\kappa'$ is
[TABLE]
where the weight probability for the pair is given in (VI.3). In the thermodynamic limit, $N \to \infty$, the second term inside the brackets in (VI.3) vanishes, yielding
[TABLE]
The expected strength of a node with latent variable $\kappa$ is
[TABLE]
Fig. 14 juxtaposes (52) against simulation results. We see that (52) can be a good approximation in finite networks. This is because the second term inside the brackets in (VI.3) vanishes even for finite networks as increases. The smaller the temperature the faster this term vanishes and the better the approximation in (52) is for finite networks.
We also see from (48, 52) that in the thermodynamic limit the expected strength of a node grows linearly with its expected time-aggregated degree,
[TABLE]
However, in the counterparts the expected time-aggregated degree grows sub-linearly with $\kappa$ (Fig. 12), while the expected strength grows approximately linearly (Fig. 14). Thus, in the considered systems we expect the strength of a node to grow super-linearly with its time-aggregated degree, as verified in Fig. 15 and empirically observed in prior studies Starnini et al. (2013, 2017).
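Strengths and time-aggregated degrees can be measured in a small simulation. The following is a rough sketch under assumptions, not the authors' simulator: latent variables are fixed over time, snapshots are independent given the latent variables, the connection probability is $p(\chi) = 1/(1+\chi^{1/T})$ with $\chi = R\Delta\theta/(\mu\kappa\kappa')$, $R = N/(2\pi)$, $\mu = \sin(T\pi)/(2\bar{k}T\pi)$, and the latent degrees are drawn from a uniform distribution with mean `kbar` purely for illustration. All names are hypothetical.

```python
import math
import random


def simulate_dynamic_s1(N, kbar, T, tau, seed=0):
    """Return {(i, j): weight} for pairs that connect in at least one of tau slots."""
    rng = random.Random(seed)
    mu = math.sin(T * math.pi) / (2.0 * kbar * T * math.pi)  # assumed form of mu
    R = N / (2.0 * math.pi)                                  # assumed circle radius
    # Illustrative latent degrees with mean kbar (an assumption, requires kbar > 1).
    kappas = [rng.uniform(1.0, 2.0 * kbar - 1.0) for _ in range(N)]
    thetas = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(N)]
    weights = {}
    for i in range(N):
        for j in range(i + 1, N):
            dtheta = math.pi - abs(math.pi - abs(thetas[i] - thetas[j]))
            chi = R * dtheta / (mu * kappas[i] * kappas[j])
            p = 1.0 / (1.0 + chi ** (1.0 / T))
            # Independent slots: the pair weight is a Binomial(tau, p) draw.
            w = sum(1 for _ in range(tau) if rng.random() < p)
            if w > 0:
                weights[(i, j)] = w
    return weights


def strength_and_degree(weights, N):
    """Node strength (sum of weights) and time-aggregated degree (number of partners)."""
    strength = [0] * N
    degree = [0] * N
    for (i, j), w in weights.items():
        strength[i] += w
        strength[j] += w
        degree[i] += 1
        degree[j] += 1
    return strength, degree
```

Plotting the average strength against the time-aggregated degree for a larger `N` and `tau` is one way to inspect the trend discussed above; the small sizes used here are only meant to keep the example fast.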
VII Component dynamics and temperature
Finally, we elucidate the important role of the temperature $T$ in the formation of components. To this end, we consider the connected components formed in all time slots throughout the observation period $\tau$, which consist of at least three nodes. We consider both unique and recurrent components. A component in a slot is called unique if it is seen for the first time, i.e., it is a component that does not consist of exactly the same nodes as a component seen in a previous slot. Otherwise, the component is recurrent. Fig. 16 shows that as $T$ increases, the number of unique components increases almost exponentially up to a point and then decreases. This is because larger values of $T$ increase the connection probability [Eq. (1)] at larger distances ($\chi > 1$), while decreasing it at smaller distances ($\chi < 1$). Since there are more pairs of nodes separated by larger distances, the number of unique components formed increases. However, at larger $T$ closer to one, the probability of connections is relatively small at both smaller and larger distances, which causes this number to decrease. The inset in Fig. 16 shows the size of the largest component formed.
Further, Fig. 16 shows that the ratio of the total number of components formed to the number of unique components formed decreases with $T$. This means that as $T$ increases fewer recurrent components are formed per unique component. This is expected since at larger $T$ unique components consist of pairs separated by larger distances, and the probability to form the same such components again is vanishing.
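The definitions of unique and recurrent components translate directly into a counting procedure. The sketch below is one possible implementation, assuming each snapshot is given as a list of undirected edges over nodes labeled `0..n_nodes-1`; the function names are hypothetical.

```python
def components(n_nodes, edges):
    """Connected components of one snapshot, computed with union-find."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    groups = {}
    for v in range(n_nodes):
        groups.setdefault(find(v), set()).add(v)
    return [frozenset(g) for g in groups.values()]


def count_components(n_nodes, snapshots, min_size=3):
    """Return (unique, recurrent) counts of components with at least min_size nodes."""
    seen = set()
    unique = recurrent = 0
    for edges in snapshots:
        for comp in components(n_nodes, edges):
            if len(comp) < min_size:
                continue
            if comp in seen:
                recurrent += 1   # exactly the same node set appeared before
            else:
                seen.add(comp)
                unique += 1
    return unique, recurrent
```

The ratio discussed in the text is then `(unique + recurrent) / unique`. Storing each component as a `frozenset` makes the "exactly the same nodes" test a constant-time set lookup.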
VIII Conclusion
Despite its simplicity, the dynamic-$\mathbb{S}^{1}$ adequately reproduces many of the observed properties of real proximity networks. At the same time the model is amenable to mathematical analysis. We have proved here the model's main properties (Sec. VI). Other properties were studied only via simulations (Sec. IV.3) and it would be interesting in future work to prove those properties as well. We have seen that network temperature plays a central role in network dynamics, dictating the contact, inter-contact and weight distributions, the time-aggregated degrees, and the formation of unique and recurrent components.
The dynamic-$\mathbb{S}^{1}$ may not capture the properties of a real network exactly. For instance, the aggregated contact, inter-contact and weight distributions may deviate from pure power laws, may follow power laws with exponential cutoffs, may have different exponents than exactly $2+T$, $2-T$ and $1+T$, etc., cf. Fig. 6(a). Further, we have seen that the pairwise inter-contact distributions are on average more skewed in real networks than in the model. As future work, it would also be interesting to investigate what mechanisms need to be introduced into the model in order to capture such variations.
We also note that memory in the dynamic-$\mathbb{S}^{1}$ is induced only via the nodes' latent variables ($\kappa$, $\theta$). Extensions to the model with link persistence, where connections/disconnections can also be copied from the previous to the next snapshot Mazzarisi et al. (2020); Papadopoulos and Kleineberg (2019), would allow additional control over the rate of dynamics, i.e., over how fast the topology changes from snapshot to snapshot. Further, generalizations of the model that would allow the nodes' latent variables ($\kappa$, $\theta$) to change over time are desirable. However, for this purpose, one would first need to find the equations that realistically describe the motion of nodes in their latent spaces. The dynamic-$\mathbb{S}^{1}$ or extensions of it may apply to other types of time-varying networks, such as the ones considered in Perra et al. (2012); Karsai et al. (2014), and constitute the basis of maximum likelihood estimation methods that infer the node coordinates and their evolution in the latent spaces of real systems Kim et al. (2018). Taken altogether, our results pave the way towards generative modeling of temporal networks that simultaneously satisfies simplicity, realism, and mathematical tractability.
Acknowledgements.
The authors acknowledge support by the EU H2020 NOTRE project (grant 692058).
Appendix A dynamic-$\mathbb{S}^{1}$ vs. configuration model
The dynamic-$\mathbb{S}^{1}$ utilizes the $\mathbb{S}^{1}$ model in the cold regime, where the temperature is $T \in (0,1)$ (Sec. II). The $\mathbb{S}^{1}$ can also be defined in the hot regime, $T > 1$ Krioukov et al. (2010).
Like traditional complex networks Krioukov et al. (2010), proximity networks appear to belong to the cold regime. Indeed, as seen in Table 2, all counterparts have $T < 1$. Further, Fig. 16 shows that the number of recurrent components quickly decreases with $T$, becoming small as $T$ approaches one, while real networks have large numbers of recurrent components (cf. Figs. 5(h), 6(h) and Flores and Papadopoulos (2018)).
Analyzing the dynamic-$\mathbb{S}^{1}$ in the hot regime is beyond the scope of this paper. However, we consider here a limiting case of this regime, where the model degenerates to the configuration model, i.e., to the ensemble of graphs with given expected degrees Chung and Lu (2002); Park and Newman (2004). This case corresponds to letting $T \to \infty$, while completely ignoring the angular distances among the nodes, see Krioukov et al. (2010) for details. The connection probability between two nodes becomes
[TABLE]
For sparse networks and distributions of $\kappa$ that are not too broad (conditions that hold in the considered networks, Fig. 3), the connection probability in (54) can be replaced by its sparse-network approximation. Using this approximation, it is easy to see that the expected degree of a node with latent variable $\kappa$ is given by (4), while the average degree in the resulting network is $\bar{k}$.
We now build synthetic counterparts for the real networks of Sec. IV.1 using the dynamic-$\mathbb{S}^{1}$ as described in Secs. III and IV.2, except that we connect the nodes in each snapshot with the connection probability in (54) [instead of (1)]. Since there is no temperature in (54), we can no longer control the average time-aggregated degree, which is significantly larger in the counterparts of the hospital, primary school, high school, conference and Friends & Family networks than in the real networks (cf. Table 1). As expected, we see in Fig. 17 that the configuration model cannot reproduce the abundance of recurrent components observed in the real networks. Further, it cannot capture their broad contact, inter-contact and weight distributions (Fig. 17).
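The expected-degree property can be illustrated with a short computation. The sketch below assumes, since Eq. (54) is not displayed in this excerpt, that the configuration-model connection probability has the form $1/(1 + \bar{k}N/(\kappa\kappa'))$, whose sparse-network approximation is $\kappa\kappa'/(\bar{k}N)$. Both the form and the function names are assumptions made for illustration.

```python
def p_config(kappa1, kappa2, kbar, N):
    """Assumed hot-limit connection probability, independent of angular distance."""
    return 1.0 / (1.0 + kbar * N / (kappa1 * kappa2))


def p_config_sparse(kappa1, kappa2, kbar, N):
    """Sparse-network approximation of p_config (Chung-Lu form)."""
    return kappa1 * kappa2 / (kbar * N)


def expected_degree(i, kappas, prob=p_config):
    """Expected snapshot degree of node i: sum of connection probabilities."""
    N = len(kappas)
    kbar = sum(kappas) / N
    return sum(prob(kappas[i], k, kbar, N)
               for j, k in enumerate(kappas) if j != i)
```

For instance, with 500 nodes of latent degree 2 and 500 of latent degree 8, `expected_degree` returns values close to 2 and 8 for the two groups, in line with the expected degree of a node being approximately its latent variable.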
References
- [1] Barrat and Cattuto (2015): A. Barrat and C. Cattuto, "Face-to-face interactions," in Social Phenomena: From Data Analysis to Models (Springer, Cham, 2015) pp. 37–57.
- [2] Holme (2016): P. Holme, "Temporal network structures controlling disease spreading," Phys. Rev. E 94, 022305 (2016).
- [3] Holme and Litvak (2017): P. Holme and N. Litvak, "Cost-efficient vaccination protocols for network epidemiology," PLOS Computational Biology 13, 1–18 (2017).
- [4] Hui et al. (2005): P. Hui, A. Chaintreau, J. Scott, R. Gass, J. Crowcroft, and C. Diot, "Pocket switched networks and human mobility in conference environments," in Proceedings of the ACM SIGCOMM Workshop on Delay-tolerant Networking, WDTN '05 (ACM, New York, USA, 2005) pp. 244–251.
- [5] Chaintreau et al. (2007): A. Chaintreau, P. Hui, J. Crowcroft, C. Diot, R. Gass, and J. Scott, "Impact of human mobility on opportunistic forwarding algorithms," IEEE Transactions on Mobile Computing 6, 606–620 (2007).
- [6] Karagiannis et al. (2010): T. Karagiannis, J.-Y. Le Boudec, and M. Vojnovic, "Power law and exponential decay of intercontact times between mobile devices," IEEE Transactions on Mobile Computing 9, 1377–1390 (2010).
- [7] Dong et al. (2011): W. Dong, B. Lepri, and A. S. Pentland, "Modeling the co-evolution of behaviors and social relationships using mobile phone data," in Proceedings of the International Conference on Mobile and Ubiquitous Multimedia, MUM '11 (ACM, New York, USA, 2011) pp. 134–143.
- [8] Aharony et al. (2011): N. Aharony, W. Pan, C. Ip, I. Khayal, and A. S. Pentland, "Social fMRI: Investigating and shaping social mechanisms in the real world," The Ninth Annual IEEE International Conference on Pervasive Computing and Communications (PerCom 2011), Pervasive and Mobile Computing 7, 643–659 (2011).
