Temporal similarity metrics for latent network reconstruction: The role of time-lag decay
Hao Liao, Ming-Kai Liu, Manuel Sebastian Mariani, Mingyang Zhou, Xingtong Wu

TL;DR
This paper introduces new temporal similarity metrics for reconstructing hidden propagation networks from diffusion data, highlighting the importance of time-lag decay and its impact on accuracy depending on network clustering.
Contribution
The paper proposes novel temporal similarity metrics that incorporate time-lag decay, improving network reconstruction from diffusion processes compared to static metrics.
Findings
Time-lag decay significantly influences reconstruction accuracy.
Temporal metrics outperform static ones in certain network structures.
Network clustering affects the optimal choice of time-lag function.
| Network | N | E | ⟨k⟩ | C | β_c | url |
|---|---|---|---|---|---|---|
| Zachary karate club (Zkc) | 34 | 78 | 4.58 | 0.12 | 0.1688 | url |
| Highschool (Highs) | 70 | 366 | 10.45 | 0.29 | 0.1487 | url |
| Polbooks (Polbs) | 105 | 441 | 8.4 | 0.15 | 0.2067 | url |
| Word | 112 | 425 | 3.79 | 0.17 | 0.0783 | url |
| Hypertext (Hypert) | 113 | 20818 | 368.46 | 0.26 | 0.0392 | url |
| Football (Footb) | 115 | 1231 | 21.4 | 0.17 | 0.1623 | url |
| Little Rock Lake (LRL) | 183 | 2494 | 27.25 | 0.09 | 0.0229 | url |
| Jazz | 198 | 2742 | 27.69 | 0.62 | 0.0266 | url |
| Residence hall (Rhall) | 217 | 2672 | 24.62 | 0.24 | 0.0688 | url |
| E.coli | 230 | 695 | 6.04 | 0.22 | 0.0752 | url |
| Physicians (Phys) | 241 | 1098 | 9.11 | 0.13 | 0.1366 | url |
| Neural | 297 | 2359 | 15.88 | 0.12 | 0.049 | url |
| USAir | 332 | 2126 | 12.8 | 0.63 | 0.0231 | url |
| Slavko | 334 | 2218 | 13.28 | 0.17 | 0.0791 | url |
| Netsci | 379 | 914 | 4.82 | 0.74 | 0.1424 | url |
| Dublin | 410 | 2765 | 13.48 | 0.30 | 0.1044 | url |
| Caenorhabditis elegans (Cae) | 453 | 4596 | 10.15 | 0.07 | 0.0465 | url |
| Unicode (Unic) | 767 | 1255 | 3.27 | 0.01 | 0.0455 | url |
| Scsc | 961 | 1925 | 4.01 | 0.02 | 0.2033 | url |
| | 1133 | 5451 | 9.62 | 0.22 | 0.0565 | url |
| Euroroad (Eroad) | 1174 | 1417 | 2.41 | 0.01 | 0.1563 | url |
| Blogs | 1224 | 19025 | 31.08 | 0.14 | 0.0123 | url |
| Air traffic control (Air.tra) | 1226 | 2615 | 4.26 | 0.02 | 0.2353 | url |
| TAP | 1373 | 6833 | 9.96 | 0.53 | 0.0651 | url |
| Crim | 1380 | 1476 | 2.13 | 0.1 | 0.0458 | url |
| Chicago (Chic) | 1467 | 1298 | 1.76 | 0 | 0.1411 | url |
| Human protein (HP) | 1706 | 6207 | 7.27 | 0.02 | 0.0653 | url |
| Bible | 1773 | 16401 | 18.5 | 0.1 | 0.1299 | url |
| Hamsterster friendships (HF) | 1858 | 12534 | 13.49 | 0 | 0.0217 | url |
| UC Irvine messages (UC.irv) | 1899 | 59835 | 63.01 | 0.04 | 0.0317 | url |
| DNC emails (DNC) | 2029 | 39264 | 38.70 | 0.06 | 0.0164 | url |
| IUI | 2288 | 4190 | 3.66 | 0.03 | 0.3068 | url |
| PPI | 2375 | 11693 | 9.84 | 0.3 | 0.0301 | url |
| Adolescent health (Health) | 2539 | 12969 | 10.21 | 0.10 | 0.1408 | url |
| Amazon (Ama) | 2880 | 5037 | 3.49 | 0.01 | 0.1651 | url |
| Facebook (Faceb) | 2888 | 2981 | 2.06 | 0 | 0.1879 | url |
| Openflights (Oflgs) | 2939 | 30501 | 20.75 | 0.39 | 0.0184 | url |
| Powergrid (Pgrid) | 4941 | 6594 | 2.66 | 0.01 | 0.1175 | url |
| Subelj | 6434 | 150985 | 46.93 | 0.09 | 0.0513 | url |
| Advogato (Adv) | 6541 | 51127 | 7.82 | 0.11 | 0.0171 | url |

AUC of each metric (basic properties: N, number of nodes; E, number of edges).

| Network | N | E | COS | TCOS | TCOS1 | SSI | TSSI | TSSI1 | LHN | TLHN | TLHN1 | HDI | THDI | THDI1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Zkc | 34 | 78 | 0.6683 | 0.6816 | 0.6811 | 0.6659 | 0.6797 | 0.6835 | 0.6585 | 0.6770 | 0.6776 | 0.6604 | 0.6782 | 0.6775 |
| Highs | 70 | 366 | 0.7614 | 0.8532 | 0.8525 | 0.7600 | 0.8491 | 0.8490 | 0.6396 | 0.8367 | 0.8367 | 0.7500 | 0.8411 | 0.8410 |
| Polbs | 105 | 441 | 0.7201 | 0.7206 | 0.7199 | 0.7073 | 0.7096 | 0.7101 | 0.6552 | 0.6711 | 0.6714 | 0.6888 | 0.7002 | 0.7011 |
| Word | 112 | 425 | 0.8413 | 0.9350 | 0.9506 | 0.7821 | 0.8898 | 0.9305 | 0.8311 | 0.9033 | 0.9396 | 0.8056 | 0.8937 | 0.9313 |
| Hypert | 113 | 20818 | 0.5314 | 0.5447 | 0.5438 | 0.5219 | 0.5356 | 0.5352 | 0.5446 | 0.5491 | 0.5509 | 0.5137 | 0.5271 | 0.5271 |
| Footb | 115 | 1231 | 0.6789 | 0.7216 | 0.7216 | 0.6557 | 0.6945 | 0.6941 | 0.6813 | 0.7116 | 0.7108 | 0.6373 | 0.6750 | 0.6741 |
| LRL | 183 | 2494 | 0.6462 | 0.6488 | 0.6491 | 0.6452 | 0.6482 | 0.6481 | 0.6431 | 0.6469 | 0.6469 | 0.6464 | 0.6488 | 0.6488 |
| Jazz | 198 | 2742 | 0.8312 | 0.8650 | 0.8783 | 0.8211 | 0.8609 | 0.8753 | 0.7563 | 0.7875 | 0.8218 | 0.7942 | 0.8487 | 0.8686 |
| Rhall | 217 | 2672 | 0.7103 | 0.8683 | 0.8682 | 0.6856 | 0.7954 | 0.7956 | 0.7094 | 0.8308 | 0.8311 | 0.7030 | 0.8344 | 0.8346 |
| E. coli | 230 | 695 | 0.9018 | 0.9726 | 0.9822 | 0.8876 | 0.9659 | 0.9797 | 0.7924 | 0.8999 | 0.9307 | 0.8965 | 0.9504 | 0.9712 |
| Phys | 241 | 1098 | 0.8610 | 0.8737 | 0.8738 | 0.8587 | 0.8717 | 0.8720 | 0.6278 | 0.8375 | 0.8377 | 0.8521 | 0.8672 | 0.8675 |
| Neural | 297 | 2359 | 0.7314 | 0.8266 | 0.8380 | 0.7287 | 0.8098 | 0.8303 | 0.7174 | 0.7865 | 0.8068 | 0.7213 | 0.7920 | 0.8176 |
| USAir | 332 | 2126 | 0.9135 | 0.9435 | 0.9539 | 0.9089 | 0.9365 | 0.9474 | 0.7852 | 0.8199 | 0.8273 | 0.9023 | 0.9289 | 0.9390 |
| Slavko | 334 | 2218 | 0.7711 | 0.7686 | 0.7683 | 0.7596 | 0.7615 | 0.7609 | 0.7552 | 0.7576 | 0.7574 | 0.7508 | 0.7549 | 0.7549 |
| Netsci | 379 | 914 | 0.9225 | 0.9755 | 0.9752 | 0.9198 | 0.9760 | 0.9760 | 0.9138 | 0.9635 | 0.9632 | 0.8987 | 0.9810 | 0.9807 |
| Dublin | 410 | 2765 | 0.7959 | 0.8460 | 0.8462 | 0.7322 | 0.8427 | 0.8430 | 0.6985 | 0.7567 | 0.7567 | 0.7735 | 0.8226 | 0.8231 |
| Cae | 453 | 4596 | 0.7122 | 0.7369 | 0.7348 | 0.7062 | 0.7202 | 0.7231 | 0.6794 | 0.7161 | 0.7190 | 0.6992 | 0.7160 | 0.7163 |
| Unic | 767 | 1255 | 0.6018 | 0.6020 | 0.6020 | 0.6019 | 0.6021 | 0.6023 | 0.6022 | 0.6021 | 0.6020 | 0.6021 | 0.6021 | 0.6020 |
| Scsc | 961 | 1925 | 0.8158 | 0.8160 | 0.8161 | 0.8151 | 0.8159 | 0.8159 | 0.8158 | 0.8159 | 0.8160 | 0.8151 | 0.8162 | 0.8154 |
| | 1133 | 5451 | 0.8567 | 0.9725 | 0.9946 | 0.8423 | 0.9617 | 0.9925 | 0.8341 | 0.9407 | 0.9823 | 0.8389 | 0.9470 | 0.9797 |
| Eroad | 1174 | 1417 | 0.8942 | 0.9186 | 0.9185 | 0.8943 | 0.9146 | 0.9148 | 0.8114 | 0.9177 | 0.9172 | 0.8925 | 0.9136 | 0.9136 |
| Blogs | 1224 | 19025 | 0.8431 | 0.8580 | 0.8581 | 0.8401 | 0.8493 | 0.8494 | 0.8342 | 0.8451 | 0.8452 | 0.8269 | 0.8391 | 0.8390 |
| Air.tra | 1226 | 2615 | 0.8439 | 0.8640 | 0.8643 | 0.8257 | 0.8533 | 0.8537 | 0.8330 | 0.8568 | 0.8571 | 0.8527 | 0.8652 | 0.8652 |
| TAP | 1373 | 6833 | 0.8964 | 0.9924 | 0.9983 | 0.8757 | 0.9906 | 0.9982 | 0.8423 | 0.9902 | 0.9980 | 0.8558 | 0.9814 | 0.9966 |
| Crim | 1380 | 1476 | 0.7959 | 0.8444 | 0.8446 | 0.6986 | 0.7556 | 0.7558 | 0.7870 | 0.8297 | 0.8298 | 0.7738 | 0.8219 | 0.8220 |
| Chic | 1467 | 1298 | 0.6513 | 0.6515 | 0.6517 | 0.6517 | 0.6516 | 0.6516 | 0.6517 | 0.6519 | 0.6518 | 0.6519 | 0.6517 | 0.6518 |
| HP | 1706 | 6207 | 0.9376 | 0.9833 | 0.9831 | 0.8802 | 0.9371 | 0.9373 | 0.9274 | 0.9756 | 0.9760 | 0.9081 | 0.9720 | 0.9723 |
| Bible | 1773 | 16401 | 0.7413 | 0.7551 | 0.7555 | 0.7131 | 0.7253 | 0.7245 | 0.7056 | 0.7303 | 0.7310 | 0.6822 | 0.7005 | 0.7007 |
| HF | 1858 | 12534 | 0.7726 | 0.7768 | 0.7770 | 0.7673 | 0.7714 | 0.7717 | 0.7708 | 0.7737 | 0.7738 | 0.7701 | 0.7737 | 0.7733 |
| Uc.irv | 1899 | 59835 | 0.8996 | 0.9471 | 0.9471 | 0.8973 | 0.9254 | 0.9258 | 0.8952 | 0.9265 | 0.9266 | 0.8956 | 0.9123 | 0.9127 |
| DNC | 2029 | 39264 | 0.9613 | 0.9618 | 0.9619 | 0.9608 | 0.9613 | 0.9614 | 0.9372 | 0.9432 | 0.9437 | 0.9603 | 0.9612 | 0.9611 |
| IUI | 2288 | 4190 | 0.8276 | 0.8277 | 0.8274 | 0.8281 | 0.8268 | 0.8278 | 0.8287 | 0.8278 | 0.8283 | 0.8275 | 0.8271 | 0.8278 |
| PPI | 2375 | 11693 | 0.9349 | 0.9747 | 0.9747 | 0.9336 | 0.9707 | 0.9706 | 0.7811 | 0.9250 | 0.9249 | 0.9314 | 0.9652 | 0.9650 |
| Health | 2539 | 12969 | 0.7561 | 0.8970 | 0.8979 | 0.7502 | 0.8886 | 0.8890 | 0.7896 | 0.9339 | 0.9347 | 0.7444 | 0.8724 | 0.8733 |
| Ama | 2880 | 5037 | 0.6535 | 0.9172 | 0.9178 | 0.6502 | 0.9172 | 0.9172 | 0.6531 | 0.9168 | 0.9160 | 0.6314 | 0.9138 | 0.9140 |
| Faceb | 2888 | 2981 | 0.6789 | 0.7216 | 0.7216 | 0.6557 | 0.6945 | 0.6941 | 0.6813 | 0.7116 | 0.7108 | 0.6373 | 0.6750 | 0.6741 |
| Oflgs | 2939 | 30501 | 0.9384 | 0.9581 | 0.9578 | 0.9311 | 0.9505 | 0.9507 | 0.6979 | 0.7957 | 0.7957 | 0.9239 | 0.9432 | 0.9433 |
| Pgrid | 4941 | 6594 | 0.8927 | 0.8928 | 0.8929 | 0.8922 | 0.8929 | 0.8926 | 0.8916 | 0.8928 | 0.8927 | 0.8917 | 0.8928 | 0.8927 |
| Subelj | 6434 | 150985 | 0.6284 | 0.7016 | 0.7016 | 0.6353 | 0.6745 | 0.6741 | 0.6616 | 0.6916 | 0.6908 | 0.6178 | 0.6550 | 0.6541 |
| Adv | 6541 | 51127 | 0.9104 | 0.9269 | 0.9270 | 0.9099 | 0.9242 | 0.9249 | 0.7914 | 0.8373 | 0.8361 | 0.9050 | 0.9180 | 0.9173 |

Precision of each metric (basic properties: N, number of nodes; E, number of edges).

| Network | N | E | COS | TCOS | TCOS1 | SSI | TSSI | TSSI1 | LHN | TLHN | TLHN1 | HDI | THDI | THDI1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Zkc | 34 | 78 | 0.207 | 0.235 | 0.236 | 0.209 | 0.237 | 0.230 | 0.193 | 0.214 | 0.221 | 0.209 | 0.224 | 0.230 |
| Highs | 70 | 366 | 0.365 | 0.535 | 0.610 | 0.365 | 0.536 | 0.611 | 0.181 | 0.474 | 0.571 | 0.357 | 0.537 | 0.620 |
| Polbs | 105 | 441 | 0.217 | 0.217 | 0.241 | 0.217 | 0.213 | 0.234 | 0.138 | 0.172 | 0.186 | 0.205 | 0.198 | 0.220 |
| Word | 112 | 425 | 0.313 | 0.568 | 0.624 | 0.307 | 0.548 | 0.610 | 0.241 | 0.320 | 0.390 | 0.297 | 0.489 | 0.562 |
| Hypert | 113 | 20818 | 0.221 | 0.239 | 0.244 | 0.215 | 0.231 | 0.240 | 0.220 | 0.231 | 0.233 | 0.202 | 0.221 | 0.228 |
| Footb | 115 | 1231 | 0.224 | 0.286 | 0.320 | 0.212 | 0.267 | 0.306 | 0.156 | 0.201 | 0.232 | 0.176 | 0.233 | 0.278 |
| LRL | 183 | 2494 | 0.282 | 0.310 | 0.287 | 0.275 | 0.305 | 0.286 | 0.259 | 0.294 | 0.287 | 0.270 | 0.302 | 0.285 |
| Jazz | 198 | 2742 | 0.434 | 0.531 | 0.538 | 0.426 | 0.528 | 0.537 | 0.304 | 0.342 | 0.407 | 0.405 | 0.509 | 0.531 |
| Rhall | 217 | 2672 | 0.189 | 0.462 | 0.565 | 0.181 | 0.340 | 0.457 | 0.189 | 0.379 | 0.506 | 0.185 | 0.408 | 0.520 |
| E. coli | 230 | 695 | 0.323 | 0.584 | 0.652 | 0.348 | 0.558 | 0.626 | 0.278 | 0.303 | 0.354 | 0.317 | 0.514 | 0.579 |
| Phys | 241 | 1098 | 0.315 | 0.460 | 0.464 | 0.162 | 0.275 | 0.353 | 0.309 | 0.439 | 0.468 | 0.284 | 0.427 | 0.453 |
| Neural | 297 | 2359 | 0.178 | 0.327 | 0.376 | 0.152 | 0.312 | 0.369 | 0.124 | 0.243 | 0.295 | 0.142 | 0.271 | 0.335 |
| USAir | 332 | 2126 | 0.561 | 0.555 | 0.541 | 0.554 | 0.549 | 0.536 | 0.219 | 0.234 | 0.339 | 0.526 | 0.532 | 0.533 |
| Slavko | 334 | 2218 | 0.253 | 0.261 | 0.258 | 0.243 | 0.254 | 0.253 | 0.132 | 0.162 | 0.175 | 0.215 | 0.229 | 0.236 |
| Netsci | 379 | 914 | 0.371 | 0.581 | 0.771 | 0.370 | 0.583 | 0.773 | 0.279 | 0.517 | 0.628 | 0.359 | 0.579 | 0.788 |
| Dublin | 410 | 2765 | 0.203 | 0.355 | 0.361 | 0.115 | 0.212 | 0.279 | 0.199 | 0.343 | 0.370 | 0.176 | 0.319 | 0.348 |
| Cae | 453 | 4596 | 0.141 | 0.229 | 0.236 | 0.139 | 0.224 | 0.233 | 0.077 | 0.101 | 0.114 | 0.112 | 0.195 | 0.214 |
| Unic | 767 | 1255 | 0.104 | 0.156 | 0.153 | 0.103 | 0.152 | 0.153 | 0.098 | 0.141 | 0.152 | 0.101 | 0.148 | 0.152 |
| Scsc | 961 | 1925 | 0.233 | 0.271 | 0.279 | 0.230 | 0.266 | 0.272 | 0.179 | 0.214 | 0.234 | 0.216 | 0.250 | 0.258 |
| | 1133 | 5451 | 0.467 | 0.568 | 0.766 | 0.442 | 0.503 | 0.707 | 0.145 | 0.206 | 0.341 | 0.256 | 0.397 | 0.581 |
| Eroad | 1174 | 1417 | 0.230 | 0.424 | 0.425 | 0.227 | 0.424 | 0.425 | 0.137 | 0.185 | 0.259 | 0.228 | 0.413 | 0.414 |
| Blogs | 1224 | 19025 | 0.173 | 0.242 | 0.287 | 0.166 | 0.216 | 0.246 | 0.173 | 0.226 | 0.254 | 0.171 | 0.226 | 0.256 |
| Air.tra | 1226 | 2615 | 0.186 | 0.374 | 0.402 | 0.179 | 0.383 | 0.403 | 0.056 | 0.132 | 0.156 | 0.158 | 0.366 | 0.367 |
| TAP | 1373 | 6833 | 0.232 | 0.635 | 0.734 | 0.215 | 0.617 | 0.717 | 0.158 | 0.502 | 0.713 | 0.187 | 0.590 | 0.678 |
| Crim | 1380 | 1476 | 0.034 | 0.227 | 0.228 | 0.041 | 0.228 | 0.228 | 0.077 | 0.229 | 0.287 | 0.041 | 0.228 | 0.231 |
| Chic | 1467 | 1298 | 0.294 | 0.303 | 0.304 | 0.298 | 0.303 | 0.304 | 0.287 | 0.302 | 0.304 | 0.298 | 0.303 | 0.304 |
| HP | 1706 | 6207 | 0.135 | 0.221 | 0.312 | 0.122 | 0.189 | 0.246 | 0.136 | 0.199 | 0.255 | 0.131 | 0.198 | 0.255 |
| Bible | 1773 | 16401 | 0.152 | 0.208 | 0.213 | 0.148 | 0.207 | 0.216 | 0.043 | 0.052 | 0.056 | 0.119 | 0.184 | 0.210 |
| HF | 1858 | 12534 | 0.090 | 0.157 | 0.195 | 0.067 | 0.075 | 0.086 | 0.086 | 0.127 | 0.159 | 0.071 | 0.122 | 0.151 |
| Uc.irv | 1899 | 59835 | 0.176 | 0.370 | 0.354 | 0.113 | 0.257 | 0.324 | 0.171 | 0.374 | 0.371 | 0.155 | 0.339 | 0.342 |
| DNC | 2029 | 39264 | 0.373 | 0.376 | 0.373 | 0.372 | 0.375 | 0.374 | 0.013 | 0.019 | 0.019 | 0.362 | 0.370 | 0.373 |
| IUI | 2288 | 4190 | 0.140 | 0.196 | 0.229 | 0.143 | 0.196 | 0.205 | 0.351 | 0.303 | 0.323 | 0.142 | 0.183 | 0.199 |
| Health | 2539 | 12969 | 0.025 | 0.533 | 0.618 | 0.025 | 0.530 | 0.616 | 0.039 | 0.327 | 0.478 | 0.025 | 0.418 | 0.599 |
| Ama | 2880 | 5037 | 0.008 | 0.015 | 0.015 | 0.009 | 0.013 | 0.014 | 0.005 | 0.014 | 0.014 | 0.007 | 0.013 | 0.014 |
| Faceb | 2888 | 2981 | 0.224 | 0.286 | 0.320 | 0.212 | 0.268 | 0.306 | 0.157 | 0.201 | 0.232 | 0.176 | 0.234 | 0.278 |
| Oflgs | 2939 | 30501 | 0.248 | 0.321 | 0.368 | 0.248 | 0.320 | 0.363 | 0.021 | 0.028 | 0.031 | 0.244 | 0.293 | 0.335 |
| Pgrid | 4941 | 6594 | 0.185 | 0.311 | 0.312 | 0.183 | 0.310 | 0.311 | 0.122 | 0.206 | 0.228 | 0.184 | 0.301 | 0.302 |
| Subelj | 6434 | 150985 | 0.067 | 0.124 | 0.130 | 0.065 | 0.120 | 0.125 | 0.016 | 0.046 | 0.051 | 0.066 | 0.114 | 0.115 |
| Adv | 6541 | 51127 | 0.061 | 0.155 | 0.322 | 0.060 | 0.151 | 0.288 | 0.002 | 0.069 | 0.091 | 0.061 | 0.120 | 0.240 |
Hao Liao
National Engineering Laboratory for Big Data System Computing Technology
Guangdong Province Key Laboratory of Popular High Performance Computers
College of Computer Science and Software Engineering
Shenzhen University
Shenzhen 518060, PR China
Ming-Kai Liu
National Engineering Laboratory for Big Data System Computing Technology
Guangdong Province Key Laboratory of Popular High Performance Computers
College of Computer Science and Software Engineering
Shenzhen University
Shenzhen 518060, PR China
Manuel Sebastian Mariani
Institute of Fundamental and Frontier Sciences
University of Electronic Science and Technology of China
Chengdu 610051, PR China
URPP Social Networks
Universität Zürich
CH-8050 Switzerland
Mingyang Zhou
National Engineering Laboratory for Big Data System Computing Technology
Guangdong Province Key Laboratory of Popular High Performance Computers
College of Computer Science and Software Engineering
Shenzhen University
Shenzhen 518060, PR China
Xingtong Wu
National Engineering Laboratory for Big Data System Computing Technology
Guangdong Province Key Laboratory of Popular High Performance Computers
College of Computer Science and Software Engineering
Shenzhen University
Shenzhen 518060, PR China
Abstract
When investigating the spreading of a piece of information or the diffusion of an innovation, we often lack information on the underlying propagation network. Reconstructing the hidden propagation paths based on the observed diffusion process is a challenging problem which has recently attracted attention from diverse research fields. To address this reconstruction problem, based on static similarity metrics commonly used in the link prediction literature, we introduce new node-node temporal similarity metrics. The new metrics take as input the time-series of multiple independent spreading processes, based on the hypothesis that two nodes are more likely to be connected if they were often infected at similar points in time. This hypothesis is implemented by introducing a time-lag function which penalizes distant infection times. We find that the choice of this time-lag strongly affects the metrics’ reconstruction accuracy, depending on the network’s clustering coefficient and we provide an extensive comparative analysis of static and temporal similarity metrics for network reconstruction. Our findings shed new light on the notion of similarity between pairs of nodes in complex networks.
Keywords: Information networks · Network reconstruction · Temporal similarity · Innovation diffusion
1 Introduction
Our understanding of social networks is affected by the fact that, typically, we only have incomplete knowledge about the topology of real networks [16, 5]. Aimed at overcoming this shortcoming, the problem of reconstructing missing links has attracted enormous attention from scholars from diverse fields (see [53] for a recent review on the problem). Existing approaches to the network reconstruction problem include the use of local structural metrics [43, 53, 19], global walk-counting methods [39, 53], stochastic block models [33], fitness-based methods [14], structural perturbation analysis [50, 47], machine-learning techniques [31], among many others. Scholars have aimed to identify missing connections in a wide variety of systems, including protein-protein interaction networks [41], neural networks [10], citation networks [15], and social networks [49, 4].
In parallel, there has been recent interest [59, 63, 28, 83, 68, 48] on a different problem of network reconstruction: if we are only provided with information on the outcome of a dynamical process on an unknown propagation network, can we reconstruct the propagation network? The problem – which has been referred to as latent network reconstruction [59] – can be included in the broader class of problems that aim to reconstruct the properties of a spreading process (for instance, the seed node [9] or the epidemic parameters [59]) from data on observed realizations of the process. The question is fundamentally different from the traditional link prediction problem [53]: while link prediction studies [53] typically assume that only part of the network is hidden and needs to be reconstructed, here we assume that the topology of the propagation network is completely hidden. The reconstruction problem studied here is important as we often deal with datasets where the propagation network is largely unknown: for instance, the owners of an online e-commerce platform might have complete information on the time-series of users’ purchases, but lack information about the social connections between the users which might have affected, to some extent, the observed purchasing patterns.
Existing works have tackled the latent network reconstruction problem from various perspectives. Among the most relevant contributions, Myers et al. [59] addressed the problem through a maximum-likelihood estimation method based on a cascade spreading model, which was further mapped into a convex optimization problem. Gomez-Rodriguez et al. [28] developed a faster maximum-likelihood method based on a cascade propagation model. Shen et al. [68] leveraged compressed sensing theory to map the network reconstruction problem into a convex optimization problem. Such a mapping is non-trivial and model-specific; they solved the problem for the Susceptible-Infected-Susceptible (SIS) and the "contact process" dynamics [68]. The main limitation of these approaches is that they are model-dependent: Different spreading models require the solution of a different set of equations. For example, in the compressed-sensing theory approach, the convex-optimization equations for the SIS and the contact process model differ substantially [68]. Besides, the compressed-sensing approach to network reconstruction can be only applied to sparse networks [68].
On the other hand, other studies [83, 48] have tackled the latent network reconstruction problem by means of simple similarity metrics. With respect to convex optimization [59] and methods based on compressed sensing theory [68], similarity metrics have two main advantages: (1) They do not depend on the specific spreading model considered; (2) Their implementation is faster. Temporal similarity metrics for the latent network reconstruction [48] build on the hypothesis that two nodes are more likely to be connected if independent spreading processes tend to infect them at similar times. A simple way to implement this assumption is to impose, for each pair of nodes that are infected by the same spreading process, a contribution to their similarity in the form of a power-law decreasing function of the time lag between the two infection times [48]. For this reason, we refer to these metrics as temporal similarities with power-law time lag decay.
Here, we develop new temporal similarity indexes based on the hypothesis that two nodes are more likely to be connected if independent spreading processes tend to infect them at two consecutive time steps of the dynamics. We refer to the new metrics as temporal similarities with one-step time lag decay. Based on the power-law and one-step decay functions, for each of the eight classes of structural similarity metrics considered here, we construct two corresponding temporal similarity metrics. We compare their performance in reconstructing the whole propagation network in both synthetic and real data. By analyzing empirical networks, we provide the first systematic performance comparison of temporal similarity metrics based on different classes of structural similarity metrics.
We find that for the Susceptible-Infected-Recovered (SIR) spreading dynamics [64], for almost all the analyzed networks, the temporal similarities with one-step time lag decay outperform the temporal similarities with a power-law time lag decay. The performance gap is substantially larger for spreading processes sufficiently above their critical point. Besides, among all the classes of similarity metrics considered, we find that the temporal similarity metric with one-step time-lag decay based on the Cosine similarity [67] tends to outperform the other metrics; other competitive classes of similarity are the temporal variants of the Sorensen index [69] and the Jaccard similarity [37]. Results for two additional spreading models (Susceptible-Infected, SI, and Linear Threshold Model, LTM) are in qualitative agreement.
Our findings take the first steps toward an extensive benchmarking of methods for the reconstruction of a hidden topology from the available event time-series of a spreading process. Our work sheds new light on the notion of node similarity based on the outcome of dynamical processes on networks, and it has potential implications for social network analysis that will be outlined in the Discussion section.
2 Results
2.1 Problem statement
We assume that there is a unipartite network $G$ (whose adjacency matrix is denoted by $A$) whose topology is unknown, and our goal is to reconstruct it. Our available information is the time-stamped list of adoptions of multiple items that diffuse through a given spreading process. Entry $(i, \alpha, t_{i\alpha})$ in this list tells us that node $i$ adopted item $\alpha$ at time $t_{i\alpha}$. The adoption process considered here is ruled by the SIR dynamics [64]: the "adoption" of item $\alpha$ corresponds to the "infection" during realization $\alpha$ of the SIR dynamics. For this reason, in the following, we will use "adoption" and "infection" interchangeably.
We consider empirical unipartite networks, some of which are information networks (details in the Supplementary Material). We generate the time-series of adoptions by running, for each network, multiple independent realizations of the SIR spreading dynamics, each started from a fraction of initiator nodes (see Methods for details) [48]. Each independent realization of the spreading process is therefore interpreted as an item that gradually diffuses across the network. In fact, the time-series can be interpreted as a temporal bipartite network [34]; we denote by $R$ the incidence matrix of the corresponding time-aggregate bipartite network: $R_{i\alpha} = 1$ if node $i$ adopted item $\alpha$, and $R_{i\alpha} = 0$ otherwise.
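To make the data-generation step concrete, the following is a minimal Python sketch of a single SIR realization that records when each node is infected. The function name `sir_cascade`, the discrete-time update, and the one-step infectious period are assumptions made for illustration only; the actual simulation settings are those described in Methods.

```python
import random

def sir_cascade(adj, seeds, beta, rng):
    """Run one discrete-time SIR realization; return {node: infection time}.

    adj   : dict mapping each node to an iterable of its neighbours
    seeds : initially infected nodes (infection time 0)
    beta  : transmission probability per infected-susceptible contact
    rng   : random.Random instance, passed in for reproducibility
    """
    times = {s: 0 for s in seeds}
    infected = set(seeds)
    t = 0
    while infected:
        t += 1
        newly = set()
        for u in infected:
            for v in adj[u]:
                # Only never-infected nodes can be infected.
                if v not in times and v not in newly and rng.random() < beta:
                    newly.add(v)
        for v in newly:
            times[v] = t
        # Assumption: nodes recover after being infectious for one step.
        infected = newly
    return times

# Toy example on the path 0-1-2-3 (hypothetical data).
path_adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(sir_cascade(path_adj, seeds=[0], beta=1.0, rng=random.Random(0)))
```

Calling the function once per item, with a fresh set of seeds each time, yields one dictionary of adoption times per item; nodes absent from a dictionary never adopted that item.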
We address the following problem. Assuming that we only know the time-stamped adoption list, which is the best method to reconstruct the edges of the hidden network $G$ from it? While, in principle, several techniques of network reconstruction can be designed [53, 68, 48], we narrow our focus to similarity metrics that aim to infer the similarity $s_{ij}$ of two nodes $i$ and $j$ based on their co-adoption patterns [83, 48]. The definitions of the metrics of interest are provided in Sections 2.2 and 4.1.
Such similarity metrics produce a ranking of the pairs of nodes (potential edges) in descending order of their similarity score $s_{ij}$. Assuming that we know the number $E$ of edges of the underlying propagation network $G$, the $E$ top-ranked links by $s$ form the network $G^{(s)}$ reconstructed by metric $s$. It is natural to assess the accuracy of the metric by measuring the fraction of common links between $G^{(s)}$ and $G$. This quantity is typically referred to as precision in the link prediction [53] and information filtering literature, and we use it to evaluate the reconstruction performance of the similarity metrics. The results for another evaluation metric (Area Under the Curve, AUC [51]), which, unlike the precision, is independent of $E$, are in qualitative agreement with those obtained with the precision (Fig. S8).
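The evaluation just described can be sketched in a few lines of Python: rank all node pairs by score, keep as many top pairs as there are true edges, and measure the overlap. The function names and the toy scores below are hypothetical; ties are broken by input order, which is an arbitrary choice of this sketch.

```python
def top_links(scores, num_edges):
    """Return the num_edges highest-scoring node pairs as a set of frozensets."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return {frozenset(pair) for pair, _ in ranked[:num_edges]}

def precision(scores, true_edges):
    """Fraction of the E top-ranked pairs that are true edges,
    where E is the number of edges of the true network."""
    truth = {frozenset(e) for e in true_edges}
    predicted = top_links(scores, len(truth))
    return len(predicted & truth) / len(truth)

# Toy example: the true network is the path 0-1-2-3.
true_edges = [(0, 1), (1, 2), (2, 3)]
scores = {(0, 1): 0.9, (1, 2): 0.8, (0, 2): 0.7,
          (2, 3): 0.4, (1, 3): 0.3, (0, 3): 0.1}
print(precision(scores, true_edges))  # two of the three top-ranked pairs are true edges
```

Because exactly as many links are predicted as there are true edges, precision and recall coincide in this setting.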
2.2 From structural to temporal similarity metrics
We consider here eight classes of structural similarities [53]: common neighbors (CN), Jaccard Index (Jac), Leicht-Holme-Newman Index (LHN), Cosine Index (COS), Sorensen Index (SSI), Hub Promoted Index (HPI), Hub Depressed Index (HDI), Preferential Attachment (PA). These structural metrics have been used by researchers from diverse domains to address various problems in network analysis. They have been applied to the reconstruction of missing links in networks where only a part of the topology is available [16, 51], to the prediction of new connections in social and information systems [49], and to the latent network reconstruction problem studied here as well [83].
For each class X of similarities (X is a placeholder; e.g., X can represent common neighbors, CN), we consider the standard static metric [53] (directly denoted as X), and two temporal similarity metrics: temporal metrics with the power-law time-lag decay (denoted as TX) [48], and the new temporal metrics with the one-step time-lag decay (denoted as TX1). The two temporal classes differ in how the similarity score $s_{ij}$ of a given pair of nodes $(i, j)$ depends on the time lag $|t_{i\alpha} - t_{j\alpha}|$ between node $i$'s and node $j$'s adoption times $t_{i\alpha}$ and $t_{j\alpha}$ for item $\alpha$. We refer to the Methods section for all the definitions.
To illustrate the main idea behind each class of metrics, we define here the common-neighbors metrics: static common neighbors (CN), temporal common neighbors with a power-law decay of time-lag (TCN), and temporal common neighbors with one-step decay of time lag (TCN1). The common neighbors (CN) of a given pair of nodes is simply given by [53]
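$$s_{ij}^{CN} = \sum_{\alpha} R_{i\alpha} R_{j\alpha},$$
where $R_{i\alpha} = 1$ if node $i$ adopted item $\alpha$ and $R_{i\alpha} = 0$ otherwise.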
According to this definition, two nodes are similar (and, therefore, more likely to be connected in the hidden unipartite network) if they often adopted the same item.
Zeng [83] found that this metric and similar static metrics can be used to reconstruct the topology of a hidden network based on the time-series of a spreading dynamics. Subsequently, the static metric proved to be sub-optimal with respect to time-aware metrics [48]. Indeed, while it is plausible that two nodes that often adopt the same item at similar times are more likely to be connected, the same is not necessarily true if the common adoptions happen at very distant points in time: given two adopters $i$ and $j$ of item $\alpha$, with $j$ adopting much later than $i$ ($t_{j\alpha} \gg t_{i\alpha}$), item $\alpha$ might have reached $j$ through a long network path, without the two nodes being directly connected.
To penalize longer time lags, [48] introduced the temporal common neighbors with power-law time-lag decay (TCN) as
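$$s_{ij}^{TCN} = \sum_{\alpha} \frac{R_{i\alpha} R_{j\alpha}}{|t_{i\alpha} - t_{j\alpha}|},$$
where $R_{i\alpha} = 1$ if node $i$ adopted item $\alpha$ (and 0 otherwise) and $t_{i\alpha}$ is the time at which it did so.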
This time-aware metric significantly outperforms its static counterpart, CN, in the latent network reconstruction [48]. However, as a consequence of the power-law function, the similarity of a given pair of nodes receives substantial non-zero contributions also when the two nodes adopt the same item at substantially different times.
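In this work, we introduce the temporal common neighbors with one-step time-lag decay (TCN1) as
$$s_{ij}^{TCN1} = \sum_{\alpha} R_{i\alpha} R_{j\alpha}\, \delta_{|t_{i\alpha} - t_{j\alpha}|,\,1},$$
where $R_{i\alpha} = 1$ if node $i$ adopted item $\alpha$ (and 0 otherwise), $t_{i\alpha}$ is the corresponding adoption time, and $\delta$ is the Kronecker delta.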
According to this definition, the similarity of a given pair of nodes only receives a contribution when the two nodes adopt the same item at two consecutive time steps.
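The three common-neighbors variants can be computed in a single pass over the adoption data, as in the Python sketch below. Two choices here are assumptions of this sketch rather than statements about the original definitions: the power-law weight is taken to be the inverse of the time lag, and pairs that adopt an item at the same time step are skipped in TCN to avoid a division by zero. The function name and the toy cascades are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def similarity_scores(cascades):
    """Compute CN, TCN and TCN1 scores for every co-adopting node pair.

    cascades : list of dicts, one per item, mapping node -> adoption time
    Returns three dicts keyed by the node pair (i, j) with i < j.
    """
    cn = defaultdict(float)
    tcn = defaultdict(float)
    tcn1 = defaultdict(float)
    for times in cascades:
        for i, j in combinations(sorted(times), 2):
            lag = abs(times[i] - times[j])
            cn[(i, j)] += 1.0              # static: every co-adoption counts
            if lag > 0:
                tcn[(i, j)] += 1.0 / lag   # assumed inverse-lag (power-law) weight
            if lag == 1:
                tcn1[(i, j)] += 1.0        # one-step decay: consecutive steps only
    return cn, tcn, tcn1

# Two toy items spreading over four nodes (hypothetical adoption times).
cascades = [{0: 0, 1: 1, 2: 2}, {1: 0, 2: 1, 3: 5}]
cn, tcn, tcn1 = similarity_scores(cascades)
print(dict(cn))
print(dict(tcn))
print(dict(tcn1))
```

In the toy data, the pair (1, 2) adopts both items at consecutive steps and so scores highest under all three metrics, whereas the pair (1, 3) shares an item but with a lag of five steps, so it contributes to CN, only weakly to TCN, and not at all to TCN1.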
Analogous definitions for the other seven classes of similarities and their temporal variants with power-law and one-step time-lag decay are provided in the Methods section. The goal of the rest of the paper is to extensively compare the performance of these metrics in reconstructing both synthetic and empirical networks.
2.3 Reconstruction of synthetic networks
We start our investigation from synthetic networks generated with the Barabási-Albert model [6] (see Methods for the generation details). Fig. 1 shows our reconstruction results: each panel refers to a class of similarities; for each class of similarities (e.g., common neighbors), we show the results for the static metric (CN), the temporal metric with power-law time lag decay (TCN), and the new temporal metric with one-step time lag decay (TCN1). The precision values attained by the metrics are shown as a function of the transmission probability of the SIR spreading process.
For each considered structural metric (e.g., CN), for sufficiently large values of the transmission probability $\beta$, the corresponding temporal metric with one-step decay (e.g., TCN1) performs significantly better than the corresponding temporal metric with power-law decay (e.g., TCN). As we reduce $\beta$, spreading processes tend to die out more rapidly, and it becomes increasingly harder to correctly reconstruct the underlying diffusion network; in the small-$\beta$ regime, the temporal metrics with one-step and power-law decay perform similarly. As expected [48], the time-aware metrics significantly outperform the static metric.
Fig. 2 shows analogous results for a small-world network [79] (see Methods for the generation details). We observe again a systematic performance edge of the temporal metrics with a one-step time lag decay over the temporal metrics with power-law time lag decay, yet this gap is smaller than in the BA networks.
2.4 Reconstruction of real networks
Our results on synthetic networks suggest that the temporal metrics with one-step time lag decay reconstruct synthetic contact networks better than the temporal metrics with power-law time lag decay. To further validate this assertion, we analyzed empirical contact networks of diverse nature including information networks (details in the Supplementary Material).
For almost all the analyzed datasets, the temporal metrics with one-step time lag decay substantially improve the reconstruction accuracy with respect to both static (Fig. 3) and temporal metrics with power-law time lag decay (Fig. 4). The only networks where the temporal metrics with power-law time-lag decay can outperform the temporal metrics with one-step time-lag decay are those with low clustering coefficient. This is intuitive: in a network with lower clustering, it is less likely that two non-connected nodes are reached by long propagation paths. This mitigates the advantage of considering only adoptions with one-step time lag when computing the similarity score of a given pair of nodes.
Note on clustering: in our work, we use the average local clustering coefficient as a metric for clustering. For each node $i$ in the network, we calculate the number $e_i$ of existing edges that connect nodes that are connected with $i$, and the maximum number $m_i$ of possible links between $i$'s neighbors. For an undirected graph, $m_i = k_i (k_i - 1)/2$, where $k_i$ is the degree of node $i$. Finally, we define $i$'s local clustering coefficient as $c_i = e_i / m_i$, and the network's clustering coefficient as $C = N^{-1} \sum_i c_i$, where $N$ is the number of nodes.
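The average local clustering coefficient described above can be computed directly from an adjacency structure. The Python sketch below is illustrative: the function name is hypothetical, and assigning a local coefficient of zero to nodes with fewer than two neighbours is a convention assumed here, since the ratio is undefined for them.

```python
def average_clustering(adj):
    """Average local clustering coefficient of an undirected graph.

    adj : dict mapping each node to the set of its neighbours (no self-loops)
    """
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # assumed convention: such nodes contribute 0
        # Count existing edges among the neighbours of this node.
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)

# Toy example: a triangle 0-1-2 with a pendant node 3 attached to node 2.
toy_adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(average_clustering(toy_adj))
```

In the toy graph, nodes 0 and 1 have local coefficient 1, node 2 has 1/3, and node 3 contributes 0, so the average is 7/12.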
The results in Fig. 4 were obtained with a transmission probability fixed relative to the epidemic threshold [64]. As expected from the synthetic network analysis, we find that for larger transmission probabilities (Fig. S1), the one-step time-lag metrics show better reconstruction accuracy for the vast majority of datasets and considered metrics. On the other hand, for lower transmission probabilities, there is no clear advantage of the metrics with one-step time-lag decay (Figs. S2-S3).
So far, we have compared similarities of the same class (e.g., common neighbors) with different time-lag decay functions. A natural question arises: what is the relative performance of the eight temporal metrics TX1 with one-step time-lag decay obtained from the eight different classes of similarities? We compare the eight metrics’ performance across the empirical datasets considered here. We refer to Figs. S4-S5 for the results on individual datasets. To gain a general understanding of the metrics’ performance, we aggregate the metrics’ performance over the analyzed networks. To this end, we consider two evaluation metrics: the metrics’ mean rank [58] and the mean relative precision.
To compute the metrics’ mean rank, for each dataset $d$, we rank the eight TX1 metrics in order of decreasing precision. We denote by $r_{m,d}$ the ranking position of metric $m$ for dataset $d$. Given $D$ analyzed empirical networks, the mean rank of metric $m$ is simply defined as $\bar{r}_m = \frac{1}{D} \sum_{d=1}^{D} r_{m,d}$. Better-performing metrics should exhibit lower mean rank values [58]. In addition, denoting by $P_{m,d}$ the precision achieved by metric $m$ in dataset $d$, we define the mean relative precision of metric $m$ as the average, over the $D$ datasets, of $P_{m,d}$ expressed relative to the precision values obtained on dataset $d$. Better-performing metrics should exhibit larger mean relative precision values.
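The mean-rank aggregation can be sketched in a few lines of Python. This is only an illustration: the function name, the nested-dictionary input, the tie-breaking rule (ties are broken by metric name), and the precision values in the example are all hypothetical and do not reproduce any result of the paper.

```python
def mean_rank(precision):
    """Mean rank of each metric across datasets.

    precision: dict dataset -> dict metric -> precision value
               (hypothetical input format for this sketch).
    Returns a dict metric -> mean ranking position (1 = best).
    """
    metrics = sorted(next(iter(precision.values())))
    totals = {m: 0.0 for m in metrics}
    for scores in precision.values():
        # Rank metrics by decreasing precision within this dataset.
        # Python's sort is stable, so ties keep alphabetical order (assumed rule).
        ordered = sorted(metrics, key=lambda m: scores[m], reverse=True)
        for position, m in enumerate(ordered, start=1):
            totals[m] += position
    return {m: totals[m] / len(precision) for m in metrics}


# Made-up precision values for three placeholder metrics on two placeholder datasets.
toy_precision = {
    "d1": {"m1": 0.5, "m2": 0.3, "m3": 0.1},
    "d2": {"m1": 0.2, "m2": 0.4, "m3": 0.1},
}
toy_ranks = mean_rank(toy_precision)
```

In this toy input, m1 ranks first on d1 and second on d2, m2 the reverse, and m3 is last on both, so the mean ranks are 1.5, 1.5, and 3.0.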
Both evaluation metrics lead to the same overall conclusion (Fig. 5): on average, the TCOS1 (temporal Cosine with one-step time-lag decay) metric is the best-performing metric, followed by TSSI1 and TJac1 (see Methods for their definition). While TCOS1 provides us with a computationally fast metric to reconstruct the hidden topology, its mean precision leaves the door open for future performance improvements, possibly based on new similarity metrics or more sophisticated methods.
[TABLE]
3 Discussion
Our work provided a systematic benchmarking of temporal similarity metrics with respect to their accuracy in reconstructing a hidden network topology. The reconstruction was more accurate for SIR spreading processes with a large transmission probability, i.e., in the supercritical regime. On both real and synthetic networks, we found that temporal metrics with one-step time-lag decay perform systematically better than metrics with power-law time-lag decay. Besides, we found that the temporal cosine metric with one-step time-lag decay is the best-performing metric. Differently from maximum-likelihood methods [28] and compressed-sensing theory approaches [59], the temporal similarity metrics considered here are general and not restricted to a specific dynamics. In this sense, they can be interpreted not only as parsimonious and effective reconstruction tools, but also as general baselines against which more sophisticated, model-specific reconstruction techniques can be evaluated. While we focused on the SIR dynamics, we also assessed the metrics’ performance for two additional spreading models: the Susceptible-Infected (SI) model [3, 70] and the Linear Threshold Model (LTM) [30, 40, 12]. The results obtained for these two models are in qualitative agreement with the results obtained for the SIR model (Figs. S6-S7), supporting the generality of our conclusions.
Our study paves the way for several extensions. Temporal similarity metrics might be applied to other network reconstruction problems, such as the problem where part of the topology is known [51] and the matching of user accounts across different domains or devices [11, 46]. Even more intriguingly, one can attempt to reconstruct the hidden topology of a social network based on the observed dynamics of real diffusion processes. For instance, from the observed spreading dynamics of many pieces of information, one might attempt to reconstruct propagation networks in social media [65] and e-commerce platforms [56]. The results presented here support metrics based on one-step time lags as the best-performing ones in the latent network reconstruction task. While the time step of the dynamics is unambiguously defined for simulated processes, the same does not hold for real spreading processes. Using temporal similarity metrics to reconstruct propagation topologies based on real time-series data will likely require us to first identify the typical timescale needed for a given piece of information to be transmitted from one individual to another, and then to use this typical timescale as the time-lag parameter in the similarity metric.
Finally, our work contributes to the rich literature on similarity in social and information networks [45, 44, 73, 29, 13, 66]. Previous research has stressed the role of structural similarity metrics, i.e., similarity metrics based either on the time-aggregate contact network of individuals (who is connected to whom) [35, 80, 42] or on the time-aggregate user-item bipartite adoption network (who collected what) [44, 83]. Here, we combined structural and temporal information (who collected what at which time) to define temporal similarity metrics that are effective in the propagation network reconstruction task. We envision that future research on social and information network analysis might further develop simple yet well-performing time-aware metrics for network reconstruction.
4 Methods
4.1 Temporal similarity metrics
For each class C of similarity metrics, we define three metrics: a static metric C, a temporal metric with power-law time-lag decay TC, and a temporal metric with one-step time-lag decay TC1. In our work, we consider eight classes C of similarities: Common Neighbors (CN), Jaccard (Jac), Cosine (COS), Leicht-Holme-Newman (LHN), Sørensen Index (SSI), Hub Promoted Index (HPI), Hub Depressed Index (HDI), Preferential Attachment (PA). As we already defined the three CN similarities in the main text, we define here the metrics based on the seven additional classes.
Jaccard (Jac) similarity
We define three metrics:
- Jaccard similarity (Jac):
[TABLE]
- Temporal Jaccard similarity with power-law time-lag decay (TJac):
[TABLE]
- Temporal Jaccard similarity with one-step time-lag decay (TJac1):
[TABLE]
Cosine (COS) similarity
We define three metrics:
- Cosine similarity (COS):
[TABLE]
- Temporal Cosine similarity with power-law time-lag decay (TCOS):
[TABLE]
- Temporal Cosine similarity with one-step time-lag decay (TCOS1):
[TABLE]
Leicht-Holme-Newman Index (LHN) similarity
We define three metrics:
- Leicht-Holme-Newman Index similarity (LHN):
[TABLE]
- Temporal Leicht-Holme-Newman Index similarity with power-law time-lag decay (TLHN):
[TABLE]
- Temporal Leicht-Holme-Newman Index similarity with one-step time-lag decay (TLHN1):
[TABLE]
Sørensen Index (SSI) similarity
We define three metrics:
- Sørensen Index similarity (SSI):
[TABLE]
- Temporal Sørensen Index similarity with power-law time-lag decay (TSSI):
[TABLE]
- Temporal Sørensen Index similarity with one-step time-lag decay (TSSI1):
[TABLE]
Hub Promoted Index (HPI) similarity
We define three metrics:
- Hub Promoted Index similarity (HPI):
[TABLE]
- Temporal Hub Promoted Index similarity with power-law time-lag decay (THPI):
[TABLE]
- Temporal Hub Promoted Index similarity with one-step time-lag decay (THPI1):
[TABLE]
Hub Depressed Index (HDI) similarity
We define three metrics:
- Hub Depressed Index similarity (HDI):
[TABLE]
- Temporal Hub Depressed Index similarity with power-law time-lag decay (THDI):
[TABLE]
- Temporal Hub Depressed Index similarity with one-step time-lag decay (THDI1):
[TABLE]
Preferential Attachment (PA) similarity
We define three metrics:
- Preferential Attachment similarity (PA):
[TABLE]
- Temporal Preferential Attachment similarity with power-law time-lag decay (TPA):
[TABLE]
- Temporal Preferential Attachment similarity with one-step time-lag decay (TPA1):
[TABLE]
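To make the eight static classes concrete, the sketch below computes them for a single pair of nodes using the textbook link-prediction forms, applied to the sets of spreading processes that reached each node. The equations of the paper are not reproduced in this text, so the exact normalizations used by the authors may differ; the function name, the set-based input, and the convention of returning zero when a denominator vanishes are assumptions of this example.

```python
import math


def static_similarities(procs_i, procs_j):
    """Textbook static similarity indices for one pair of nodes.

    procs_i, procs_j: sets of ids of the processes that infected node i / node j
                      (hypothetical input format for this sketch).
    """
    cn = len(procs_i & procs_j)          # processes that reached both nodes
    ki, kj = len(procs_i), len(procs_j)  # number of processes that reached each node

    def ratio(num, den):
        # Assumed convention: similarity is 0 when the denominator vanishes.
        return num / den if den else 0.0

    return {
        "CN": cn,
        "Jac": ratio(cn, len(procs_i | procs_j)),
        "COS": ratio(cn, math.sqrt(ki * kj)),
        "LHN": ratio(cn, ki * kj),
        "SSI": ratio(2 * cn, ki + kj),
        "HPI": ratio(cn, min(ki, kj)),
        "HDI": ratio(cn, max(ki, kj)),
        "PA": ki * kj,
    }


# Toy example: node i was reached by processes 1-3, node j by processes 2-5.
toy_scores = static_similarities({1, 2, 3}, {2, 3, 4, 5})
```

All indices except PA share the same numerator (the number of common processes) and differ only in how they normalize by the activity of the two nodes, which is why a single helper suffices.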
In all the temporal similarity methods above, we set when . Note that in the TC metrics, the factor makes sure that events where do not contribute to the similarity. Indeed, when , is not the node that infected ; therefore, and are unlikely to be connected in the networks. Note that in other problems such as link prediction and recommendation, the case may need to be treated differently.
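The role of the time-lag function can be illustrated with a hedged Python sketch of a temporal common-neighbors score. Several choices here are assumptions rather than the paper's definitions: the power-law exponent is set to 1, the score is made symmetric by using the absolute time difference, and pairs infected at the same time step contribute nothing. The function and variable names are hypothetical.

```python
def power_law_lag(dt):
    """Power-law decay of the time lag (exponent 1 is an assumption)."""
    return 1.0 / dt if dt > 0 else 0.0


def one_step_lag(dt):
    """One-step decay: only infections exactly one time step apart count."""
    return 1.0 if dt == 1 else 0.0


def temporal_cn(times_i, times_j, lag):
    """Temporal common-neighbors score for one pair of nodes.

    times_i, times_j: dict process id -> time step at which the node was infected
                      (only processes that infected the node appear as keys).
    lag: function mapping a non-negative time difference to a weight.
    """
    shared = times_i.keys() & times_j.keys()
    return sum(lag(abs(times_i[p] - times_j[p])) for p in shared)


# Toy infection times: both nodes were reached by processes 0 and 1.
toy_times_i = {0: 1, 1: 3, 2: 2}
toy_times_j = {0: 2, 1: 6, 3: 1}
score_one_step = temporal_cn(toy_times_i, toy_times_j, one_step_lag)
score_power_law = temporal_cn(toy_times_i, toy_times_j, power_law_lag)
```

In the toy data the two shared processes have time lags of 1 and 3 steps: the one-step score counts only the first (1.0), while the power-law score adds a smaller contribution from the second (1 + 1/3).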
4.2 SIR spreading dynamics
In the SIR model, each node is in one of three states: Susceptible (S), Infected (I), or Recovered (R). Each node has a fixed probability of being an initiator of the spreading process; therefore, each spreading process has, on average, a number of simultaneous initiators equal to this probability times the number of nodes. At each time step, each infected node can infect each of its neighbors with probability $\beta$; each infected node can recover with probability $\mu$. For simplicity, we fix $\mu = 1$ (each node recovers one step after having been infected). The process ends when there are no more infected nodes in the system. For each empirical network, we run multiple independent realizations of the SIR dynamics. For each process $\alpha$, we record the temporal list of the nodes infected by that process. The bipartite adjacency matrix $A$ records which nodes were infected by which process: $a_{i\alpha} = 1$ if node $i$ has been infected by process $\alpha$, whereas $a_{i\alpha} = 0$ otherwise. If $a_{i\alpha} = 1$, the time step at which $i$ was infected by $\alpha$ is recorded in $t_{i\alpha}$.
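The dynamics described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the adjacency-dictionary input, and the parameter names `beta`, `seed_prob`, and `n_runs` are assumptions, and the sketch hard-codes recovery after exactly one step.

```python
import random


def simulate_sir(adj, beta, seeds, rng):
    """One SIR realization in which every infected node recovers after one step.

    adj: dict node -> set of neighbours; beta: transmission probability;
    seeds: iterable of initially infected nodes; rng: random.Random instance.
    Returns a dict node -> time step of infection (seeds are infected at t = 0).
    """
    infected_at = {s: 0 for s in seeds}
    current = set(infected_at)
    t = 0
    while current:
        t += 1
        newly_infected = set()
        for u in current:
            for v in adj[u]:
                # Only nodes that were never infected can be infected.
                if v in infected_at or v in newly_infected:
                    continue
                if rng.random() < beta:
                    newly_infected.add(v)
        for v in newly_infected:
            infected_at[v] = t
        # Nodes infected at the previous step recover now.
        current = newly_infected
    return infected_at


def run_processes(adj, beta, seed_prob, n_runs, rng):
    """Run n_runs independent realizations; each node is a seed with probability seed_prob."""
    records = []
    for _ in range(n_runs):
        seeds = [v for v in adj if rng.random() < seed_prob]
        records.append(simulate_sir(adj, beta, seeds, rng))
    return records


# Toy path graph 0 - 1 - 2 - 3 with certain transmission from node 0.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
toy_times = simulate_sir(path, 1.0, [0], random.Random(0))
```

Each returned dictionary plays the role of one column of the node-process records: a node is a key if and only if the process infected it, and the associated value is its infection time.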
4.3 Generation of the synthetic networks
We use two well-known models for the generation of synthetic networks: the Barabási-Albert (BA) model [6], and the Small-World (SW) model [79].
Barabási-Albert (BA)
We generate networks composed of $N$ nodes. Our initial condition is a small regular network, in which every initial node has the same degree. At each time step $t$, we add a new node to the network. The new node connects with $m$ preexisting nodes; the probability that a preexisting node is selected is proportional to its degree at time $t$.
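A minimal sketch of this growth process is given below. The size and degree of the initial regular network are not recoverable from the text above, so the sketch assumes a complete graph on m + 1 nodes as the initial condition; the function name and parameters are likewise hypothetical.

```python
import random


def barabasi_albert(n, m, rng):
    """Grow a network of n nodes by preferential attachment.

    Assumed initial condition: a complete graph on m + 1 nodes.
    Returns a set of undirected edges (u, v) with u < v.
    """
    edges = {(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)}
    # Each node appears in `stubs` once per incident edge, so a uniform draw
    # from `stubs` selects a node with probability proportional to its degree.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for target in targets:
            edges.add((target, new))  # target < new always holds
            stubs.extend((target, new))
    return edges


toy_ba = barabasi_albert(50, 3, random.Random(42))
```

With this initial condition the network has m (m + 1) / 2 edges at the start and gains m edges per added node, which gives 144 edges for n = 50 and m = 3.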
Small-World (SW)
We start from a regular ring lattice composed of $N$ nodes with degree $k$: we connect each of the $N$ nodes with its $k$ nearest neighbors. We rewire each link with probability $p$, which is set to a fixed value in this work. More specifically, for each node $i$, we select a node $j$ from its neighbors and we extract a random number $r$ from the uniform distribution in $[0, 1]$. If $p$ is larger than $r$, we remove the edge between node $i$ and node $j$, we randomly select a node $l$, and we establish an edge between node $i$ and node $l$.
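The rewiring procedure can be sketched as follows. The sketch follows the usual Watts-Strogatz convention of visiting each lattice link once and of avoiding self-loops and duplicate links when choosing the new endpoint; these details, together with the function and parameter names, are assumptions of this example.

```python
import random


def small_world(n, k, p, rng):
    """Ring lattice of n nodes with k nearest neighbours (k even), rewired with probability p.

    Returns a dict node -> set of neighbours.
    """
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            # Keep the lattice link unless the random draw falls below p.
            if j not in adj[i] or rng.random() >= p:
                continue
            # Assumed rule: the new endpoint is neither i nor a current neighbour of i.
            candidates = [c for c in range(n) if c != i and c not in adj[i]]
            if not candidates:
                continue
            new = rng.choice(candidates)
            adj[i].discard(j)
            adj[j].discard(i)
            adj[i].add(new)
            adj[new].add(i)
    return adj


toy_sw = small_world(30, 4, 0.3, random.Random(7))
```

Because every rewiring removes one link and adds one new link, the total number of links stays at n k / 2 regardless of p.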
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
The work presented in this paper corresponds to a collaborative development by all authors. Conceptualization, H.L., M.S.M., and M-Y.Z.; Data Curation, M-K.L. and H.L.; Formal Analysis, H.L., M-K.L. and M.S.M.; Funding Acquisition, H.L. and M-Y.Z.; Methodology, M.S.M.; Resources, H.L. and M-Y.Z.; Software, M-K.L. and X-T.W.; Writing – Original Draft, M.S.M., M-K.L., M-Y.Z., X-T.W. and H.L.
Acknowledgements
We wish to thank Prof. Ginestra Bianconi and Prof. Chi Ho Yeung for providing us with valuable suggestions. H.L. and M-Y.Z. acknowledge financial support from the National Natural Science Foundation of China (Grant Nos. 61803266, 61703281), Guangdong Province Natural Science Foundation (Grant Nos. 2016A030310051, 2017A030310374, 2017B030314073), Guangdong Pre-national project (Grant No. 2014GKXM054), Shenzhen Fundamental Research Foundation (JCYJ20160520162743717, JCYJ20150529164656096), Natural Science Foundation of SZU (Grant No. 2016-24), Foundation for Distinguished Young Talents in Higher Education of Guangdong, China (Grant No. 2015KQNCX143). M.S.M. acknowledges the University of Zurich for support through the URPP Social Networks.
Supplementary
Data Description
Here we describe the empirical networks analyzed in the main text.
- 1) Facebook: a social network which contains Facebook user–user friendships. [55]
- 2) Jazz: a music collaboration network obtained from the Red Hot Jazz Archive digital database. It includes 198 bands that performed between 1912 and 1940, with most of the bands from 1920 to 1940. [27]
- 3) Residence hall: a network which contains friendship ratings between 217 residents living at a residence hall located on the Australian National University campus. [23]
- 4) E.coli: a metabolic network of E.coli. [38]
- 5) Physicians: a network which captures the spreading paths of an innovation among 246 physicians in four towns in Illinois: Peoria, Bloomington, Quincy and Galesburg. [17]
- 6) Neural: a neural network in C. elegans. [20]
- 7) Usair: the US air transportation network that connects airports located in the United States.
- 8) Dublin: a network which describes the face-to-face behavior of people during the exhibition "infectious: stay away" in 2009 at the Science Gallery in Dublin. [36]
- 9) Crim: a network which connects persons who appeared in at least one crime case as either a suspect, a victim, a witness or both a suspect and victim at the same time.
- 10) Caenorhabditis elegans: a metabolic network of the roundworm Caenorhabditis elegans. [21]
- 11) Email: an email communication network at the University Rovira i Virgili in Tarragona in the south of Catalonia in Spain. [32]
- 12) Blogs: a network which contains front-page hyperlinks between blogs in the context of the 2004 US election. [1]
- 13) Air traffic control: a network which was constructed from the USA’s FAA (Federal Aviation Administration) National Flight Data Center (NFDC), Preferred Routes Database.
- 14) Human protein: a network of interactions between proteins in humans (Homo sapiens), from the first large-scale study of protein–protein interactions in human cells using a mass spectrometry-based approach. [71]
- 15) Hamsterster friendships: a network which contains friendships between users of the website hamsterster.com.
- 16) UC Irvine messages: a network which contains sent messages between the users of an online community of students from the University of California, Irvine. [62]
- 17) Adolescent health: a network which was created from a survey that took place in 1994/1995. [57]
- 18) Advogato: a network from an online community platform for developers of free software launched in 1999. [54]
- 19) Euroroad: an international E-road network, a road network located mostly in Europe. [76]
- 20) Highschool: a network which contains friendships between boys in a small high school in Illinois. [18]
- 21) Hypertext: a network of face-to-face contacts between the attendees of the ACM Hypertext 2009 conference. [36]
- 22) IUI: a network of the collaborations among the authors of papers published in Informatica and Uporabna informatika. [24]
- 23) Amazon: a network between web pages in amazon.com. [72]
- 24) SCSC: a network of collaborations between Slovenian computer scientists. [7]
- 25) Zachary karate club: the well-known Zachary karate club social network. [82]
- 26) Polbooks: a network of books about US politics published around the time of the 2004 presidential election and sold by the online bookseller Amazon.com. [2]
- 27) Powergrid: the power grid of the Western States of the United States of America. [78]
- 28) Subelj: the software class dependency network of the JUNG 2.0.1 and javax 1.6.0.7 libraries, namespaces edu.uci.ics.jung and java/javax.
- 29) PPI: a protein-protein interaction network. [75]
- 30) Openflights: a network which contains flights between airports of the world. [61]
- 31) Bible: a network which contains nouns (places and names) of the King James Version of the Bible and information about their co-occurrences. [84]
- 32) Chicago: a network on the road transportation of the Chicago region (USA). [22, 8]
- 33) DNC email: a network of emails in the 2016 Democratic National Committee email leak. [81]
- 34) Word: an adjacency network of common adjectives and nouns in the novel David Copperfield written by Charles Dickens. [60]
- 35) Football: a network of American football games between Division IA colleges during regular season Fall 2000. [26, 74]
- 36) Little Rock Lake: the food web of Little Rock Lake, Wisconsin in the United States of America. [52]
- 37) Unicode: a bipartite network that denotes which languages are spoken in which countries. Here we converted it to a unipartite network. [77]
- 38) Netsci: a coauthorship network between scientists who published on the topic of network science. [60]
- 39) TAP: a yeast protein binding network generated by tandem affinity purification experiments. [25]
- 40) Slavko: a small friendship network from an online website. [7]
References
- [1] Lada A Adamic and Natalie Glance. The Political Blogosphere and the 2004 US Election: Divided they Blog. In Proceedings of the 3rd International Workshop on Link Discovery, pages 36–43. ACM, 2005.
- [2] Mohammad Al Hasan and Mohammed J Zaki. A survey of link prediction in social networks. In Social network data analytics, pages 243–275. Springer, 2011.
- [3] Roy M Anderson and Robert M May. Infectious diseases of humans: dynamics and control. Oxford University Press, Oxford, England, UK, 1992.
- [4] Lars Backstrom and Jure Leskovec. Supervised random walks: predicting and recommending links in social networks. In Proceedings of the 4th ACM International Conference on Web Search and Data Mining, pages 635–644. ACM, 2011.
- [5] Eytan Bakshy, Itamar Rosenn, Cameron Marlow, and Lada Adamic. The role of social networks in information diffusion. In Proceedings of the 21st International Conference on World Wide Web, pages 519–528. ACM, 2012.
- [6] Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
- [7] Neli Blagus, Lovro Šubelj, and Marko Bajec. Self-similar scaling of density in complex real-world networks. Physica A, 391(8):2794–2802, 2012.
- [8] D. E. Boyce, K. S. Chon, M. E. Ferris, Y. J. Lee, K-T. Lin, and R. W. Eash. Implementation and evaluation of combined models of urban travel and location on a sketch planning network. Chicago Area Transportation Study, pages xii + 169, 1985.
