# Large-scale directed network inference with multivariate transfer entropy and hierarchical statistical testing

**Authors:** Leonardo Novelli, Patricia Wollstadt, Pedro Mediano, Michael Wibral, Joseph T. Lizier

arXiv: 1902.06828 · 2019-07-31

## TL;DR

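This paper introduces hierarchical statistical tests for large-scale directed network inference with multivariate transfer entropy. The tests control the family-wise error rate and allow efficient parallelisation. On synthetic datasets of random networks with up to 100 nodes, the method reached >98% average precision, recall, and specificity at 10,000 time samples.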

## Contribution

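The authors present a multivariate transfer entropy algorithm, implemented in the open-source IDTxl software, that uses hierarchical statistical tests to control the family-wise error rate and to allow efficient parallelisation. This makes inference of large directed networks from time series data feasible.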

## Key findings

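- Consistently high precision, recall, and specificity (>98% on average) on synthetic datasets with 10,000 time samples; performance increased with time series length.
- Validated on random networks of up to 100 nodes, with both linear and nonlinear dynamics.
- Network size and sample size are both one order of magnitude larger than previously demonstrated, showing feasibility for typical EEG and MEG experiments.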
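The precision, recall, and specificity figures above are link-level scores comparing an inferred network with a known ground truth. A minimal sketch of how such scores can be computed from two adjacency matrices follows. The function name `link_metrics`, the 0/1 matrix layout, and the choice to skip self-links are assumptions made for illustration, not details taken from the paper.

```python
def link_metrics(true_adj, inferred_adj):
    """Score inferred directed links against a known ground-truth network.

    Both arguments are square 0/1 matrices where entry [i][j] == 1 means a
    directed link from node i to node j. Self-links are skipped, which is an
    assumption of this sketch.
    """
    n = len(true_adj)
    tp = fp = fn = tn = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            actual = bool(true_adj[i][j])
            predicted = bool(inferred_adj[i][j])
            if actual and predicted:
                tp += 1  # link exists and was inferred
            elif predicted:
                fp += 1  # link inferred but absent from the ground truth
            elif actual:
                fn += 1  # link exists but was missed
            else:
                tn += 1  # correctly left unconnected
    nan = float("nan")
    return {
        "precision": tp / (tp + fp) if tp + fp else nan,
        "recall": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }


# Toy 3-node example: true links 0->1 and 1->2; inferred links 0->1 and 0->2.
true_adj = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
inferred_adj = [[0, 1, 1], [0, 0, 0], [0, 0, 0]]
metrics = link_metrics(true_adj, inferred_adj)
print(metrics)
```

In the toy example one link is recovered, one is spurious, and one is missed, so precision and recall are both 0.5, while three of the four absent links are correctly left out, giving a specificity of 0.75.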

## Abstract

Network inference algorithms are valuable tools for the study of large-scale neuroimaging datasets. Multivariate transfer entropy is well suited for this task, being a model-free measure that captures nonlinear and lagged dependencies between time series to infer a minimal directed network model. Greedy algorithms have been proposed to efficiently deal with high-dimensional datasets while avoiding redundant inferences and capturing synergistic effects. However, multiple statistical comparisons may inflate the false positive rate and are computationally demanding, which limited the size of previous validation studies. The algorithm we present, as implemented in the IDTxl open-source software, addresses these challenges by employing hierarchical statistical tests to control the family-wise error rate and to allow for efficient parallelisation. The method was validated on synthetic datasets involving random networks of increasing size (up to 100 nodes), for both linear and nonlinear dynamics. The performance increased with the length of the time series, reaching consistently high precision, recall, and specificity (>98% on average) for 10,000 time samples. Varying the statistical significance threshold showed a more favourable precision-recall trade-off for longer time series. Both the network size and the sample size are one order of magnitude larger than previously demonstrated, showing feasibility for typical EEG and MEG experiments.
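To illustrate why testing many candidate sources inflates false positives, and how the family-wise error rate can be controlled, here is a minimal sketch of a generic maximum-statistic permutation test. It is an illustration, not the authors' exact procedure or the IDTxl API. Absolute Pearson correlation stands in for a transfer entropy estimate, the sources are assumed to be already lag-aligned with the target, and the surrogates are plain shuffles, which ignore temporal autocorrelation. All names (`abs_corr`, `max_statistic_test`, `driver`, `noise_*`) are hypothetical.

```python
import random


def abs_corr(x, y):
    """Absolute Pearson correlation (a simple stand-in for a TE estimate)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def max_statistic_test(candidates, target, n_perm=200, alpha=0.05, rng=None):
    """Test the strongest candidate against the null distribution of the maximum.

    Comparing the best observed statistic with the maximum over ALL candidates
    in each surrogate accounts for the number of comparisons made.
    """
    rng = rng or random.Random(0)
    observed = {name: abs_corr(x, target) for name, x in candidates.items()}
    best = max(observed, key=observed.get)
    null_max = []
    for _ in range(n_perm):
        surrogate_stats = []
        for x in candidates.values():
            shuffled = x[:]
            rng.shuffle(shuffled)  # destroys any source-target dependence
            surrogate_stats.append(abs_corr(shuffled, target))
        null_max.append(max(surrogate_stats))
    exceed = sum(s >= observed[best] for s in null_max)
    p_value = (1 + exceed) / (n_perm + 1)
    return best, p_value, p_value <= alpha


# Toy data: the target depends on the driver's previous sample plus noise.
gen = random.Random(42)
n = 300
driver = [gen.gauss(0, 1) for _ in range(n)]
target = [0.9 * driver[t - 1] + 0.3 * gen.gauss(0, 1) for t in range(1, n)]
candidates = {"driver": driver[:-1]}  # past of the driver, aligned with target
for label in ("noise_a", "noise_b", "noise_c", "noise_d"):
    candidates[label] = [gen.gauss(0, 1) for _ in range(n - 1)]

best, p_value, significant = max_statistic_test(candidates, target)
print(best, p_value, significant)
```

Because each surrogate contributes only its largest statistic, a single threshold applies to the whole candidate set, rather than one uncorrected threshold per source.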

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.06828/full.md

## Figures

16 figures with captions in the complete paper: https://tomesphere.com/paper/1902.06828/full.md

## References

70 references — full list in the complete paper: https://tomesphere.com/paper/1902.06828/full.md

---
Source: https://tomesphere.com/paper/1902.06828