Non-parametric regression for networks

Katie E. Severn; Ian L. Dryden; Simon P. Preston

arXiv:2010.00050·stat.ME·October 2, 2020

TL;DR

This paper develops a non-parametric regression method for network data represented as graph Laplacian matrices, estimating how networks vary with Euclidean covariates such as time in dynamic networks.

Contribution

It introduces an adapted Nadaraya-Watson estimator for manifold-valued network data and establishes uniform weak consistency of the estimator under Euclidean and power Euclidean metrics.
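
For orientation, the general shape of a kernel-weighted (Nadaraya-Watson type) estimator with graph Laplacian responses can be written as follows. The notation here ($L_i$, $x_i$, $K_h$, $w_i$, $d$, $\mathcal{L}_m$) is assumed for illustration and may differ from the paper's own; the paper's adapted estimator may include steps not shown.

```latex
% Observations: graph Laplacians L_1, ..., L_n with Euclidean covariates x_1, ..., x_n.
% K_h is a kernel with bandwidth h; d is a metric on the space \mathcal{L}_m of
% m x m graph Laplacians.
\[
  w_i(x) = \frac{K_h(x - x_i)}{\sum_{j=1}^{n} K_h(x - x_j)},
  \qquad
  \hat{L}(x) = \operatorname*{arg\,min}_{L \in \mathcal{L}_m}
               \sum_{i=1}^{n} w_i(x)\, d(L_i, L)^2 .
\]
% With the Euclidean (Frobenius) metric the minimiser is the weighted average
\[
  \hat{L}(x) = \sum_{i=1}^{n} w_i(x)\, L_i ,
\]
% which is again a graph Laplacian, since the weights are non-negative and sum
% to one and the set of graph Laplacians is convex.
```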

Findings

01. Modeled smooth trends in monthly Enron email networks

02. Highlighted anomalous networks in the Enron email corpus

03. Explored trends in an author's writing style over time using word co-occurrence networks

Abstract

Network data are becoming increasingly available, and so there is a need to develop suitable methodology for statistical analysis. Networks can be represented as graph Laplacian matrices, which are a type of manifold-valued data. Our main objective is to estimate a regression curve from a sample of graph Laplacian matrices conditional on a set of Euclidean covariates, for example in dynamic networks where the covariate is time. We develop an adapted Nadaraya-Watson estimator which has uniform weak consistency for estimation using Euclidean and power Euclidean metrics. We apply the methodology to the Enron email corpus to model smooth trends in monthly networks and highlight anomalous networks. Another motivating application is given in corpus linguistics, which explores trends in an author's writing style over time based on word co-occurrence networks.
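
The regression idea in the abstract can be illustrated with a minimal Python sketch for the Euclidean-metric case only. It is not the authors' implementation: the function names (`graph_laplacian`, `gaussian_kernel`, `nw_laplacian`), the Gaussian kernel, the bandwidth value, and the toy three-node networks are all assumptions made for this example, and the power Euclidean case is not covered.

```python
import numpy as np


def graph_laplacian(adjacency):
    """Return L = D - A for a symmetric, non-negative weighted adjacency matrix."""
    adjacency = np.asarray(adjacency, dtype=float)
    return np.diag(adjacency.sum(axis=1)) - adjacency


def gaussian_kernel(u):
    """Unnormalised Gaussian kernel (an assumed choice for this sketch)."""
    return np.exp(-0.5 * u ** 2)


def nw_laplacian(x0, xs, laplacians, bandwidth):
    """Kernel-weighted average of graph Laplacians at covariate value x0.

    Euclidean-metric case: the estimate is a convex combination of the
    observed Laplacians, so it is itself a graph Laplacian.
    """
    xs = np.asarray(xs, dtype=float)
    laplacians = np.asarray(laplacians, dtype=float)  # shape (n, m, m)
    weights = gaussian_kernel((x0 - xs) / bandwidth)
    weights = weights / weights.sum()
    # Sum over the observation axis: result has shape (m, m).
    return np.tensordot(weights, laplacians, axes=1)


# Hypothetical toy data: three-node networks observed at times 0..4, where the
# weight of the edge between nodes 0 and 1 grows linearly with time.
times = np.arange(5.0)
laplacians = []
for t in times:
    adjacency = np.array([[0.0, 1.0 + t, 1.0],
                          [1.0 + t, 0.0, 1.0],
                          [1.0, 1.0, 0.0]])
    laplacians.append(graph_laplacian(adjacency))

fit = nw_laplacian(2.0, times, laplacians, bandwidth=1.0)
print(fit)
```

The fitted matrix keeps the defining Laplacian properties (symmetric, zero row sums, non-positive off-diagonal entries) because the kernel weights are non-negative and sum to one. Evaluating `nw_laplacian` on a grid of time points would trace out a smooth curve of networks.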
