# Manifold valued data analysis of samples of networks, with applications in corpus linguistics

**Authors:** Katie E. Severn, Ian L. Dryden, Simon P. Preston

arXiv: 1902.08290 · 2020-09-17

## TL;DR

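This paper introduces a framework for extrinsic statistical analysis of samples of networks. Each network is represented by its graph Laplacian matrix, which makes it possible to compute means, perform PCA and regression, and carry out hypothesis tests. The methods are applied to networks representing the novels of Jane Austen and Charles Dickens.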

## Contribution

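The paper develops a general extrinsic statistical framework for samples of networks, built on graph Laplacians together with metrics, embeddings, tangent spaces, and a projection from Euclidean space to the space of graph Laplacians. The motivation comes from corpus linguistics, but the framework also applies to other network data, such as social interactions and brain activity.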

## Key findings

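- The methodology is applied to the set of novels by Jane Austen and Charles Dickens.
- The framework supports hypothesis tests, such as testing for equality of means between two samples of networks.
- It provides ways to compute means and to perform principal component analysis and regression on network data.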
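
A test for equality of means between two samples of networks can be illustrated with a generic permutation test on graph Laplacians. The sketch below is only an illustration under stated assumptions: it uses the Frobenius distance between the two Euclidean sample means as the test statistic and random relabelling to approximate the null distribution. The function names (`mean_difference_statistic`, `permutation_test`) are hypothetical, and this is not necessarily the test procedure used in the paper.

```python
import numpy as np


def mean_difference_statistic(group_a, group_b):
    """Frobenius distance between the Euclidean means of two Laplacian samples."""
    return np.linalg.norm(group_a.mean(axis=0) - group_b.mean(axis=0))


def permutation_test(group_a, group_b, n_permutations=999, seed=0):
    """Approximate p-value for H0: both samples share the same mean Laplacian.

    group_a, group_b: arrays of shape (n, m, m) holding m x m graph Laplacians.
    Assumption: observations are exchangeable under H0.
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    pooled = np.concatenate([a, b], axis=0)
    n_a = a.shape[0]
    observed = mean_difference_statistic(a, b)

    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled networks to the two groups.
        idx = rng.permutation(pooled.shape[0])
        stat = mean_difference_statistic(pooled[idx[:n_a]], pooled[idx[n_a:]])
        if stat >= observed:
            count += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (count + 1) / (n_permutations + 1)


# Toy usage: two groups of 3-node path graphs with different edge weights.
path = np.array([[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]])
light = np.stack([1.0 * path] * 4)
heavy = np.stack([5.0 * path] * 4)
statistic, p_value = permutation_test(light, heavy)
```

The add-one correction in the returned p-value is a common convention for Monte Carlo permutation tests; other statistics or metrics could be substituted for the Frobenius distance.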

## Abstract

Networks arise in many applications, such as in the analysis of text documents, social interactions and brain activity. We develop a general framework for extrinsic statistical analysis of samples of networks, motivated by networks representing text documents in corpus linguistics. We identify networks with their graph Laplacian matrices, for which we define metrics, embeddings, tangent spaces, and a projection from Euclidean space to the space of graph Laplacians. This framework provides a way of computing means, performing principal component analysis and regression, and carrying out hypothesis tests, such as for testing for equality of means between two samples of networks. We apply the methodology to the set of novels by Jane Austen and Charles Dickens.
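
The identification of networks with graph Laplacians, and the computation of a mean and principal components from a sample, can be sketched as follows. This is a minimal illustration that assumes undirected networks with non-negative edge weights and the plain Euclidean (Frobenius) metric. In that special case the sample mean of graph Laplacians is itself a graph Laplacian, because the defining constraints (symmetry, zero row sums, non-positive off-diagonal entries) are preserved under averaging, so no projection step is needed. Other metrics and embeddings in the framework would need such a projection, which is not shown here. The helper names are hypothetical and not taken from the paper.

```python
import numpy as np


def graph_laplacian(adjacency):
    """Return L = D - A for a symmetric, non-negative weighted adjacency matrix."""
    a = np.asarray(adjacency, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1] or not np.allclose(a, a.T):
        raise ValueError("adjacency must be a square symmetric matrix")
    a = a - np.diag(np.diag(a))  # ignore self-loops
    return np.diag(a.sum(axis=1)) - a


def euclidean_mean(laplacians):
    """Entrywise sample mean of a list of equally sized graph Laplacians."""
    return np.mean(np.stack(laplacians), axis=0)


def pca_scores(laplacians, n_components=2):
    """PCA of flattened Laplacians under the Frobenius inner product.

    Returns (scores, components): scores has shape (n, n_components) and
    components has shape (n_components, m * m).
    """
    stack = np.stack(laplacians)
    x = stack.reshape(stack.shape[0], -1)
    centred = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_components] * s[:n_components], vt[:n_components]


# Toy usage: five random weighted networks on four nodes.
rng = np.random.default_rng(0)
sample = []
for _ in range(5):
    upper = np.triu(rng.random((4, 4)), k=1)
    sample.append(graph_laplacian(upper + upper.T))

mean_l = euclidean_mean(sample)
scores, components = pca_scores(sample)
```

Each principal component can be reshaped to a 4 x 4 matrix with `components[k].reshape(4, 4)` to see which edges vary most across the sample.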

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.08290/full.md

## Figures

40 figures with captions in the complete paper: https://tomesphere.com/paper/1902.08290/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/1902.08290/full.md

---
Source: https://tomesphere.com/paper/1902.08290