Network Density of States

Kun Dong; Austin R. Benson; David Bindel

arXiv:1905.09758·cs.SI·May 24, 2019

Network Density of States

Kun Dong, Austin R. Benson, David Bindel

PDF

1 Repo

TL;DR

This paper explores the spectral density of real-world graphs using physics-inspired tools, enabling efficient analysis of large graphs and revealing structural insights beyond extremal eigenvalues.

Contribution

It introduces novel methods for analyzing spectral densities of large graphs, bridging spectral geometry and graph theory with physics-inspired techniques.

Findings

01

Efficient computation of spectral densities for graphs with over a billion edges.

02

Spectral densities provide detailed graph fingerprints and structural insights.

03

Estimation of centrality measures from spectral densities enhances graph analysis.

Abstract

Spectral analysis connects graph structure to the eigenvalues and eigenvectors of associated matrices. Much of spectral graph theory descends directly from spectral geometry, the study of differentiable manifolds through the spectra of associated differential operators. But the translation from spectral geometry to spectral graph theory has largely focused on results involving only a few extreme eigenvalues and their associated eigenvalues. Unlike in geometry, the study of graphs through the overall distribution of eigenvalues - the spectral density - is largely limited to simple random graph models. The interior of the spectrum of real-world graphs remains largely unexplored, difficult to compute and to interpret. In this paper, we delve into the heart of spectral densities of real-world graphs. We borrow tools developed in condensed matter physics, and add novel adaptations to…

Tables2

Table 1. Table 1 . Average time to compute each Chebyshev moment (with 20 probes) for graphs from the SNAP repository.

Network	# Nodes	# Edges	Avg. Deg.	Time (s)
Facebook	4,039	88,234	43.69	0.007
AstroPh	18,772	198,110	21.11	0.028
Enron	36,692	183,831	10.02	0.046
Gplus	107,614	13,673,453	254.12	1.133
Amazon	334,863	925,872	5.53	0.628
Neuron	1,018,524	24,735,503	48.57	9.138
RoadNetCA	1,965,206	2,766,607	2.82	2.276
Orkut	3,072,441	117,185,083	76.28	153.7
LiveJournal	3,997,962	34,681,189	17.35	14.52
Friendster	65,608,366	1,806,067,135	55.06	1,017

Table 2. Table 2 . List of datasets and the corresponding source.

Network	Full Name	Source
Facebook	Facebook Ego Networks	SNAP
Gplus	Google+ Ego Networks	SNAP
Twitter	Twitter Ego Networks	SNAP
LiveJournal	LiveJournal Online Social Network	SNAP
Friendster	Friendster Online Social Network	SNAP
Orkut	Orkut Online Social Network	SNAP
Amazon	Amazon Product Network	SNAP
Enron	Enron Email Communication Network	SNAP
AstroPh	Arxiv Astro Physics Collaboration
	Network	SNAP
HepTh	Arxiv High Energy Physics Theory
	Collaboration Network	SNAP
RoadNetCA	California Road Network	SNAP
AS-733	Autonomous System Network	SNAP
AS-CAIDA	CAIDA Autonomous System Network	SNAP
Neuron	Megascale Cell-Cell Similarity Network	SNAP
Erdös	Erdös Collaboration Network	RODGER
Marvel Chars	Marvel Characters Network	RODGER

Equations72

λ_{2} (\overline{L}) /2 \leq h (G) = S \subset V min \frac{∣{( i , j ) \in E , i \in S , j \in / S }∣}{min ( vol ( S ) , vol ( V \ S ))} \leq 2 λ_{2} (\overline{L}),

λ_{2} (\overline{L}) /2 \leq h (G) = S \subset V min \frac{∣{( i , j ) \in E , i \in S , j \in / S }∣}{min ( vol ( S ) , vol ( V \ S ))} \leq 2 λ_{2} (\overline{L}),

μ (λ) = \frac{1}{N} i = 1 \sum N δ (λ - λ_{i}), \int f (λ) μ (λ) = trace (f (H))

μ (λ) = \frac{1}{N} i = 1 \sum N δ (λ - λ_{i}), \int f (λ) μ (λ) = trace (f (H))

μ (λ; u) = i = 1 \sum N ∣ u^{T} q_{i} ∣^{2} δ (λ - λ_{i}), \int f (λ) μ (λ; u) = u^{T} f (H) u .

μ (λ; u) = i = 1 \sum N ∣ u^{T} q_{i} ∣^{2} δ (λ - λ_{i}), \int f (λ) μ (λ; u) = u^{T} f (H) u .

W_{1}(\mu,\widetilde{\mu})=\sup\Big{\{}\int(\mu(\lambda)-\widetilde{\mu}(\lambda))f(\lambda)d\lambda:\text{Lip}(f)\leq 1\Big{\}}.

W_{1}(\mu,\widetilde{\mu})=\sup\Big{\{}\int(\mu(\lambda)-\widetilde{\mu}(\lambda))f(\lambda)d\lambda:\text{Lip}(f)\leq 1\Big{\}}.

(K_{σ} * μ) (λ) = \int_{R} σ^{- 1} K (\frac{λ - ν}{σ}) μ (ν) d ν

(K_{σ} * μ) (λ) = \int_{R} σ^{- 1} K (\frac{λ - ν}{σ}) μ (ν) d ν

H = \frac{2 H - ( λ _{m a x} ( H ) + λ _{m i n} ( H ))}{λ _{m a x} ( H ) - λ _{m i n} ( H )}

H = \frac{2 H - ( λ _{m a x} ( H ) + λ _{m i n} ( H ))}{λ _{m a x} ( H ) - λ _{m i n} ( H )}

T_{0} (x) = 1, T_{1} (x) = x, T_{m + 1} (x) = 2 x T_{m} (x) - T_{m - 1} (x) .

T_{0} (x) = 1, T_{1} (x) = x, T_{m + 1} (x) = 2 x T_{m} (x) - T_{m - 1} (x) .

\int_{- 1}^{1} w (x) T_{m} (x) T_{n} (x) d x = δ_{mn} .

\int_{- 1}^{1} w (x) T_{m} (x) T_{n} (x) d x = δ_{mn} .

μ (x) = m = 1 \sum \infty d_{m} T_{m}^{*} (λ)

μ (x) = m = 1 \sum \infty d_{m} T_{m}^{*} (λ)

d_{m} = \int_{- 1}^{1} T_{m} (λ) μ (λ) d λ = \frac{1}{N} i = 1 \sum N T_{m} (λ_{i}) = \frac{1}{N} trace (T_{m} (H)),

d_{mk} = \int_{- 1}^{1} T_{m} (λ) μ_{k} (λ) d λ = i = 1 \sum N ∣ q_{i} (k) ∣^{2} T_{m} (λ_{i}) = T_{m} (H)_{k k} .

d_{mk} = \int_{- 1}^{1} T_{m} (λ) μ_{k} (λ) d λ = i = 1 \sum N ∣ q_{i} (k) ∣^{2} T_{m} (λ_{i}) = T_{m} (H)_{k k} .

E [z^{T} H z] = i, j \sum H_{ij} E [z_{i} z_{j}] = trace (H)

E [z^{T} H z] = i, j \sum H_{ij} E [z_{i} z_{j}] = trace (H)

E [z ⊙ H z] = diag (H)

E [z ⊙ H z] = diag (H)

trace (H) = E [z^{T} H z] \approx \frac{1}{N _{z}} j = 1 \sum N_{z} Z_{j}^{T} H Z_{j}

trace (H) = E [z^{T} H z] \approx \frac{1}{N _{z}} j = 1 \sum N_{z} Z_{j}^{T} H Z_{j}

H Z_{M} = Z_{M}^{T} Γ_{M} + r_{M} e_{M}^{T}

H Z_{M} = Z_{M}^{T} Γ_{M} + r_{M} e_{M}^{T}

z^{T} f (H) z = i = 1 \sum N ∣ z^{T} q_{i} ∣^{2} f (λ_{i}) \approx ∥ z ∥^{2} i = 1 \sum M ∣ p_{i 1} ∣^{2} f (τ_{i})

z^{T} f (H) z = i = 1 \sum N ∣ z^{T} q_{i} ∣^{2} f (λ_{i}) \approx ∥ z ∥^{2} i = 1 \sum M ∣ p_{i 1} ∣^{2} f (τ_{i})

∥ z ∥^{2} i = 1 \sum M ∣ p_{i 1} ∣^{2} δ (λ - τ_{i})

∥ z ∥^{2} i = 1 \sum M ∣ p_{i 1} ∣^{2} δ (λ - τ_{i})

E [μ (λ; z)] = i = 1 \sum N δ (λ - λ_{i}) = N μ (λ)

E [μ (λ; z)] = i = 1 \sum N δ (λ - λ_{i}) = N μ (λ)

μ (λ) \approx i = 1 \sum M ∣ p_{i 1} ∣^{2} δ (λ - τ_{i}) .

μ (λ) \approx i = 1 \sum M ∣ p_{i 1} ∣^{2} δ (λ - τ_{i}) .

H = H_{11} 0 H_{13}^{T} 0 H_{22} H_{23}^{T} H_{13} H_{23} H_{33},

H = H_{11} 0 H_{13}^{T} 0 H_{22} H_{23}^{T} H_{13} H_{23} H_{33},

T_{m + 1} (H)_{11} = 2 H_{11} T_{m} (H)_{11} - T_{m - 1} (H)_{11} + 2 H_{13} T_{m} (H)_{31}

T_{m + 1} (H)_{11} = 2 H_{11} T_{m} (H)_{11} - T_{m - 1} (H)_{11} + 2 H_{13} T_{m} (H)_{31}

T_{m + 1} (H) (I_{p}^{(t)}, I_{s}^{(t)}) = 2 H (I_{p}^{(t)}, I_{p}^{(t)}) T_{m} (H) (I_{p}^{(t)}, I_{s}^{(t)})

T_{m + 1} (H) (I_{p}^{(t)}, I_{s}^{(t)}) = 2 H (I_{p}^{(t)}, I_{p}^{(t)}) T_{m} (H) (I_{p}^{(t)}, I_{s}^{(t)})

- T_{m - 1} (H) (I_{p}^{(t)}, I_{s}^{(t)}) + 2 t^{'} \in S_{t} \sum H (I_{p}^{(t)}, I_{s}^{(t^{'})}) T_{m} (H) (I_{s}^{(t^{'})}, I_{s}^{(t)})

f^{M} = m = 0 \sum M J_{m} c_{m} T_{m} (x)

f^{M} = m = 0 \sum M J_{m} c_{m} T_{m} (x)

∣ K_{σ} (ν - λ) - K_{σ}^{M} (ν - λ) ∣ < \frac{6 L}{M σ}

∣ K_{σ} (ν - λ) - K_{σ}^{M} (ν - λ) ∣ < \frac{6 L}{M σ}

\int_{- 1}^{1} f^{M} (λ) μ (λ) d λ = \int_{- 1}^{1} f (λ) μ^{M} (λ) d λ = m = 0 \sum M J_{m} c_{m} d_{m} .

\int_{- 1}^{1} f^{M} (λ) μ (λ) d λ = \int_{- 1}^{1} f (λ) μ^{M} (λ) d λ = m = 0 \sum M J_{m} c_{m} d_{m} .

∥ K_{σ} * (μ - μ^{M}) ∥_{\infty} =

∥ K_{σ} * (μ - μ^{M}) ∥_{\infty} =

\leq

\leq

∥ K_{σ} * (μ - μ^{M}) ∥_{\infty} \leq ∥ K_{σ} * (μ - μ^{M}) ∥_{\infty} + ∥ K_{σ} ∥_{\infty} ∥ μ^{M} - μ^{M} ∥_{1} .

∥ K_{σ} * (μ - μ^{M}) ∥_{\infty} \leq ∥ K_{σ} * (μ - μ^{M}) ∥_{\infty} + ∥ K_{σ} ∥_{\infty} ∥ μ^{M} - μ^{M} ∥_{1} .

μ (λ) = i = 1 \sum N w_{i}^{2} δ (λ - λ_{i})

μ (λ) = i = 1 \sum N w_{i}^{2} δ (λ - λ_{i})

∥ μ^{M} - μ^{M} ∥_{1} \leq i = 1 \sum N ∣1 - w_{i}^{2} ∣.

∥ μ^{M} - μ^{M} ∥_{1} \leq i = 1 \sum N ∣1 - w_{i}^{2} ∣.

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

kd383/NetworkDOS
noneOfficial

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

Network Density of States

Kun Dong

Cornell UniversityIthacaNew York14853

[email protected]

,

Austin R. Benson

Cornell UniversityIthacaNew York14853

[email protected]

and

David Bindel

Cornell UniversityIthacaNew York14853

[email protected]

(2019)

Abstract.

Spectral analysis connects graph structure to the eigenvalues and eigenvectors of associated matrices. Much of spectral graph theory descends directly from spectral geometry, the study of differentiable manifolds through the spectra of associated differential operators. But the translation from spectral geometry to spectral graph theory has largely focused on results involving only a few extreme eigenvalues and their associated eigenvalues. Unlike in geometry, the study of graphs through the overall distribution of eigenvalues — the spectral density — is largely limited to simple random graph models. The interior of the spectrum of real-world graphs remains largely unexplored, difficult to compute and to interpret.

In this paper, we delve into the heart of spectral densities of real-world graphs. We borrow tools developed in condensed matter physics, and add novel adaptations to handle the spectral signatures of common graph motifs. The resulting methods are highly efficient, as we illustrate by computing spectral densities for graphs with over a billion edges on a single compute node. Beyond providing visually compelling fingerprints of graphs, we show how the estimation of spectral densities facilitates the computation of many common centrality measures, and use spectral densities to estimate meaningful information about graph structure that cannot be inferred from the extremal eigenpairs alone.

††journalyear: 2019††copyright: acmlicensed††conference: The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; August 4–8, 2019; Anchorage, AK, USA††booktitle: The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’19), August 4–8, 2019, Anchorage, AK, USA††price: 15.00††doi: 10.1145/3292500.3330891††isbn: 978-1-4503-6201-6/19/08

1. Introduction

Spectral theory is a powerful analysis tool in graph theory (Cvetković et al., 1998; Chung and Graham, 1997; Chung and Lu, 2006), geometry (Chavel, 1984), and physics (Jackson, 2006). One follows the same steps in each setting:

•

Identify an object of interest, such as a graph or manifold;

•

Associate the object with a matrix or operator, often the generator of a linear dynamical system or the Hessian of a quadratic form over functions on the object; and

•

Connect spectral properties of the matrix or operator to structural properties of the original object.

In each case, the complete spectral decomposition is enough to recover the original object; the interesting results relate structure to partial spectral information.

Many spectral methods use extreme eigenvalues and associated eigenvectors. These are easy to compute by standard methods, and are easy to interpret in terms of the asymptotic behavior of dynamical systems or the solutions to quadratic optimization problems with quadratic constraints. Several network centrality measures, such as PageRank (Page et al., 1999), are expressed via the stationary vectors of transition matrices, and the rate of convergence to stationarity is bounded via the second-largest eigenvalue. In geometry and graph theory, Cheeger’s inequality relates the second-smallest eigenvalue of a Laplacian or Laplace-Beltrami operator to the size of the smallest bisecting cut (Cheeger, 1969; Mohar, 1989); in the graph setting, the associated eigenvector (the Fiedler vector) is the basis for spectral algorithms for graph partitioning (Pothen et al., 1990). Spectral algorithms for graph coordinates and clustering use the first few eigenvectors of a transition matrix or (normalized) adjacency or Laplacian (Belkin and Niyogi, 2001; Ng et al., 2002). For a survey of such approaches in network science, we refer to (Chung and Lu, 2006).

Mark Kac popularized an alternate approach to spectral analysis in an expository article (Kac, 1966) in which he asked whether one can determine the shape of a physical object (Kac used a drum as an example) given the spectrum of the Laplace operator; that is, can one “hear” the shape of a drum? One can ask a similar question in graph theory: can one uniquely determine the structure of a network from the spectrum of the Laplacian or another related matrix? Though the answer is negative in both cases (Gordon et al., 1992; Cvetković et al., 1998), the spectrum is enormously informative even without eigenvector information. Unlike the extreme eigenvalues and vectors, eigenvalues deep in the spectrum are difficult to compute and to interpret, but the overall distribution of eigenvalues — known as the spectral density or density of states — provides valuable structural information. For example, knowing the spectrum of a graph adjacency matrix is equivalent to knowing $\text{trace}(A^{k})$ , the number of closed walks of any given length $k$ . In some cases, one wants local spectral densities in which the eigenvalues also have positive weights associated with a location. Following Kac, this would give us not only the frequencies of a drum, but also amplitudes based on where the drum is struck. In a graph setting, the local spectral density of an adjacency matrix at node $j$ is equivalent to knowing $(A^{k})_{jj}$ , the number of closed walks of any given length $k$ that begin and end at the node.

Unfortunately, the analysis of spectral densities of networks has been limited by a lack of scalable algorithms. While the normalized Laplacian spectra of Erdős-Rényi random graphs have an approximately semicircular distribution (Wigner, 1958), and the spectral distributions for other popular scale-free and small-world random graph models are also known (Farkas et al., 2001), there has been relatively little work on computing spectral densities of large “real-world” networks. Obtaining the full eigendecomposition is $\mathcal{O}(N^{3})$ for a graph with $N$ nodes, which is prohibitive for graphs of more than a few thousand nodes. In prior work, researchers have employed methods, such as thick-restart Lanczos, that still do not scale to very large graphs (Farkas et al., 2001), or heuristic approximations with no convergence analysis (Banerjee, 2008). It is only recently that clever computational methods were developed simply to test for hypothesized power laws in the spectra of large real-world matrices by computing only part of the spectrum (Eikmeier and Gleich, 2017).

In this paper, we show how methods used to study densities of states in condensed matter physics (Weiße et al., 2006) can be used to study spectral densities in networks. We study these methods for both the global density of states and for local densities of states weighted by specific eigenvector components. We adapt these methods to take advantage of graph-specific structure not present in most physical systems, and analyze the stability of the spectral density to perturbations as well as the convergence of our computational methods. Our methods are remarkably efficient, as we illustrate by computing densities for graphs with billions of edges and tens of millions of nodes on a single cloud compute node. We use our methods for computing these densities to create compelling visual fingerprints that summarize a graph. We also show how the density of states reveals graph properties that are not evident from the extremal eigenvalues and eigenvectors alone, and use it as a tool for fast computation of standard measures of graph connectivity and node centrality. This opens the door for the use of complete spectral information as a tool in large-scale network analysis.

2. Background

2.1. Graph Operators and Eigenvalues

We consider weighted, undirected graphs $G=(V,E)$ with vertices $V=\{v_{1},\cdots,v_{N}\}$ and edges $E\subseteq V\times V$ . The weighted adjacency matrix $A\in\mathbb{R}^{N\times N}$ has entries $a_{ij}>0$ to give the weight of an edge $(i,j)\in E$ and $a_{ij}=0$ otherwise. The degree matrix $D\in\mathbb{R}^{N\times N}$ is the diagonal matrix of weighted node degrees, i.e. $D_{ii}=\sum_{j}a_{ij}$ . Several of the matrices in spectral graph theory are defined in terms of $D$ and $A$ . We describe a few of these below, along with their connections to other research areas. For each operator, we let $\lambda_{1}\leq\ldots\leq\lambda_{N}$ denotes the eigenvalues in ascending order.

Adjacency Matrix: $\boldsymbol{A}$ . Many studies on the spectrum of $A$ originate from random matrix theory where $A$ represents a random graph model. In these cases, the limiting behavior of eigenvalues as $N\to\infty$ is of particular interest. Besides the growth of extremal eigenvalues (Chung and Graham, 1997), Wigner’s semicircular law is the most renowned result about the spectral distribution of the adjacency matrix (Wigner, 1958). When the edges are i.i.d. random variables with bounded moments, the density of eigenvalues within a range converges to a semicircular distribution. One famous graph model of this type is the Erdős-Rényi graph, where $a_{ij}=a_{ji}=1$ with probability $p<1$ , and [math] with probability $1-p$ . Farkas et al. (Farkas et al., 2001) has extended the semicircular law by investigating the spectrum of scale-free and small-world random graph models. They show the spectra of these random graph models relate to geometric characteristics such as the number of cycles and the degree distribution.

Laplacian Matrix: $\boldsymbol{L=D-A}$ . The Laplace operator arises naturally from the study of dynamics in both spectral geometry and spectral graph theory. The continuous Laplace operator and its generalizations are central to the description of physical systems including heat diffusion (McKean, 1972), wave propagation (Lévy, 2006), and quantum mechanics (Ducastelle and Cyrot-Lackmann, 1970). It has infinitely many non-negative eigenvalues, and Weyl’s law (Weyl, 1911) relates their asymptotic distribution to the volume and dimension of the manifold. On the other hand, the discrete Laplace matrix appears in the formulation of graph partitioning problems. If $f\in\{\pm 1\}^{N}$ is an indicator vector for a partition $V=V_{+}\cup V_{-}$ , then $f^{T}Lf/4$ is the number of edges between $V_{+}$ and $V_{-}$ , also known as the cut size. $L$ is a positive-semidefinite matrix with the vector of all ones as a null vector. The eigenvalue $\lambda_{2}$ , called the algebraic connectivity, bounds from below the smallest bisecting cut size; $\lambda_{2}=0$ if and only if the graph is disconnected. In addition, eigenvalues of $L$ also appear in bounds for vertex connectivity ( $\lambda_{2}$ ) (Cvetkovic et al., 2009), minimal bisection ( $\lambda_{2}$ ) (Donath and Hoffman, 2003), and maximum cut ( $\lambda_{N}$ ) (Trevisan, 2012).

Normalized Laplacian Matrix: $\boldsymbol{\overline{L}=I-D^{-1/2}AD^{-1/2}}$ . We will also mention the normalized adjacency matrix $\overline{A}=D^{-1/2}AD^{-1/2}$ and graph random walk matrix $P=D^{-1}A$ here, because these matrices have the same eigenvalues as $\bar{L}$ up to a shift. The connection to some of the most influential results in spectral geometry is established in terms of eigenvalues and eigenvectors of normalized Laplacian. A prominent example is the extension of Cheeger’s inequality to the discrete case, which relates the set of smallest conductance $h(G)$ (the Cheeger constant) to the second smallest eigenvalue of the normalized Laplacian, $\lambda_{2}(\overline{L})$ (Montenegro et al., 2006):

[TABLE]

where $\text{vol}(T)=\sum_{i\in T}\sum_{j=1}^{N}a_{ij}$ . Cheeger’s inequality offers crucial insights and powerful techniques for understanding popular spectral graph algorithms for partitioning (McSherry, 2001) and clustering (Ng et al., 2002). It also plays a key role in analyzing the mixing time of Markov chains and random walks on a graph (Mihail, 1989; Sinclair and Jerrum, 1989). For all these problems, extremal eigenvalues again emerge from relevant optimization formulations.

2.2. Spectral Density (Density of States — DOS)

Let $H=\mathbb{R}^{N\times N}$ be any symmetric graph matrix with an eigendecomposition $H=Q\Lambda Q^{T}$ , where $\Lambda=\text{diag}(\lambda_{1},\cdots,\lambda_{N})$ and $Q=[q_{1},\cdots,q_{N}]$ is orthogonal. The spectral density induced by $H$ is the generalized function

[TABLE]

where $\delta$ is the Dirac delta function and $f$ is any analytic test function. The spectral density $\mu$ is also referred to as the density of states (DOS) in the condensed matter physics literature (Weiße et al., 2006), as it describes the number of states at different energy levels. For any vector $u\in\mathbb{R}^{N}$ , the local density of states (LDOS) is

[TABLE]

Most of the time, we are interested in the case $u=e_{k}$ where $e_{k}$ is the $k$ th standard basis vector—this provides the spectral information about a particular node. We will write $\mu_{k}(\lambda)=\mu(\lambda;e_{k})$ for the pointwise density of states (PDOS) for node $v_{k}$ . It is noteworthy $|e_{k}^{T}q_{i}|=|q_{i}(k)|$ gives the magnitude of the weight for $v_{k}$ in the $i$ -th eigenvector, thereby the set of $\{\mu_{k}\}$ encodes the entire spectral information of the graph up to sign differences. These concepts can be easily extended to directed graphs with asymmetric matrices, for which the eigenvalues are replaced by singular values, and eigenvectors by left/right singular vectors.

Naively, to obtain the DOS and LDOS requires computing all eigenvalues and eigenvectors for an $N$ -by- $N$ matrix, which is infeasible for large graphs. Therefore, we turn to algorithms that approximate these densities. Since the DOS is a generalized function, it is important we specify how the estimation is evaluated. One choice is to treat $\mu$ (or $\mu_{k}$ ) as a distribution, and measure its approximation error with respect to a chosen function space $\mathcal{L}$ . For example, when $\mathcal{L}$ is the set of Lipschitz continuous functions taking the value 0 at 0, the error for estimated $\widetilde{\mu}$ is in the Wasserstein distance (a.k.a. earth-mover distance) (Kantorovich and Rubinstein, 1958)

[TABLE]

This notion is particularly useful when $\mu$ is integrated against in applications such as computing centrality measures.

On the other hand, we can regularize $\mu$ with a mollifier $K_{\sigma}$ (i.e., a smooth approximation of the identity function):

[TABLE]

A simplified approach is numerically integrating $\mu$ over small intervals of equal size to generate a spectral histogram. The advantage is the error is now easily measured and visualized in the $L_{\infty}$ norm. For example, Figure 1 shows the exact and approximated spectral histogram for the normalized adjacency matrix of an Internet topology.

3. Methods

The density of states plays a significant role in understanding electronic band structure in solid state physics, and so several methods have been proposed in that literature to estimate spectral densities. We review two such methods: the kernel polynomial method (KPM) which involves a polynomial expansion of the DOS/LDOS, and the Gauss Quadrature via Lanczos iteration (GQL). These methods have not previously been applied in the network setting, though Cohen-Steiner et al. (Cohen-Steiner et al., 2018) have independently invented an approach similar to KPM for the global DOS alone, albeit using a less numerically stable polynomial basis (the power basis associated with random walks). We then introduce a new direct nested dissection method for LDOS, as well as new graph-specific modifications to improve the convergence of the KPM and GQL approaches.

Throughout this section, $H$ denotes any symmetric matrix.

3.1. Kernel Polynomial Method (KPM)

The Kernel Polynomial Method (KPM) (Weiße et al., 2006) approximates the spectral density through an expansion in the dual basis of an orthogonal polynomial basis. Traditionally, the Chebyshev basis $\{T_{m}\}$ is used because of its connection to the best polynomial interpolation. Chebyshev approximation requires the spectrum to be supported on the interval $[-1,1]$ for numerical stability. However, this condition can be satisfied by any graph matrix after shifting and rescaling:

[TABLE]

We can compute these extremal eigenvalues efficiently for our sparse matrix $H$ , so the pre-computation is not an issue (Parlett, 1984).

The Chebyshev polynomials $T_{m}(x)$ satisfy the recurrence

[TABLE]

They are orthogonal with respect to $w(x)=2/[(1+\delta_{0n})\pi\sqrt{1-x^{2}}]$ :

[TABLE]

(Here and elsewhere, $\delta_{ij}$ is the Kronecker delta: $1$ if $i=j$ and [math] otherwise.) Therefore, $T_{m}^{\ast}(x)=w(x)T_{m}(x)$ also forms the dual Chebyshev basis. Using (7), we can expand our DOS $\mu(\lambda)$ as

[TABLE]

Here, $T_{m}(H)$ is the $m$ th Chebyshev polynomial of the matrix $H$ . The last equality comes from the spectral mapping theorem, which says that taking a polynomial of $H$ maps the eigenvalues by the same polynomial. Similarly, we express the PDOS $\mu_{k}(\lambda)$ as

[TABLE]

We want to efficiently extract the diagonal elements of the matrices $\{T_{m}(H)\}$ without forming them explicitly; the key idea is to apply the stochastic trace/diagonal estimation, proposed by Hutchinson (Hutchinson, 1990) and Bekas et al. (Bekas et al., 2007). Given a random probe vector $z$ such that $z_{i}$ ’s are i.i.d. with mean [math] and variance $1$ ,

[TABLE]

where $\odot$ represents the Hadamard (elementwise) product. Choosing $N_{z}$ independent probe vectors $Z_{j}$ , we obtain the unbiased estimator

[TABLE]

and similarly for the diagonal. Avron and Toledo (Avron and Toledo, 2011) review many possible choices of probes for eqs. 11 and 12; a common choice is vectors with independent standard normal entries. Using the Chebyshev recurrence (eq. 6), we can compute the sequence $T_{j}(H)z$ for each probe at a cost of one matrix-vector product per term, for a total cost of $O(|E|N_{z})$ time per moment $T_{m}(H)$ .

In practice, we only use a finite number of moments rather than an infinite expansion. The number of moments required depends on the convergence rate of the Chebyshev approximation for the class of functions DOS/LDOS is integrated with. For example, the approximation error decays exponentially for test functions that are smooth over the spectrum (Trefethen, 2013a), so only a few moments are needed. On the other hand, such truncation leads to Gibbs oscillations that cause error in the interpolation (Trefethen, 2013b). However, to a large extent, we can use smoothing techniques such as Jackson damping to resolve this issue (Jackson, 1911) (we will formalize this in theorem 4.1).

3.2. Gauss Quadrature and Lanczos (GQL)

Golub and Meurant developed the well-known Gauss Quadrature and Lanczos (GQL) algorithm to approximate bilinear forms for smooth functions of a matrix (Golub and Meurant, 1997). Using the same stochastic estimation from §3.1, we can also apply GQL to compute DOS.

For a starting vector $z$ and graph matrix $H$ , Lanczos iterations after $M$ steps produce a decomposition

[TABLE]

where $Z_{M}^{T}Z_{M}=I_{M}$ , $Z_{M}^{T}r_{M}=0$ , and $\Gamma_{M}$ tridiagonal. GQL approximates $z^{T}f(H)z$ with $\|z\|^{2}e_{1}^{T}f(T_{M})e_{1}$ , implying

[TABLE]

where $(\tau_{1},p_{1})\cdots,(\tau_{M},p_{M})$ are the eigenpairs of $\Gamma_{M}$ . Consequently,

[TABLE]

approximates the LDOS $\mu(\lambda;z)$ .

Building upon the stochastic estimation idea and the invariance of probe vectors under orthogonal transformation, we have

[TABLE]

Hence

[TABLE]

The approximate generalized function is exact when applied to polynomials of degree $\leq 2M+1$ . Furthermore, if we let $z=e_{k}$ then GQL also provides an estimation for the PDOS $\mu_{k}(\lambda)$ . Estimation from GQL can also be converted to Chebyshev moments if needed.

3.3. Nested Dissection (ND)

The estimation error via Monte Carlo method intrinsically decays at the rate $\mathcal{O}(1/\sqrt{N_{z}})$ , where $N_{z}$ is the number of random probing vectors. Hence, we have to tolerate the higher variance when increasing the number of probe vectors becomes too expensive. This is particularly problematic when we try to compute the PDOS for all nodes using the stochastic diagonal estimator. Therefore, we introduce an alternative divide-and-conquer method, which computes more accurate PDOS for any set of nodes at a cost comparable to the stochastic approach in practice.

Suppose the graph can be partitioned into two subgraphs by removal of a small vertex separator. Permuting the vertices so that the two partitions appear first, followed by the separator vertices. Up to vertex permutations, we can rewrite $H$ in block form as

[TABLE]

where the indices indicate the groups identities. Leveraging this structure, we can update the recurrence relation for Chebyshev polynomials to become

[TABLE]

Recursing on the partitioning will lead to a nested dissection, after which we will use direct computation on sufficiently small sub-blocks. We denote the indexing of each partition with $I^{(t)}_{p}=I^{(t)}_{s}\bigcup I^{(t)}_{\ell}\bigcup I^{(t)}_{r}$ , which represents all nodes in the current partition, the separators, and two sub-partitions, respectively. For the separators, equation 13 leads to

[TABLE]

where $S_{t}$ is the path from partition $t$ to the root; and for the leaf blocks, $I^{(t)}_{s}=I^{(t)}_{p}$ in equation 3.3. The result is Algorithm 1.

The multilevel nested dissection process itself has a well-established algorithm by Karypis and Kumar, and efficient implementation is available in METIS (Karypis and Kumar, 1998). Note that this approach is only viable when the graph can be partitioned with a separator of small size. Empirically, we observe this assumption to hold for many real-world networks. The biggest advantage of this approach is we can very efficiently obtain PDOS estimation for a subset of nodes with much better accuracy than KPM.

3.4. Motif Filtering

In many graphs, there are large spikes around particular eigenvalues; for example, see fig. 1. This phenomenon affects the accuracy of DOS estimation in two ways. First, the singularity-like behavior means we need many more moments to obtain a good approximation in polynomial basis. Secondly, due to the equi-oscillation property of Chebyshev approximation, error in irregularities (say, at a point of high concentration in the spectral density), spreads to other parts of the spectrum. This is a problem in our case, as the spectral density of real-world networks are far from uniform.

High multiplicity eigenvalues are typically related to local symmetries in a graph. The most prevalent example is two dangling nodes attached to the same neighbor as shown in 2(a), which accounts for most eigenvalues around [math] for (normalized) adjacency matrix with a localized eigenvector taking value $+1$ on one node and $-1$ on the other. In addition, we list a few more motifs in figure 2 that appear most frequently in real-world graphs. All of them can be associated with specific eigenvalues, and we include the corresponding ones in normalized adjacency matrix for our example.

To detect these motifs in large graphs, we deploy a randomized hashing technique. Given a random vector $z$ , the hashing weight $w=Hz$ encodes all the neighborhood information of each node. To find node copies (left in Figure 2(a)), we seek pairs $(i,j)$ such that $w_{i}=w_{j}$ ; with high probability, this only happens when $v_{i}$ and $v_{j}$ share the same neighbors. Similarly, all motifs in Figure 2 can be characterized by union and intersection of neighborhood lists.

After identifying motifs, we need only approximate the (relatively smooth) density of the remaining spectrum. The eigenvectors associated with these remaining non-motif eigenvalues must be constant across cycles in the canonical decomposition of the associated permutations. Let $P\in\mathbb{R}^{N\times r}$ denote an orthonormal basis for the space of such vectors formed from columns of the identity and (normalized) indicators for nodes cyclically permuted by the motif. The matrix $H_{r}=P^{T}HP$ then has identical eigenvalues to $H$ , except with all the motif eigenvalues omitted. We may form $H_{r}$ explicitly, as it has the same sparsity structure as $H$ but with a supernode replacing the nodes in each instance of a motif cycle; or we can achieve the same result by replacing each random probe $Z$ with the projected probe $Z_{r}=PP^{T}Z$ at an additional cost of $O(N_{\mathrm{motif}})$ per probe, where $N_{\mathrm{motif}}$ is the number of nodes involved in motifs.

The motif filtering method essentially allow us to isolate the spiky components from the spectrum. As a result, we are able to obtain a more accurate approximation using fewer Chebyshev moments. Figure 3 demonstrates the improvement on the approximation as we procedurally filter out motifs at [math], $-1/3$ , $-1/2$ , and $-1/4$ . The eigenvalue $-1/m$ can be generated by an edge attached to the graph through $m-1$ nodes, similar to motif (2(c)).

4. Error Analysis

4.1. KPM Approximation Error

This section provides an error bound for our regularized DOS approximation $K_{\sigma}\ast\mu$ . We will start with the following theorem.

Theorem 4.1 (Jackson’s Theorem (Jackson, 1911)).

If $f:[-1,1]\rightarrow\mathbb{R}$ is Lipschitz continuous with constant $L$ , its best degree $M$ polynomial approximation $\widehat{f}^{M}$ has an $L_{\infty}$ error of at most $6L/M$ . The approximation can be constructed as

[TABLE]

where $J_{m}$ are Jackson smoothing factors and $c_{m}$ are the Chebyshev coefficients.

We can pick a smooth mollifier $K$ with $\text{Lip}(K)=1$ . For any $\nu\in\mathbb{R}$ and $\lambda\in[-1,1]$ there exists a degree $M$ polynomial such that

[TABLE]

Define $\widehat{\mu}^{M}=\sum_{m=0}^{M}J_{m}d_{m}\phi_{m}$ to be the truncated DOS series,

[TABLE]

Therefore,

[TABLE]

Consider $\widetilde{\mu}^{M}$ to be the degree $M$ approximation from KPM,

[TABLE]

If we use a probe $z$ with independent standard normal entries for the trace estimation,

[TABLE]

where $w=Q^{T}z$ is the weight for $z$ in the eigenbasis. Hence

[TABLE]

Finally,

[TABLE]

If we take $N_{z}$ independent probe vectors, then $N_{z}w_{1}^{2}\sim\chi^{2}(N_{z})$ , which means the expectation decays asymptotically like $\sqrt{2/(\pi N_{z})}$ .

4.2. Perturbation Analysis

In this section, we limit our attention to symmetric graph matrix $H$ . Extracting graph information using DOS, whether as a distribution for functions on a graph or as a direct feature in the form of spectral moments, requires stability under small perturbations. In the case of removing/adding a few number of nodes/edges, the Cauchy Interlacing Theorem (Magnus and Neudecker, 1988) gives a bound on each individual new eigenvalue by the old ones. For example, if we remove $r\ll N$ nodes to get a new graph matrix $\widetilde{H}$ , then

[TABLE]

However, this bound may not be helpful when the impact of the change is not reflected by its size. Hence, we provide a theorem that relates the Wasserstein distance (see equation 3) change and the Frobenius norm of the perturbation. Without loss of generality, we assume the eigenvalues of $H$ lie in $[-1,1]$ already.

Theorem 4.2.

Suppose $\widetilde{H}=H+\delta H$ is the perturbed graph matrix with spectral density $\widetilde{\mu}$ , then

[TABLE]

Proof.

Let $\mathcal{L}$ be the space of Lipschitz functions with $f(0)=0$ .

[TABLE]

By Theorem 3.8 from Higham (Higham, 2008), the perturbation on $f(H)$ is bounded by the Fréchet derivative,

[TABLE]

∎

5. Experiments

5.1. Gallery of DOS/PDOS

We first present our spectral histogram approximation from DOS/ PDOS on a wide variety of graphs, including collaboration networks, online social networks, road networks and autonomous systems (dataset details are in the appendix). For all examples, we apply our methods to the normalized adjacency matrices using $500$ Chebyshev moments and $20$ Hadamard probe vectors. Afterwards, the spectral density is integrated into $50$ histogram bins. In figure 4, the DOS approximation is on the first row, and the PDOS approximation is on the second. When a spike exists in the spectrum, we apply motif filtering, and DOS is zoomed appropriately to show the remaining part. For PDOS, we stack the spectral histograms for all nodes vertically, sorted by their projected weights on the leading left singular vector. Red indicates that a node has high weight at certain parts of the spectrum, whereas blue indicates low weight.

We observe many distinct shapes of spectrum in our examples. The eigenvalues of denser graphs, such as the Marvel characters network (4(c): average degree $52.16$ ) and Facebook union of ego networks (4(d): average degree $43.69$ ), exhibit decay similar to thepower-law around $\lambda=0$ . There has been study on the power-law distribution in the eigenvalues of the adjacency and the Laplacian matrix, but it only focuses on the leading eigenvalues rather than the entire spectrum (Eikmeier and Gleich, 2017) for large real-world datasets. Relatively sparse graphs (4(a): average degree $3.06$ ,; 4(b): average degree $4.13$ ) often possess spikes, especially around $\lambda=0$ , which reflect a larger set of loosely-connected boundary nodes. It is much more evident in the PDOS spectral histograms, which allow us to pick out the nodes with dominant weights at $\lambda=0$ and those that contribute most to local structures. Finally, though the road network is quite sparse (ave. deg $2.50$ ), its regularity results in a lack of special features, and most nodes contribute evenly to the spectrum according to PDOS.

5.2. Computation time

In this experiment, we show the scaling of our methods by applying them to graphs of varying size of nodes, edges, and sparsity patterns. Rather than computation power, the memory cost of loading a graph with 100M-1B edges is more often the constraint. Hence, we report runtimes for a Python version on a Google Cloud instance with 200GB memory and an Intel Xeon E5 v3 CPU at 2.30GHz.

The datasets we use are obtained from the SNAP repository (Leskovec and Krevl, 2014). For each graph, we compute the first $10$ Chebyshev moments using KPM with $20$ probe vectors. Most importantly, the cost for each moment is independent of the total number of moments we compute. Table 1 reports number of nodes, number of edges, average degree of nodes, and the average runtime for computing each moment. We can observe that the runtime is in accordance with the theoretical complexity $O(N_{z}(|V|+|E|))$ . For the Friendster social network with about 1.8 billion edges, computing each moment takes about 1000 seconds to compute, which means we could obtain a rough approximation to its spectrum within a day. As the dominant cost is matrix-matrix multiplication and we use several probe vectors, our approach has ample opportunity for parallel computation.

5.3. Model Verification

In this experiment, we investigate the spectrum for some of the popular graph models, and whether they resemble the behavior of real-world data. Two of the most popular models used to describe real-world graphs are the scale-free model (Barabási and Albert, 1999) and the small-world model (Watts and Strogatz, 1998). Farkas et al. (Farkas et al., 2001) has analyzed the spectrum of the adjacency matrix; we instead consider the normalized adjacency.

The scale-free model grows a random graph with the preferential attachment process, starting from an initial seed graph and adding one node and $m$ edges at every step. Figure 5 shows spectral histograms for this model with $5000$ nodes and different choices of $m$ . When $m=1$ , the generated graph has abundant local motifs like many sparse real-world graphs. By searching in PDOS for the nodes that have high weight at the two spikes, we find node-doubles ( $\lambda=0)$ and singly-attached chains ( $\lambda=\pm 1/\sqrt{2}$ ). When $m=5$ , the graph is denser, without any particular motifs, resulting in an approximately semicircular spectral distribution.

The small-world model generates a random graph by re-wiring edges of a ring lattice with a certain probability $p$ . Here we construct these graphs on $5000$ nodes with $p=0.5$ ; the pattern in spectrum is insensitive for a wide range of $p$ . In Figure 6, when the graph is sparse with $5000$ edges, the spectrum has spikes at [math] and $\pm 1$ , indicating local symmetries, bipartite structure, and disconnected components. With $50000$ edges, localized structures disappear and the spectrum has narrower support.

Finally, we investigate the Block Two-Level Erdös-Rényi (BTER) model (Seshadhri et al., 2012), which directly fits an input graph. BTER constructs a similar graph by a two-step process: first create a collection of Erdös-Rényi subgraphs, then interconnect those using a Chung-Lu model (Chung and Lu, 2002). Seshadhri et al. showed their model accurately captures the observable properties of the given graph, including the eigenvalues of the adjacency matrix. Figure 7 compares the DOS/PDOS of the Erdös collaboration network and its BTER counterpart. Unlike the original graph, most [math] eigenvalues in BTER graph come from isolated nodes. The BTER graph also has many more isolated edges ( $\lambda=\pm 1$ ), singly-attached chains ( $\lambda=\pm 1/\sqrt{2})$ ), and singly-attached triangles ( $\lambda=-1/2$ ). We locate these motifs by inspecting nodes with high weights at respective part of the spectrum.

6. Discussion

In this paper, we make the computation of spectral densities a practical tool for the analysis of large real-world network. Our approach borrows from methods in solid state physics, but with adaptations that improve performance in the network analysis setting by special handling of graph motifs that leave distinctive spectral fingerprints. We show that the spectral densities are stable to small changes in the graph, as well as providing an analysis of the approximation error in our methods. We illustrate the efficiency of our approach by treating graphs with tens of millions of nodes and billions of edges using only a single compute node. The method provides a compelling visual fingerprint of a graph, and we show how this fingerprint can be used for tasks such as model verification.

Our approach opens the door for the use of complete spectral information in large-scale network analysis. It provides a framework for scalable computation of quantities already used in network science, such as common centrality measures and graph connectivity indices (such as the Estrada index) that can be expressed in terms of the diagonals and traces of matrix functions. But we expect it to serve more generally to define new families of features that describe graphs and the roles nodes play within those graphs. We have shown that graphs from different backgrounds demonstrate distinct spectral characteristics, and thus can be clustered based on those. Looking at LDOS across nodes for role discovery, we can identify the ones with high similarity in their local structures. Moreover, extracting nodes with large weights at various points of the spectrum uncovers motifs and symmetries. In the future, we expect to use DOS/LDOS as graph features for applications in graph clustering, graph matching, role classification, and other tasks.

Acknowledgments. We thank NSF DMS-1620038 for supporting this work.

Appendix A Data Source

The datasets used in this paper mainly come from the SNAP (Leskovec and Krevl, 2014) and RODGER repositories (Gleich, 2016). Table 2 is a list of the networks from these two sources.

In addition, we used the Minnesota Road Network from the SuiteSparse Matrix Collection (Davis et al., 2014), and the Harry Potter Characters Network from an open source repository (dpmartin42, 2014).

Appendix B Code Release

Code for reproducible experiments and figures are available at https://github.com/kd383/NetworkDOS.

Bibliography52

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1(1)
2Avron and Toledo (2011) Haim Avron and Sivan Toledo. 2011. Randomized algorithms for estimating the trace of an implicit symmetric positive semi-definite matrix. Journal of the ACM (JACM) 58, 2 (2011), 8.
3Banerjee (2008) Anirban Banerjee. 2008. The spectrum of the graph Laplacian as a tool for analyzing structure and evolution of networks . Ph.D. Dissertation.
4Barabási and Albert (1999) Albert-László Barabási and Réka Albert. 1999. Emergence of scaling in random networks. science 286, 5439 (1999), 509–512.
5Bekas et al . (2007) Costas Bekas, Effrosyni Kokiopoulou, and Yousef Saad. 2007. An estimator for the diagonal of a matrix. Applied Numerical Mathematics 57, 11-12 (2007), 1214–1229.
6Belkin and Niyogi (2001) Mikhail Belkin and Partha Niyogi. 2001. Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering. In Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic (NIPS’01) . MIT Press, Cambridge, MA, USA, 585–591.
7Chavel (1984) Isaac Chavel. 1984. Eigenvalues in Riemannian geometry . Vol. 115. Academic press.
8Cheeger (1969) Jeff Cheeger. 1969. A lower bound for the smallest eigenvalue of the Laplacian. In Proceedings of the Princeton conference in honor of Professor S. Bochner .