GLEE: Geometric Laplacian Eigenmap Embedding

Leo Torres; Kevin S Chan; Tina Eliassi-Rad

arXiv:1905.09763·cs.LG·March 10, 2020

TL;DR

GLEE introduces a novel geometric approach to graph embedding that leverages simplex geometry, outperforming spectral methods like Laplacian Eigenmaps in reconstruction and link prediction tasks.

Contribution

The paper proposes GLEE, a new graph embedding method based on geometric properties rather than spectral assumptions, improving performance in key graph tasks.

Findings

(1) GLEE outperforms Laplacian Eigenmaps in graph reconstruction.

(2) GLEE achieves better link prediction accuracy.

(3) The geometric approach provides more meaningful embeddings.

Tables

Table 1. Data sets used in this work (all undirected, unweighted): number of nodes $n$, number of edges $m$, and average clustering coefficient $\bar{c}$ of the largest connected component of each network. AS stands for autonomous systems of the Internet.

Name	$n$	$m$	$\bar{c}$	Type
PPI (Rolland et al., 2014)	$4,182$	$13,343$	$0.04$	protein interaction
wiki-vote (Leskovec et al., 2010)	$7,066$	$100,736$	$0.14$	endorsement
caida (Leskovec et al., 2010)	$26,475$	$53,381$	$0.21$	AS Internet
CA-HepTh (Leskovec et al., 2007)	$8,638$	$24,806$	$0.48$	co-authorship
CA-GrQc (Leskovec et al., 2007)	$4,158$	$13,422$	$0.56$	co-authorship

Equations

$$y^{T}\mathbf{L}y=\frac{1}{2}\sum_{i,j}a_{ij}(y_{i}-y_{j})^{2}.$$

$$\mathbf{Y^{*}}=\underset{\mathbf{Y}\in\mathbb{R}^{n\times d}}{\arg\min}\;tr(\mathbf{Y^{T}}\mathbf{L}\mathbf{Y})\quad\text{s.t.}\quad\mathbf{Y^{T}}\mathbf{D}\mathbf{Y}=\mathbf{I}$$

$$\mathbf{L}y_{i}^{*}=\lambda_{i}\mathbf{D}y_{i}^{*}.$$

$$\underset{\mathbf{Y}\in\mathbb{R}^{n\times d}}{\arg\max}\;tr(\mathbf{Y^{T}}\mathbf{L}\mathbf{Y})\quad\text{s.t.}\quad\mathbf{Y^{T}}\mathbf{Y}=\mathbf{\Lambda}.$$

$$\mathbf{\hat{L}}_{ij}(\theta)=\begin{cases}-1&\text{if }s_{i}\cdot s_{j}^{T}<\theta,\\0&\text{otherwise.}\end{cases}$$

$$\theta_{\text{opt}}=\underset{\theta\in[-1,0]}{\arg\min}\;\|\mathbf{L}-\mathbf{\hat{L}}(\theta)\|_{F}^{2}.$$

$$-|V_{1}||V_{2}|\,C_{V_{1}}^{T}\cdot C_{V_{2}}$$

$$|V_{1}||V_{2}|\,C_{V_{1}}^{T}\cdot C_{V_{2}}=\sum_{i\in V_{1}}\sum_{j\in V_{2}}s_{i}\cdot s_{j}^{T}=-\sum_{i\in V_{1}}\sum_{j\in V_{2}}a_{ij}$$

$$CN(i,j)=-\deg(i)\,C_{N(i)}\cdot s_{j}^{T}=-\deg(j)\,C_{N(j)}\cdot s_{i}^{T}$$

$$CN(i,j)\approx-\|s_{i}^{d}\|^{2}\,C_{\hat{N}(i)}\cdot(s_{j}^{d})^{T}$$

$$L3(i,j)=-\deg(i)\deg(j)\,C_{N(i)}\cdot C_{N(j)}^{T}+\sum_{k\in N(i)\cap N(j)}\|s_{k}\|^{2}$$

$$\begin{aligned}(\mathbf{A^{3}})_{ij}&=-\sum_{k\in N(i)}\sum_{l\in N(j)}s_{k}\cdot s_{l}^{T}+\sum_{k\in N(i)\cap N(j)}s_{k}\cdot s_{k}^{T}\\&=-|N(i)||N(j)|\,C_{N(i)}\cdot C_{N(j)}^{T}+\sum_{k\in N(i)\cap N(j)}\|s_{k}\|^{2},\end{aligned}$$

$$f_{k}(x)\propto\sum_{i<j}^{n}\mathbf{1}\{x-s_{i}\cdot s_{j}^{T}<h\},$$

$$\hat{w}_{1}f_{1}(\hat{\theta}_{g})=\hat{w}_{2}f_{2}(\hat{\theta}_{g}),$$

Full text

GLEE: Geometric Laplacian Eigenmap Embedding

Leo Torres, Network Science Institute, Northeastern University, Boston, MA 02115

Kevin S. Chan, U.S. Army Research Lab, Adelphi, MD 20783

Tina Eliassi-Rad, Network Science Institute and Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115

Abstract.

Graph embedding seeks to build a low-dimensional representation of a graph $G$. This low-dimensional representation is then used for various downstream tasks. One popular approach is Laplacian Eigenmaps, which constructs a graph embedding based on the spectral properties of the Laplacian matrix of $G$. The intuition behind it, and many other embedding techniques, is that the embedding of a graph must respect node similarity: similar nodes must have embeddings that are close to one another. Here, we dispose of this distance-minimization assumption. Instead, we use the Laplacian matrix to find an embedding with geometric properties instead of spectral ones, by leveraging the so-called simplex geometry of $G$. We introduce a new approach, Geometric Laplacian Eigenmap Embedding (or GLEE for short), and demonstrate that it outperforms various other techniques (including Laplacian Eigenmaps) in the tasks of graph reconstruction and link prediction.

Keywords: graph embedding, graph Laplacian, simplex geometry.

1. Introduction

Graphs are ubiquitous in real-world systems from the internet to the world wide web to social media to the human brain. The application of machine learning to graphs is a popular and active research area. One way to apply known machine learning methods to graphs is by transforming the graph into a representation that can be directly fed to a general machine learning pipeline. For this purpose, the task of graph representation learning, or graph embedding, seeks to build a vector representation of a graph by assigning to each node a feature vector that can then be fed into any machine learning algorithm.

Popular graph embedding techniques seek an embedding where the distance between the latent representations of two nodes represents their similarity. For example, Chen et al. (2018) call this the “community aware” property (nodes in a community are considered similar, and thus their representations must be close to one another), while Chen et al. (2017) call it a “symmetry” between the node domain and the embedding domain. Others refer to methods based on this property by various names, such as “positional” embeddings (Srinivasan and Ribeiro, 2019) or “proximity-based” embeddings (Jin et al., 2019). Consequently, many of these approaches are formulated in such a way that the distance (in the embedding space) between nodes that are similar (in the original data domain) is small. Here, we present a different approach. Instead of focusing on minimizing the distance between similar nodes, we seek an embedding that preserves the most basic structural property of the graph, namely adjacency; the works (Srinivasan and Ribeiro, 2019; Jin et al., 2019) call this approach “structural” node embeddings. Concretely, if the nodes $i$ and $j$ are neighbors in the graph $G$ with $n$ nodes, we seek $d$-dimensional vectors $s_{i}$ and $s_{j}$ such that the adjacency between $i$ and $j$ is encoded in the geometric properties of $s_{i}$ and $s_{j}$, for some $d\ll n$. Examples of geometric properties are the dot product of two vectors (which is a measure of the angle between them), the length (or area or volume) of a line segment (or polygon or polyhedron), the center of mass or the convex hull of a set of vectors, among others. In Section 3 we propose one such geometric embedding technique, called Geometric Laplacian Eigenmap Embedding (GLEE), that is based on the properties of the Laplacian matrix of $G$, and we then proceed to compare it to the original formulation of Laplacian Eigenmaps as well as other popular embedding techniques.

GLEE has deep connections with the so-called simplex geometry of the Laplacian (Devriendt and Van Mieghem, 2019; Fiedler, 2011). Fiedler (2011) first made this observation, which highlights the bijective correspondence between the Laplacian matrix of an undirected, weighted graph and a geometric object known as a simplex. Using this relationship, we find a graph embedding such that the representations $s_{i},s_{j}$ of two non-adjacent nodes $i$ and $j$ are always orthogonal, $s_{i}\cdot s_{j}=0$ , thus achieving a geometric encoding of adjacency. Note that this does not satisfy the “community aware” property of (Chen et al., 2018). For example, the geometric embedding $s_{i}$ of node $i$ will be orthogonal to each non-neighboring node, including those in its community. Thus, $s_{i}$ is not close to other nodes in its community, whether we define closeness in terms of Euclidean distance or cosine similarity. However, we show that this embedding – based on the simplex geometry – contains desirable information, and that it outperforms the original, distance-minimizing, formulation of Laplacian Eigenmaps (LE) on the tasks of graph reconstruction and link prediction in certain cases.

The contributions of this work are as follows.

(1) We present a geometric framework for graph embedding that departs from the tradition of looking for representations that minimize the distance between similar nodes by highlighting the intrinsic geometric properties of the Laplacian matrix.

(2) The proposed method, Geometric Laplacian Eigenmap Embedding (GLEE), while closely related to the Laplacian Eigenmaps (LE) method, outperforms LE in the tasks of link prediction and graph reconstruction. Moreover, a common critique of LE is that it only considers first-order adjacency in the graph. We show that GLEE takes into account higher order connections (see Section 3.2).

(3) The performance of existing graph embedding methods (which minimize distance between similar nodes) suffers when the graph’s average clustering coefficient is low. This is not the case for GLEE.

In Section 2 we recall the original formulation of LE, in order to define the Geometric Laplacian Eigenmap Embedding (GLEE) in Section 3 and discuss its geometric properties. We mention related work in Section 4 and present experimental studies of GLEE in Section 5. We finish with concluding remarks in Section 6.

2. Background on Laplacian Eigenmaps

Belkin and Niyogi (Belkin and Niyogi, 2002, 2003) introduced Laplacian Eigenmaps as a general-purpose method for embedding and clustering an arbitrary data set. Given a data set $\{x_{i}\}_{i=1}^{n}$ , a proximity graph $G=(V,A)$ is constructed with node set $V=\{x_{i}\}$ and edge weights $\mathbf{A}=(a_{ij})$ . The edge weights are built using one of many heuristics that determine which nodes are close to each other and can be binary or real-valued. Some examples are $k$ nearest neighbors, $\epsilon$ -neighborhoods, heat kernels, etc. To perform the embedding, one considers the Laplacian matrix of $G$ , defined as $\mathbf{L}=\mathbf{D}-\mathbf{A}$ , where $\mathbf{D}$ is the diagonal matrix whose entries are the degrees of each node. One of the defining properties of $\mathbf{L}$ is the value of the quadratic form:

$$y^{T}\mathbf{L}y=\frac{1}{2}\sum_{i,j}a_{ij}(y_{i}-y_{j})^{2}.\tag{1}$$

The vector $y^{*}$ that minimizes the value of (1) will be such that the total weighted distance between all pairs of nodes is minimized. Here, $y_{i}$ can be thought of as the one-dimensional embedding of node $i$. One can then extend this procedure to arbitrary $d$-dimensional node embeddings by noting that $tr(\mathbf{Y^{T}}\mathbf{L}\mathbf{Y})=\frac{1}{2}\sum_{i,j}a_{ij}\|y_{i}-y_{j}\|^{2}$, where $\mathbf{Y}\in\mathbb{R}^{n\times d}$ and $y_{i}$ is the $i$th row of $\mathbf{Y}$. The objective function in this case is

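$$\mathbf{Y^{*}}=\underset{\mathbf{Y}\in\mathbb{R}^{n\times d}}{\arg\min}\;tr(\mathbf{Y^{T}}\mathbf{L}\mathbf{Y})\quad\text{s.t.}\quad\mathbf{Y^{T}}\mathbf{D}\mathbf{Y}=\mathbf{I}.\tag{2}$$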

Importantly, the quantity $tr(\mathbf{Y^{T}}\mathbf{L}\mathbf{Y})$ has a global minimum at $\mathbf{Y}=0$. Therefore, a restriction is necessary to guarantee a non-trivial solution. Belkin and Niyogi (Belkin and Niyogi, 2002, 2003) choose $\mathbf{Y^{T}}\mathbf{D}\mathbf{Y}=\mathbf{I}$, though others are possible. Applying the method of Lagrange multipliers, one can see that the solution of (2) is achieved at the matrix $\mathbf{Y^{*}}$ whose columns $y_{i}^{*}$ are the solutions to the eigenvalue problem

$$\mathbf{L}y_{i}^{*}=\lambda_{i}\mathbf{D}y_{i}^{*}.\tag{3}$$

When the graph contains no isolated nodes, $y_{i}^{*}$ is then an eigenvector of the matrix $\mathbf{D^{-1}}\mathbf{L}$ , also known as the normalized Laplacian matrix. The embedding of a node $j$ is then the vector whose entries are the $j$ th elements of the eigenvectors $y_{1}^{*},y_{2}^{*},...,y_{d}^{*}$ .
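The procedure described in this section can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the reference implementation of Belkin and Niyogi: the toy graph, the helper name `laplacian_eigenmaps`, and the choice to drop the constant eigenvector with eigenvalue zero (a common convention for connected graphs) are all introduced here. The generalized problem of Equation (3) is solved through the symmetric matrix $\mathbf{D^{-1/2}}\mathbf{L}\mathbf{D^{-1/2}}$ so that only NumPy is needed.

```python
import numpy as np

# Toy graph (an assumption for illustration): 5 nodes, 5 undirected edges.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (3, 4)]
n = 5
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

deg = A.sum(axis=1)
D = np.diag(deg)
L = D - A


def laplacian_eigenmaps(A, d):
    """Hypothetical helper: LE embedding of a graph without isolated nodes."""
    deg = A.sum(axis=1)
    L = np.diag(deg) - A
    # Substituting u = D^{1/2} y turns L y = lambda D y into the symmetric
    # problem (D^{-1/2} L D^{-1/2}) u = lambda u, which eigh can solve.
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    eigvals, U = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    Y = d_inv_sqrt[:, None] * U         # columns satisfy L y = lambda D y
    # Drop the constant eigenvector (lambda = 0) and keep the next d columns.
    return eigvals[1:d + 1], Y[:, 1:d + 1]


lam, Y = laplacian_eigenmaps(A, d=2)

# Quadratic form of Equation (1), checked on the first embedding coordinate.
y = Y[:, 0]
quadratic_form = y @ L @ y
pairwise_sum = 0.5 * np.sum(A * (y[:, None] - y[None, :]) ** 2)
print(quadratic_form, pairwise_sum)
```

Row `j` of `Y` is then the embedding of node `j`, matching the description above. For large sparse graphs one would presumably use a sparse eigensolver instead of the dense `eigh` call used here.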

3. Proposed Approach: Geometric Laplacian Eigenmaps

We first give our definition and then proceed to discuss both the algebraic and geometric motivations behind it.

Definition 3.1 (GLEE).

Given a graph $G$ , consider its Laplacian matrix $\mathbf{L}$ . Using singular value decomposition we may write $\mathbf{L}=\mathbf{S}\mathbf{S^{T}}$ for a unique matrix $\mathbf{S}$ . Define $\mathbf{S^{d}}$ as the matrix of the first $d$ columns of $\mathbf{S}$ . If $i$ is a node of $G$ , define its $d$ -dimensional Geometric Laplacian Eigenmap Embedding (GLEE) as the $i$ th row of $\mathbf{S^{d}}$ , denoted by $s^{d}_{i}$ . If the dimension $d$ is unambiguous, we will just write $s_{i}$ .
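Definition 3.1 can be sketched in a few lines. The code below is an illustration under assumptions, not the authors' released code: the helper name `glee_embedding` and the toy graph are introduced here, the "first $d$ columns" are taken to be those belonging to the $d$ largest eigenvalues (consistent with Theorem 3.2), and a symmetric eigendecomposition stands in for the singular value decomposition, which is valid because the Laplacian is positive semidefinite.

```python
import numpy as np


def glee_embedding(A, d):
    """Sketch of Definition 3.1: rows are the d-dimensional GLEE vectors."""
    L = np.diag(A.sum(axis=1)) - A
    eigvals, P = np.linalg.eigh(L)                # ascending order
    order = np.argsort(eigvals)[::-1]             # reorder: largest first
    eigvals = np.clip(eigvals[order], 0.0, None)  # remove tiny negative round-off
    S = P[:, order] * np.sqrt(eigvals)            # S = P sqrt(Lambda)
    return S[:, :d]


# Toy graph (an assumption for illustration).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (3, 4)]
A = np.zeros((5, 5))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

S_full = glee_embedding(A, d=5)  # full decomposition, L = S S^T
S_3 = glee_embedding(A, d=3)     # 3-dimensional embedding
print(S_3)
```

With `d` equal to the number of nodes, `S_full @ S_full.T` recovers the Laplacian up to floating-point error; smaller `d` simply truncates the columns.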

Algebraic motivation

In the case of positive semidefinite matrices, such as the Laplacian, the singular values coincide with the eigenvalues. Moreover, it is well known that $\mathbf{S^{d}}\mathbf{(S^{d})^{T}}$ is the matrix of rank $d$ that is closest to $\mathbf{L}$ in Frobenius norm, i.e., $\|\mathbf{L}-\mathbf{S^{d}}\mathbf{(S^{d})^{T}}\|_{F}\leq\|\mathbf{L}-\mathbf{M}\|_{F}$ for all matrices $\mathbf{M}$ of rank $d$. Because of this, we expect $\mathbf{S^{d}}$ to achieve better performance in the graph reconstruction task than any other $d$-dimensional embedding (see Section 5.1).

As can be seen from Equation (1), the original formulation of Laplacian Eigenmaps minimizes the distance between the embeddings of neighboring nodes, under the restriction $Y^{T}DY=I$. We can also formulate GLEE in terms of the distance between neighboring nodes. Perhaps counterintuitively, GLEE solves a distance maximization problem, as follows. The proof follows from a routine application of Lagrange multipliers and is omitted.

Theorem 3.2.

Let $\mathbf{\Lambda}$ be the diagonal matrix whose entries are the eigenvalues of $\mathbf{L}$ . Consider the optimization problem

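$$\underset{\mathbf{Y}\in\mathbb{R}^{n\times d}}{\arg\max}\;tr(\mathbf{Y^{T}}\mathbf{L}\mathbf{Y})\quad\text{s.t.}\quad\mathbf{Y^{T}}\mathbf{Y}=\mathbf{\Lambda}.\tag{4}$$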

Its solution is the matrix $\mathbf{S^{d}}$ whose columns are the eigenvectors corresponding to the largest eigenvalues of $\mathbf{L}$ . If $d=n$ then ${\mathbf{L}=\mathbf{S^{d}}\mathbf{\left(S^{d}\right)^{T}}}$ . ∎

The importance of Theorem 3.2 is to highlight the fact that distance-minimization may be misleading when it comes to exploiting the properties of the embedding space. Indeed, the original formulation of Laplacian Eigenmaps in Equation (2), while well established, yields the eigenvectors corresponding to the lowest eigenvalues of $\mathbf{L}$. However, standard results in linear algebra tell us that the best low rank approximation of $\mathbf{L}$ is given by the eigenvectors corresponding to the largest eigenvalues. Therefore, these are the ones used in the definition of GLEE.
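The low-rank argument above can be checked numerically. The following sketch, on a toy graph chosen here for illustration, compares the Frobenius error of a rank-$d$ approximation of $\mathbf{L}$ built from the largest eigenpairs against one built from the smallest eigenpairs; the helper `rank_d_error` is a name introduced for this example.

```python
import numpy as np

# Toy graph (an assumption for illustration).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (3, 4)]
A = np.zeros((5, 5))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

eigvals, P = np.linalg.eigh(L)  # eigenvalues in ascending order


def rank_d_error(idx):
    """Frobenius error of approximating L with the eigenpairs listed in idx."""
    approx = (P[:, idx] * eigvals[idx]) @ P[:, idx].T
    return np.linalg.norm(L - approx, "fro")


d = 2
n = len(eigvals)
err_largest = rank_d_error(np.arange(n - d, n))  # eigenpairs used by GLEE
err_smallest = rank_d_error(np.arange(d))        # low end of the spectrum
print(err_largest, err_smallest)
```

The error of each approximation is the root of the sum of squares of the discarded eigenvalues, so keeping the largest eigenpairs gives the smaller error.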

Geometric motivation

The geometric reasons underlying Definition 3.1 are perhaps more interesting than the algebraic ones. A recent review paper (Devriendt and Van Mieghem, 2019) highlights the work of Fiedler (2011), who discovered a bijective correspondence between the Laplacian matrix of a graph and a higher-dimensional geometric object called a simplex.

Definition 3.3.

Given a set of $k+1$ $k$ -dimensional points $\{p_{i}\}_{i=0}^{k}$ , if they are affinely independent (i.e., if the set of $k$ points $\{p_{0}-p_{i}\}_{i=1}^{k}$ is linearly independent), then their convex hull is called a simplex.

A simplex is a high-dimensional polyhedron that is the generalization of a 2-dimensional triangle or a 3-dimensional tetrahedron. To see the connection between the Laplacian matrix of a graph and simplex geometry we invoke the following result. The interested reader will find the proof in (Devriendt and Van Mieghem, 2019; Fiedler, 2011).

Theorem 3.4.

Let $\mathbf{Q}$ be a positive semidefinite $k\times k$ matrix. There exists a $k\times k$ matrix $\mathbf{S}$ such that $\mathbf{Q}=\mathbf{S}\mathbf{S^{T}}$ . The rows of $\mathbf{S}$ lie at the vertices of a simplex if and only if the rank of $\mathbf{Q}$ is $k-1$ . ∎

Corollary 3.5.

Let $G$ be a connected graph with $n$ nodes. Its Laplacian matrix $\mathbf{L}$ is positive semidefinite, has rank $n-1$, and has eigendecomposition $\mathbf{L}=\mathbf{P}\mathbf{\Lambda}\mathbf{P^{T}}$. Write $\mathbf{S}=\mathbf{P}\sqrt{\mathbf{\Lambda}}$. Then, $\mathbf{L}=\mathbf{S}\mathbf{S^{T}}$ and the rows of $\mathbf{S}$ are the vertices of an $(n-1)$-dimensional simplex called the simplex of $G$. ∎

Corollary 3.5 is central to the approach in (Devriendt and Van Mieghem, 2019), providing a correspondence between graphs and simplices. Corollary 3.5 also shines a new light on GLEE: the matrix $\mathbf{S^{d}}$ from Definition 3.1 corresponds to the first $d$ dimensions of the simplex of $G$ . In other words, computing the GLEE embeddings of a graph $G$ is equivalent to computing the simplex of $G$ and projecting it down to $d$ dimensions. We proceed to explore the geometric properties of this simplex that can aid in the interpretation of GLEE embeddings. We can find in (Devriendt and Van Mieghem, 2019) the following result.

Corollary 3.6.

Let $s_{i}$ be the $i$ th row of $\mathbf{S}$ in Corollary 3.5. $s_{i}$ is the simplex vertex corresponding to node $i$ , and satisfies $\|s_{i}\|^{2}=\deg(i)$ , and $s_{i}\cdot s_{j}^{T}=-a_{ij}$ , where $\deg(i)$ is the degree of $i$ . In particular, $s_{i}$ is orthogonal to the embedding of any non-neighboring node $j$ . ∎

Corollary 3.6 highlights some of the basic geometric properties of the simplex (such as lengths and dot products) that can be interpreted in graph theoretical terms (resp., degrees and adjacency). In Figure 1 we show examples of these properties. It is worth noting that other common matrix representations of graphs do not present a spectral decomposition that yields a simplex. For example, the adjacency matrix $\mathbf{A}$ is not in general positive semidefinite, and the normalized Laplacian $\mathbf{D^{-1}}\mathbf{L}$ (used by LE) is not symmetric. Therefore, Theorem 3.4 does not apply to them. We now proceed to show how to take advantage of the geometry of GLEE embeddings, which can all be thought of as coming from the simplex, in order to perform common graph mining tasks. In the following we focus on unweighted, undirected graphs.
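The two properties in Corollary 3.6 are easy to verify on a small example. The sketch below builds $\mathbf{S}=\mathbf{P}\sqrt{\mathbf{\Lambda}}$ for a toy graph (an assumption made here for illustration) and compares squared lengths with degrees and off-diagonal dot products with negated adjacency entries.

```python
import numpy as np

# Toy graph (an assumption for illustration).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (3, 4)]
A = np.zeros((5, 5))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

deg = A.sum(axis=1)
L = np.diag(deg) - A

# Full simplex of the graph (d = n): S = P sqrt(Lambda).
eigvals, P = np.linalg.eigh(L)
S = P * np.sqrt(np.clip(eigvals, 0.0, None))  # clip removes round-off negatives

gram = S @ S.T                                  # all pairwise dot products
squared_lengths = np.diag(gram)                 # should equal the degrees
off_diagonal = gram - np.diag(squared_lengths)  # should equal -A

print(np.round(squared_lengths, 6))
print(np.round(off_diagonal, 6))
```

Non-adjacent pairs produce a dot product of zero up to floating-point error, which is the orthogonality property discussed above.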

3.1. Graph Reconstruction

For a graph $G$ with $n$ nodes, consider its $d$-dimensional GLEE embedding $\mathbf{S^{d}}$. When $d=n$, in light of Corollary 3.6, the dot product between the embeddings $s_{i},s_{j}$ of any two distinct nodes can only take the values $-1$ or $0$, and one can reconstruct the graph perfectly from its simplex. However, if $d<n$, the dot products take real values around $-1$ and $0$ with varying amounts of noise; the larger the dimension $d$, the less noise we find around the two modes. It is important to distinguish which nodes $i,j$ have embeddings $s_{i},s_{j}$ whose dot product belongs to the mode at $0$ or to the mode at $-1$, for this determines whether or not the nodes are neighbors in the graph. One possibility is to simply “split the difference” and consider $i$ and $j$ as neighbors whenever $s_{i}\cdot s_{j}<-0.5$. More generally, given a graph $G$ and its embedding $\mathbf{S^{d}}$, define $\mathbf{\hat{L}(\theta)}$ to be the estimated Laplacian matrix using the above heuristic with threshold $\theta$, that is

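$$\mathbf{\hat{L}}_{ij}(\theta)=\begin{cases}-1&\text{if }s_{i}\cdot s_{j}^{T}<\theta,\\0&\text{otherwise.}\end{cases}\tag{5}$$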

Then, we seek the value of $\theta$ , call it $\theta_{\text{opt}}$ , that minimizes the loss

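$$\theta_{\text{opt}}=\underset{\theta\in[-1,0]}{\arg\min}\;\|\mathbf{L}-\mathbf{\hat{L}}(\theta)\|_{F}^{2}.\tag{6}$$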

If all we have access to is the embedding, but not the original graph, we cannot optimize Equation (6) directly. Thus, we have to estimate $\theta_{\text{opt}}$ heuristically. As explained above, one simple estimator is the constant $\hat{\theta}_{c}=-0.5$. We develop two other estimators: $\hat{\theta}_{k},\hat{\theta}_{g}$, obtained by applying Kernel Density Estimation and Gaussian Mixture Models, respectively. We do so in Appendix A as their development has little to do with the geometry of GLEE embeddings. Our experiments show that different thresholds $\hat{\theta}_{c}$, $\hat{\theta}_{k}$, and $\hat{\theta}_{g}$ produce excellent results on different data sets; see Appendix A for discussion.
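The thresholding rule and the loss it is scored against can be sketched as follows. The helper names, the toy graph, and in particular the grid search used for `best_threshold` are assumptions introduced here; the text does not specify how the minimization over $\theta$ is carried out when the graph is available. Diagonal dot products are squared norms and never fall below a threshold in $[-1,0]$, so the estimated matrix has a zero diagonal and the loss contains a constant term from the degrees.

```python
import numpy as np


def glee_embedding(A, d):
    """Hypothetical helper: first d columns of S = P sqrt(Lambda), largest first."""
    L = np.diag(A.sum(axis=1)) - A
    eigvals, P = np.linalg.eigh(L)
    order = np.argsort(eigvals)[::-1]
    return (P[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None)))[:, :d]


def estimate_laplacian(S, theta):
    """Thresholding rule: -1 where s_i . s_j < theta, 0 elsewhere."""
    return np.where(S @ S.T < theta, -1.0, 0.0)


def reconstruction_loss(L, S, theta):
    """Squared Frobenius distance between L and the estimated matrix."""
    return np.linalg.norm(L - estimate_laplacian(S, theta), "fro") ** 2


def best_threshold(L, S, num_candidates=101):
    """Grid search over [-1, 0]; the grid itself is an assumption of this sketch."""
    candidates = np.linspace(-1.0, 0.0, num_candidates)
    losses = [reconstruction_loss(L, S, t) for t in candidates]
    return candidates[int(np.argmin(losses))]


# Toy graph (an assumption for illustration).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (3, 4)]
A = np.zeros((5, 5))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

S_full = glee_embedding(A, d=5)
S = glee_embedding(A, d=3)
theta_opt = best_threshold(L, S)
loss_opt = reconstruction_loss(L, S, theta_opt)
print(theta_opt, loss_opt)
```

With the full embedding and the constant threshold $-0.5$, the estimated matrix coincides with the negated adjacency matrix, which is the perfect-reconstruction case described above.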

3.2. Link Prediction

Since the objective of GLEE is to directly encode graph structure in a geometric way, rather than solve any one particular task, we are able to use it in two different ways to perform link prediction. These are useful in different kinds of networks.

3.2.1. Number of Common Neighbors

It is well known that heuristics such as number of common neighbors (CN) or Jaccard similarity (JS) between neighborhoods are highly effective for the task of link prediction in networks with a strong tendency for triadic closure (Sarkar et al., 2011). Here, we show that we can use the geometric properties of GLEE in order to approximately compute CN. For the purpose of exposition, we assume $d=n$ unless stated otherwise in this section.

Given an arbitrary subset of nodes $V$ in the graph $G$ , we denote by $|V|$ its number of elements. We further define the centroid of $V$ , denoted by $C_{V}$ , as the centroid of the simplex vertices that correspond to its nodes, i.e., $C_{V}=\frac{1}{|V|}\sum_{i\in V}s_{i}$ . The following lemma, which can be found in (Devriendt and Van Mieghem, 2019), highlights the graph-theoretical interpretation of the geometric object $C_{V}$ .

Lemma 3.7 (From (Devriendt and Van Mieghem, 2019)).

Given a graph $G$ and its GLEE embedding $S$ , consider two disjoint node sets $V_{1}$ and $V_{2}$ . Then, the number of edges with one endpoint in $V_{1}$ and one endpoint in $V_{2}$ , is given by

$$-|V_{1}||V_{2}|\,C_{V_{1}}^{T}\cdot C_{V_{2}}.\tag{7}$$

Proof.

By linearity of the dot product, we have

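$$|V_{1}||V_{2}|\,C_{V_{1}}^{T}\cdot C_{V_{2}}=\sum_{i\in V_{1}}\sum_{j\in V_{2}}s_{i}\cdot s_{j}^{T}=-\sum_{i\in V_{1}}\sum_{j\in V_{2}}a_{ij}.\tag{8}$$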

The expression on the right is precisely the required quantity. ∎
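A small numerical check of Lemma 3.7 may help. The sketch below, on a toy graph and node sets chosen here for illustration, counts the edges between two disjoint node sets from the dot product of their centroids and compares the result with a direct count on the adjacency matrix.

```python
import numpy as np

# Toy graph (an assumption for illustration).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (3, 4)]
A = np.zeros((5, 5))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# Full GLEE embedding (d = n).
eigvals, P = np.linalg.eigh(L)
S = P * np.sqrt(np.clip(eigvals, 0.0, None))

V1, V2 = [0, 1], [2, 3]      # two disjoint node sets
C1 = S[V1].mean(axis=0)      # centroid of the simplex vertices in V1
C2 = S[V2].mean(axis=0)      # centroid of the simplex vertices in V2

edges_from_centroids = -len(V1) * len(V2) * (C1 @ C2)
edges_direct = A[np.ix_(V1, V2)].sum()
print(edges_from_centroids, edges_direct)
```

Here the edges between the two sets are (0, 2), (1, 2), and (1, 3), so both quantities equal three up to floating-point error.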

Lemma 3.7 says that we can use the dot product between the centroids of two node sets to count the number of edges that are shared by them. Thus, we now reformulate the problem of finding the number of common neighbors between two nodes in terms of centroids of node sets. In the following, we use $N(i)$ to denote the neighborhood of node $i$ , that is, the set of nodes connected to it.

Lemma 3.8.

Let $i,j\in V$ be non-neighbors. Then, the number of common neighbors of $i$ and $j$ , denoted by $CN(i,j)$ , is given by

$$CN(i,j)=-\deg(i)\,C_{N(i)}\cdot s_{j}^{T}=-\deg(j)\,C_{N(j)}\cdot s_{i}^{T}.\tag{9}$$

Proof.

Apply Lemma 3.7 to the node sets $V_{1}=N(i)$ and $V_{2}=\{j\}$ , or, equivalently, to $V_{1}=N(j)$ and $V_{2}=\{i\}$ . ∎

Now assume we have the $d$-dimensional GLEE of $G$. We approximate $CN(i,j)$ by estimating both $\deg(i)$ and $C_{N(i)}$. First, we know from Corollary 3.6 that $\deg(i)\approx\|s_{i}^{d}\|^{2}$. Second, we define the approximate neighbor set of $i$ as $\hat{N}(i)=\{k:s_{k}^{d}\cdot(s_{i}^{d})^{T}<\hat{\theta}\}$, where $\hat{\theta}$ is any of the estimators from Section 3.1. We can now write

$$CN(i,j)\approx-\|s_{i}^{d}\|^{2}\,C_{\hat{N}(i)}\cdot(s_{j}^{d})^{T}.\tag{10}$$

The higher the value of this expression, the more confident we are in our prediction that the link $(i,j)$ exists.
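The common-neighbor score can be sketched as below. The function name `approx_common_neighbors`, the default threshold $-0.5$, and the toy graph are assumptions made for this illustration. The score uses only the embedding: the squared norm of $s_{i}$ stands in for the degree, and the thresholded dot products stand in for the neighborhood.

```python
import numpy as np


def glee_embedding(A, d):
    """Hypothetical helper: first d columns of S = P sqrt(Lambda), largest first."""
    L = np.diag(A.sum(axis=1)) - A
    eigvals, P = np.linalg.eigh(L)
    order = np.argsort(eigvals)[::-1]
    return (P[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None)))[:, :d]


def approx_common_neighbors(S, i, j, theta=-0.5):
    """Common-neighbor score of nodes i and j computed from the embedding S."""
    dots = S @ S[i]
    # Approximate neighbor set of i; dots[i] is a squared norm, so i is excluded.
    neighbors = np.flatnonzero(dots < theta)
    if neighbors.size == 0:
        return 0.0
    centroid = S[neighbors].mean(axis=0)
    return -np.dot(S[i], S[i]) * (centroid @ S[j])


# Toy graph (an assumption for illustration).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (3, 4)]
A = np.zeros((5, 5))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

S_full = glee_embedding(A, d=5)
S_3 = glee_embedding(A, d=3)
print(approx_common_neighbors(S_full, 0, 3))  # exact when d = n
print(approx_common_neighbors(S_3, 0, 3))     # noisy estimate when d < n
```

With the full embedding the score matches the true count for non-neighboring pairs, as in Lemma 3.8; with fewer dimensions it is only an estimate whose quality depends on `d` and on the threshold.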

3.2.2. Number of Paths of Length 3

A common critique of the original Laplacian Eigenmaps algorithm is that it only takes into account first order connections, which were considered in Section 3.2.1. Furthermore, Kovács et al. (2018) point out that the application of link prediction heuristics CN and JS does not have a solid theoretical grounding for certain types of biological networks such as protein-protein interaction networks. They further propose to use the (normalized) number of paths of length three (L3) between two nodes to perform link prediction. We next present a way to approximate L3 using GLEE. This achieves good performance in those networks where CN and JS are invalid, and shows that GLEE can take into account higher-order connectivity of the graph.

Lemma 3.9.

Assume $S$ is the GLEE of a graph $G$ of dimension $d=n$ . Then, the number of paths of length three between two distinct nodes $i$ and $j$ is

$$L3(i,j)=-\deg(i)\deg(j)\,C_{N(i)}\cdot C_{N(j)}^{T}+\sum_{k\in N(i)\cap N(j)}\|s_{k}\|^{2}.\tag{11}$$

Proof.

The number of paths of length three between $i$ and $j$ is $(\mathbf{A^{3}})_{ij}$ , where $\mathbf{A}$ is the adjacency matrix of $G$ . We have

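$$\begin{aligned}(\mathbf{A^{3}})_{ij}&=-\sum_{k\in N(i)}\sum_{l\in N(j)}s_{k}\cdot s_{l}^{T}+\sum_{k\in N(i)\cap N(j)}s_{k}\cdot s_{k}^{T}\\&=-|N(i)||N(j)|\,C_{N(i)}\cdot C_{N(j)}^{T}+\sum_{k\in N(i)\cap N(j)}\|s_{k}\|^{2},\end{aligned}\tag{12}$$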

where the last expression follows by the linearity of the dot product, and is equivalent to (11). ∎

When $d<n$ , we can estimate $\deg(i)$ by $\|s_{i}^{d}\|^{2}$ and $N(i)$ by $\hat{N}(i)$ as before, with the help of an estimator $\hat{\theta}$ from Section 3.1.
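The L3 score can be sketched in the same style. The function name `approx_paths_length3`, the default threshold, and the toy graph are assumptions of this illustration. Degrees are estimated by squared norms and neighborhoods by thresholded dot products, and the result is compared in the usage note with $(\mathbf{A^{3}})_{ij}$, the quantity used in the proof of Lemma 3.9.

```python
import numpy as np


def glee_embedding(A, d):
    """Hypothetical helper: first d columns of S = P sqrt(Lambda), largest first."""
    L = np.diag(A.sum(axis=1)) - A
    eigvals, P = np.linalg.eigh(L)
    order = np.argsort(eigvals)[::-1]
    return (P[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None)))[:, :d]


def approx_paths_length3(S, i, j, theta=-0.5):
    """L3 score of distinct nodes i and j computed from the embedding S."""
    gram = S @ S.T
    Ni = np.flatnonzero(gram[i] < theta)  # approximate neighbors of i
    Nj = np.flatnonzero(gram[j] < theta)  # approximate neighbors of j
    if Ni.size == 0 or Nj.size == 0:
        return 0.0
    deg_i, deg_j = gram[i, i], gram[j, j]  # squared norms estimate the degrees
    centroid_term = -deg_i * deg_j * (S[Ni].mean(axis=0) @ S[Nj].mean(axis=0))
    common = np.intersect1d(Ni, Nj)        # shared neighbors add back ||s_k||^2
    return centroid_term + np.sum(np.diag(gram)[common])


# Toy graph (an assumption for illustration).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (3, 4)]
A = np.zeros((5, 5))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

S_full = glee_embedding(A, d=5)
A3 = A @ A @ A
print(approx_paths_length3(S_full, 0, 3), A3[0, 3])
```

With the full embedding the score agrees with the corresponding entry of `A3` for every pair of distinct nodes; for `d` smaller than the number of nodes it becomes an approximation.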

3.3. Runtime analysis

On a graph $G$ with $n$ nodes, finding the $k$ largest eigenvalues and eigenvectors of the Laplacian takes $O(kn^{2})$ time, if one uses algorithms for fast approximate singular value decomposition (Trefethen and Bau III, 1997; Halko et al., 2011). Given a $k$ -dimensional embedding matrix $S$ , reconstructing the graph is as fast as computing the product $S\cdot S^{T}$ and applying the threshold $\theta$ to each entry, thus it takes $O(n^{\omega}+n^{2})$ , where $\omega$ is the exponent of matrix multiplication. Approximating the number of common neighbors between nodes $i$ and $j$ depends only on the dot products between embeddings corresponding to their neighbors, thus it takes $O(k\times\min(\deg(i),\deg(j)))$ , while approximating the number of paths of length 3 takes $O(k\times\deg(i)\times\deg(j))$ .

4. Related Work

Spectral analyses of the Laplacian matrix have multiple applications in graph theory, network science, and graph mining (Newman, 2018; Spielman, 2017; Van Mieghem, 2010). Indeed, the eigendecomposition of the Laplacian has been used for sparsification (Spielman and Srivastava, 2011), clustering (von Luxburg, 2007), dynamics (Van Mieghemy et al., 2014; Prakash et al., 2014), robustness (Jamakovic and Van Mieghem, 2008; Shahrivar et al., 2015), etc. We here discuss those applications that are related to the general topic of this work, namely, dimensionality reduction of graphs.

One popular application is the use of Laplacian eigenvectors for graph drawing (Pisanski and Shawe-Taylor, 2000; Koren, 2005), which can be thought of as graph embedding for the specific objective of visualization. In (Pisanski and Shawe-Taylor, 2000) one such method is outlined, which, similarly to GLEE, assigns a vector, or higher-dimensional position, to each node in a graph using the eigenvectors of its Laplacian matrix, in such a way that the resulting vectors have certain desirable geometric properties. However, in the case of (Pisanski and Shawe-Taylor, 2000), those geometric properties are externally enforced as constraints in an optimization problem, whereas GLEE uses the intrinsic geometry already present in a particular decomposition of the Laplacian. Furthermore, their method focuses on the eigenvectors corresponding to the smallest eigenvalues of the Laplacian, while GLEE uses those corresponding to the largest eigenvalues, i.e. to the best approximation to the Laplacian through singular value decomposition.

On another front, many graph embedding algorithms have been proposed, see for example (Goyal and Ferrara, 2018; Hamilton et al., 2017) for extensive reviews. Most of these methods fall in one of the following categories: matrix factorization, random walks, or deep architectures. Of special importance to us are methods that rely on matrix factorization. Among many advantages, we have at our disposal the full toolbox of spectral linear algebra to study them (Levin et al., 2018; Charisopoulos et al., 2019; Chen and Tong, 2015, 2017). Examples in this category are the aforementioned Laplacian Eigenmaps (LE) (Belkin and Niyogi, 2002, 2003) and Graph Factorization (GF) (Ahmed et al., 2013). One important difference between GLEE and LE is that LE uses the small eigenvalues of the normalized Laplacian $\mathbf{D^{-1}}\mathbf{L}$ , while GLEE uses the large eigenvalues of $\mathbf{L}$ . Furthermore, LE does not present the rich geometry of the simplex. Graph Factorization (GF) finds a decomposition of the weighted adjacency matrix $\mathbf{W}$ with a regularization term. Their objective is to find embeddings $\{s_{i}\}$ such that $s_{i}\cdot s_{j}=a_{ij}$ , whereas in our case we try to reconstruct $s_{i}\cdot s_{j}=\mathbf{L_{ij}}$ . This means that the embeddings found by Graph Factorization will present different geometric properties. There are many other methods of dimensionality reduction on graphs that depend on matrix factorization (Kuang et al., 2012; Cai et al., 2011; Wang et al., 2017). However, even if some parameterization, or special case, of any of these methods results in a method resembling the singular value decomposition of the Laplacian (thus imitating GLEE), to the authors’ knowledge none of these methods make direct use of its intrinsic geometry.

Among the methods based on random walks we find DeepWalk (Perozzi et al., 2014) and node2vec (Grover and Leskovec, 2016), both of which adapt the framework of word embeddings (Mikolov et al., 2013) to graphs by using random walks and optimize a shallow architecture. It is also worth mentioning NetMF (Qiu et al., 2018) which unifies several methods in a single algorithm that depends on matrix factorization and thus unifies the two previous categories.

Among the methods using deep architectures, we have the deep autoencoder Structural Deep Network Embedding (SDNE) (Wang et al., 2016). It penalizes representations of similar nodes that are far from each other using the same objective as LE. Thus, SDNE is also based on the distance-minimization approach. There is also (Cao et al., 2016) which obtains a non-linear mapping between the probabilistic mutual information matrix (PMI) of a sampled network and the embedding space. This is akin to applying the distance-minimization assumption not to the graph directly but to the PMI matrix.

Others have used geometric approaches to embedding. For example, (Estrada et al., 2014) and (Pereda and Estrada, 2019) find embeddings on the surface of a sphere, while (Papadopoulos et al., 2012) and (Nickel and Kiela, 2018) use the hyperbolic plane. These methods are generally developed under the assumption that the embedding space is used to generate the network itself. They are therefore aimed at recovering the generating coordinates, and not, as in GLEE’s case, at finding a general representation suitable for downstream tasks.

5. Experiments

We put into practice the procedures detailed in Sections 3.1 and 3.2 to showcase GLEE’s performance in the tasks of link prediction and graph reconstruction. Code to compute the GLEE embeddings of networks and related computations is publicly available at (Torres, 2019). For our experiments, we use the following baselines: GF because it is a direct factorization of the adjacency matrix, node2vec because it is regarded as a reference point among those methods based on random walks, SDNE because it aims to recover the adjacency matrix of a graph (a task GLEE excels at), NetMF because it generalizes several other well-known techniques, and LE because it is the method that most directly resembles our own. In this way we cover all of the categories explained in Section 4 and use either methods that resemble GLEE closely or methods that have been found to generalize other techniques. For node2vec and SDNE we use default parameters. For NetMF we use the spectral approximation with rank $256$. The data sets we use are outlined in Table 1. Besides comparing GLEE to the other algorithms, we are interested in how the graph’s structure affects the performance of each method. This is why we have chosen data sets that have similar numbers of nodes and edges, but different values of average clustering coefficient. Accordingly, we report our results with respect to the average clustering coefficient of each data set and the number of dimensions of the embedding (the only parameter of GLEE). In Appendix B we compare the performance of each estimator explained in Section 3.1. In the following experiments we use $\hat{\theta}_{k}$ as our estimator for $\theta_{opt}$.

5.1. Graph Reconstruction

Given a GLEE matrix $S^{d}$, how well can we reconstruct the original graph? This is the task of graph reconstruction. As performance metric we use precision at $k$, defined as the precision of the first $k$ reconstructed edges. Note that precision at $k$ must always decrease when $k$ grows large, as there will be few correct edges left to reconstruct.

Following Section 3.1, we reconstruct the edge $(i,j)$ if $s_{i}^{d}\cdot s_{j}^{d}<\hat{\theta}$. The further the dot product is from $0$ (the ideal value for non-edges), the more confident we are in the existence of this edge. For LE, we reconstruct the edge $(i,j)$ according to how small the distance between the embeddings of $i$ and $j$ is. For GF, node2vec, and NetMF, we reconstruct edges based on how high their dot product is. SDNE is a deep autoencoder and thus its very architecture involves a mechanism to reconstruct the adjacency matrix of the input graph.
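The GLEE scoring rule and the precision at $k$ metric described above can be illustrated with a minimal sketch. The function names `rank_pairs` and `precision_at_k`, the toy embedding, and the choice to divide by the number of edges actually returned (when fewer than $k$ pass the threshold) are assumptions made for illustration; they are not taken from the authors' code.

```python
from itertools import combinations


def rank_pairs(emb, theta):
    """Return candidate edges (i, j), most confident first.

    A pair is a candidate when its dot product is below theta; the more
    negative the dot product (the further from 0), the higher it ranks.
    """
    scored = []
    for i, j in combinations(range(len(emb)), 2):
        dot = sum(a * b for a, b in zip(emb[i], emb[j]))
        if dot < theta:
            scored.append((dot, (i, j)))
    scored.sort()  # ascending: most negative dot product first
    return [pair for _, pair in scored]


def precision_at_k(ranked, true_edges, k):
    """Fraction of the first k reconstructed edges that are true edges."""
    top = ranked[:k]
    if not top:
        return 0.0
    hits = sum(1 for pair in top if pair in true_edges)
    return hits / len(top)


# Toy vectors, made up for illustration (not a real GLEE embedding).
emb = [[1.0, 0.0], [-1.0, 0.1], [0.0, 1.0], [0.1, -0.9]]
true_edges = {(0, 1), (2, 3)}
ranked = rank_pairs(emb, theta=-0.5)
print(ranked, precision_at_k(ranked, true_edges, k=2))
```

For the baselines that score edges by a high dot product or a small distance, only the sort key would change; the metric itself stays the same.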

We show results in Figure 2, where we have ordered data sets from left to right in ascending order of clustering coefficient, and from bottom up in ascending order of embedding dimension. GF results are omitted from this figure as it scored close to $0$ for all values of $k$ and $d$. On CA-GrQc, for low embedding dimension $d=32$, SDNE performs best among all methods, followed by node2vec and LE. However, as $d$ increases, GLEE substantially outperforms all others, reaching an almost perfect precision score at the first 10,000 reconstructed edges. Interestingly, other methods do not substantially improve performance as $d$ increases. This analysis is also valid for CA-HepTh, another data set with high clustering coefficient. However, on PPI, our data set with lowest clustering coefficient, GLEE drastically outperforms all other methods for all values of $d$. Interestingly, LE and node2vec perform well compared to other methods in data sets with high clustering, but their performance drops to near zero on PPI. We hypothesize that this is due to the fact that LE and node2vec depend on the “community-aware” assumption, thereby assuming that two proteins in the same cluster would interact with each other. This is the exact point that (Kovács et al., 2018) refutes. On the other hand, GLEE directly encodes graph structure, making no assumptions about the original graph, and its performance depends more directly on the embedding dimension than on the clustering coefficient, or on any other assumption about graph structure. GLEE’s performance on the data sets PPI, Wiki-Vote, and caida points to the excellent potential of our method in the case of low clustering coefficient.

5.2. Link Prediction

Given the embedding of a large subgraph of some graph $G$, can we identify which edges are missing? The experimental setup is as follows. Given a graph $G$ with $n$ nodes, node set $V$ and edge set $E_{obs}$, we randomly split its edges into train and test sets $E_{train}$ and $E_{test}$. We use $|E_{train}|=0.75n$, and we make sure that the subgraph induced by $E_{train}$, denoted by $G_{train}$, is connected and contains every node of $V$. We then proceed to compute the GLEE of $G_{train}$ and test on $E_{test}$. We report the AUC metric for this task. We use both techniques described in Sections 3.2.1 and 3.2.2, which we label GLEE and GLEE-L3, respectively.
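The text does not spell out how connectivity of $G_{train}$ is enforced, so the following is only one possible sketch of such a split, together with a direct computation of AUC. It assumes that a spanning tree built from a random edge order is placed in the train set first; `split_edges`, `auc`, and the `train_size` argument are hypothetical names, and the scores passed to `auc` are assumed to be oriented so that higher means "more likely an edge".

```python
import random


def split_edges(n, edges, train_size, seed=0):
    """Split edges into train/test so that the train graph spans all n nodes.

    A spanning tree built from a random edge order goes into the train set
    first (this guarantees connectivity); further random edges are then
    added until the train set holds train_size edges.
    """
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)

    parent = list(range(n))  # union-find forest over the nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    train, rest = [], []
    for u, v in shuffled:
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two components: keep it for the tree
            parent[ru] = rv
            train.append((u, v))
        else:
            rest.append((u, v))
    if len(train) != n - 1:
        raise ValueError("input graph is not connected")

    extra = max(0, train_size - len(train))
    train.extend(rest[:extra])
    return train, rest[extra:]


def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative
    (ties count one half); this equals the area under the ROC curve."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos_scores) * len(neg_scores))
```

In an experiment of this kind, `pos_scores` would come from the held-out edges in $E_{test}$ and `neg_scores` from a sample of node pairs that are not edges of $G$.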

Figure 3 shows that node2vec repeats the behavior seen in graph reconstruction of increased performance as clustering coefficient increases, though again it is fairly constant with respect to embedding dimension. This observation is also true for NetMF. On the high clustering data sets, LE and GLEE have comparable performance to each other. However, either GLEE or GLEE-L3 performs better than all others on the low clustering data sets PPI and Wiki-Vote, as expected. Also as expected, the performance of GLEE-L3 decreases as average clustering increases. Note that GLEE and LE generally improve performance when $d$ increases, whereas node2vec and SDNE do not improve. (GF and SDNE are not shown in Figure 3 for clarity. They scored close to $0.5$ and $0.6$ in all data sets independently of $d$.) The reason why none of the methods studied here perform better than $0.6$ AUC in the caida data set is an open question left for future research. We conclude that the hybrid approach of NetMF is ideal for high clustering coefficient, whereas GLEE is a viable option in the case of low clustering coefficient as evidenced by the results on PPI, Wiki-Vote, and caida.

6. Conclusions

In this work we have presented the Geometric Laplacian Eigenmap Embedding (GLEE), a geometric approach to graph embedding that exploits the intrinsic geometry of the Laplacian. When compared to other methods, we find that GLEE performs the best when the underlying graph has low clustering coefficient, while still performing comparably to other state-of-the-art methods when the clustering coefficient is high. We hypothesize that this is due to the fact that the large eigenvalues of the Laplacian correspond to the small eigenvalues of the adjacency matrix and thus represent the structure of the graph at a micro level. Furthermore, we find that GLEE’s performance increases as the embedding dimension increases, something we do not see in other methods. In contrast to techniques based on neural networks, which have many hyperparameters and costly training phases, GLEE has only one parameter other than the embedding dimension, the threshold $\theta$ , and we have provided three different ways of optimizing for it. Indeed, GLEE only depends on the SVD of the Laplacian matrix.

We attribute these desirable properties of GLEE to the fact that it departs from the traditional literature of graph embedding by replacing the “community aware” notion (similar nodes’ embeddings must be similar) with the notion of directly encoding graph structure using the geometry of the embedding space. In all, we find that GLEE is a promising alternative for graph embedding due to its simplicity in both theoretical background and computational implementation, especially in the case of low clustering coefficient. By taking a direct geometric encoding of graph structure using the simplex geometry, GLEE covers the gap left open by the “community aware” assumption of other embedding techniques, which requires high clustering. Future lines of work will explore what other geometric properties of the embedding space can yield interesting insight, as well as what are the important structural properties of graphs, such as clustering coefficient, that affect the performance of these methods.

Funding

This work was supported by the National Science Foundation [IIS-1741197]; and by the Combat Capabilities Development Command Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-13-2-0045 (U.S. Army Research Lab Cyber Security CRA). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Combat Capabilities Development Command Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation here on.

Appendix A Threshold Estimators

We present two other estimators of $\theta_{opt}$ to accompany the heuristic $\hat{\theta}_{c}=-0.5$ mentioned in Section 3.1.

A.1. Kernel Density Estimation

As can be seen in Figure 4, the problem of finding a value of $\theta$ that sufficiently separates the peaks corresponding to edges (around the peak centered at $-1$) and non-edges (around the peak centered at $0$) can be stated in terms of density estimation. That is, given the histogram of values of $s_{i}\cdot s_{j}^{T}$ for all $i,j$, we can approximate the density of this empirical distribution by some density function $f_{k}$. A good heuristic estimator of $\theta_{\text{opt}}$ is the value that minimizes $f_{k}$ between the peaks near $-1$ and $0$. For this purpose, we use Kernel Density Estimation over the distribution of $s_{i}\cdot s_{j}^{T}$ and a box kernel (a.k.a. “top hat” kernel) function to define

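$$f_{k}(x)=\frac{1}{Nh}\sum_{i<j}K\left(\frac{x-s_{i}\cdot s_{j}^{T}}{h}\right),\qquad K(u)=\tfrac{1}{2}\,1\{|u|\leq 1\},\tag{15}$$

where $N=\binom{n}{2}$ is the number of dot products and $h>0$ is the bandwidth.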

We then use gradient descent to find the minimal value of $f_{k}$ between the values of $-1$ and $0$. We call this value $\hat{\theta}_{k}$. We have found experimentally that a value of $h=0.3$ gives excellent results, achieving near zero error in the reconstruction task (Figure 4, middle row).
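As a rough illustration of this estimator, the sketch below evaluates a box-kernel density estimate and picks the point of lowest density between $-1$ and $0$. It assumes the standard box-kernel form of $f_{k}$, and it replaces the gradient descent used in the paper with a plain grid search, which is simpler to verify by hand. The names `box_kde` and `theta_kde`, the grid resolution, and the toy sample are assumptions made for illustration.

```python
def box_kde(samples, x, h):
    """Box ("top hat") kernel density estimate at x with bandwidth h."""
    inside = sum(1 for s in samples if abs(x - s) <= h)
    return inside / (2.0 * h * len(samples))


def theta_kde(samples, h=0.3, lo=-1.0, hi=0.0, steps=201):
    """Grid point in [lo, hi] where the estimated density is smallest.

    A box-kernel estimate is piecewise constant, so the minimum is usually
    attained on a whole plateau; the middle grid point of the minimizing
    set is returned.
    """
    grid = [lo + (hi - lo) * t / (steps - 1) for t in range(steps)]
    dens = [box_kde(samples, x, h) for x in grid]
    best = min(dens)
    ties = [x for x, d in zip(grid, dens) if d == best]
    return ties[len(ties) // 2]


# Toy sample imitating a sparse graph: few dot products near -1 (edges),
# many near 0 (non-edges). Values are made up for illustration.
dots = [-1.0] * 5 + [0.0] * 50
theta_k = theta_kde(dots, h=0.3)
```

Because every dot product is visited at each grid point, this direct version costs time proportional to the number of pairs times the grid size, which is consistent with the remark in Appendix B that $\hat{\theta}_{k}$ may be costly to compute.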

A.2. Gaussian Mixture Models

Here we use a Gaussian Mixture Model (GMM) over the distribution of $s_{i}\cdot s_{j}$. The model will find the two peaks near $-1$ and $0$ and fit each to a Gaussian distribution. Once the densities of said Gaussians have been found, say $f_{1}$ and $f_{2}$, we define the estimator $\hat{\theta}_{g}$ as that point at which the densities are equal (see Figure 4, bottom row).

However, we found that a direct application of this method yields poor results due to the sparsity of network data sets. High sparsity implies that the peak at $0$ is orders of magnitude higher than the one at $-1$. Thus, the left peak will usually be hidden by the tail of the right one so that the GMM cannot detect it. To solve this issue we take two steps. First, we use a Bayesian version of GMM that accepts priors for the Gaussian means and other parameters. This guides the GMM optimization algorithm to find the right peaks at the right places. Second, we sub-sample the distribution of dot products in order to reduce the difference in height between the peaks, and we correct for this after the fit. Concretely, put $r=\sum_{i<j}1\{s_{i}\cdot s_{j}^{T}<\hat{\theta}_{c}\}$. That is, $r$ is the number of dot products less than the constant $\hat{\theta}_{c}=-0.5$. Instead of fitting the GMM to all the observed dot products, we fit it to the set of all $r$ dot products less than $\hat{\theta}_{c}$ plus a random sample of $r$ dot products larger than $\hat{\theta}_{c}$. This temporarily removes the class imbalance, which we restore after the model has been fit, as follows. The GMM fit will yield a density for the sub-sample as $f_{g}=w_{1}f_{1}+w_{2}f_{2}$, where $f_{i}$ is the density of the $i$th Gaussian, and $w_{i}$ are the mixture weights, for $i=1,2$. Since we sub-sampled the distribution, we will get $w_{1}\approx w_{2}\approx 0.5$, but we need the weights to reflect the original class imbalance. For this purpose, we define $\hat{w}_{1}=\hat{m}/\binom{n}{2}$ and $\hat{w}_{2}=1-\hat{w}_{1}$, where $\hat{m}$ is an estimate for the number of edges in the graph. (This can be estimated in a number of ways, for example one may put $\hat{m}=r$, or $\hat{m}=n\log(n)$.) Finally, we define the estimator as the value that satisfies

$$\hat{w}_{1}f_{1}(\hat{\theta}_{g})=\hat{w}_{2}f_{2}(\hat{\theta}_{g})\tag{16}$$

under the constraint that $-1<\hat{\theta}_{g}<0$ . Since $f_{1}$ and $f_{2}$ are known Gaussian densities, Equation 16 can be solved analytically.
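To make the analytic solution concrete, here is a minimal sketch assuming that Equation 16 equates the two reweighted Gaussian densities, $\hat{w}_{1}f_{1}(\theta)=\hat{w}_{2}f_{2}(\theta)$. Taking logarithms of both sides turns this into a quadratic $a\theta^{2}+b\theta+c=0$ whose coefficients depend only on the fitted means, standard deviations, and weights. The function names and the numeric values in the usage lines are hypothetical; the means and standard deviations would come from the fitted (Bayesian) GMM, which is not reproduced here.

```python
import math


def rebalanced_weights(n, m_hat):
    """Mixture weights reflecting the original class imbalance:
    w1 = m_hat / C(n, 2) for edges, w2 = 1 - w1 for non-edges."""
    w1 = m_hat / math.comb(n, 2)
    return w1, 1.0 - w1


def theta_gmm(w1, mu1, sd1, w2, mu2, sd2, lo=-1.0, hi=0.0):
    """Solve w1*N(x; mu1, sd1) = w2*N(x; mu2, sd2) for lo < x < hi.

    Taking logs of both sides gives a*x^2 + b*x + c = 0. Returns the first
    root strictly inside (lo, hi), or None if there is none.
    """
    a = 1.0 / (2 * sd2**2) - 1.0 / (2 * sd1**2)
    b = mu1 / sd1**2 - mu2 / sd2**2
    c = (mu2**2 / (2 * sd2**2) - mu1**2 / (2 * sd1**2)
         + math.log((w1 * sd2) / (w2 * sd1)))
    if abs(a) < 1e-12:  # equal variances: the equation is linear
        roots = [-c / b] if b != 0 else []
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return None
        r = math.sqrt(disc)
        roots = [(-b - r) / (2 * a), (-b + r) / (2 * a)]
    inside = [x for x in roots if lo < x < hi]
    return inside[0] if inside else None


# Hypothetical fitted parameters: peaks at -1 and 0, equal spread.
balanced = theta_gmm(0.5, -1.0, 0.1, 0.5, 0.0, 0.1)
sparse = theta_gmm(0.01, -1.0, 0.1, 0.99, 0.0, 0.1)
```

With equal weights and equal variances the crossing point sits halfway between the peaks; once the weights reflect sparsity, the heavier non-edge component pushes the threshold toward $-1$.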

In this case, due to sparsity, the problem of optimizing the GMM is one of non-parametric density estimation with extreme class imbalance. We solve it by utilizing priors for the optimization algorithm, as well as sub-sampling the distribution of dot products, according to some of its known features (i.e., the fact that the peaks will be found near $-1$ and $0$), and we account for the class imbalance by estimating graph sparsity separately. Finally, we define the estimator $\hat{\theta}_{g}$ according to Equation 16. Algorithm 1 gives an overview of this procedure. For a comparison between the effectiveness of the three different estimators $\hat{\theta}_{c},\hat{\theta}_{k},\hat{\theta}_{g}$, see Appendix B.

Appendix B Estimator Comparison

In Section 3.1 and Appendix A we outlined three different schemes to estimate $\theta_{\text{opt}}$, which resulted in $\hat{\theta}_{c},\hat{\theta}_{k},\hat{\theta}_{g}$. Which one is the best? We test each of these estimators on three random graph models: Erdös-Rényi (ER) (Erdös and Rényi, 1960), Barabási-Albert (BA) (Barabási and Albert, 1999), and Hyperbolic Graphs (HG) (Krioukov et al., 2010). For each random graph with adjacency matrix $\mathbf{A}$, we compute the Frobenius norm of the difference between $\mathbf{A}$ and the reconstructed adjacency matrix $\mathbf{\hat{A}}$ obtained with each of the three estimators. In Figure 5 we show our results. We see that $\hat{\theta}_{c}$ and $\hat{\theta}_{k}$ achieve similar performance across data sets, while $\hat{\theta}_{g}$ outperforms the other two for ER at $d=512$, though it has high variability in the other models. From these results we conclude that at low dimensions $d=32$, too much information has been lost and thus there is no hope to learn a value of $\hat{\theta}$ that outperforms the heuristic $\hat{\theta}_{c}=-0.5$. However, at larger dimensions, the estimators $\hat{\theta}_{g}$ and $\hat{\theta}_{k}$ perform better, with different degrees of variability. We conclude also that no single heuristic for $\hat{\theta}$ is best for all types of graphs. In the rest of our experiments we use $\hat{\theta}_{k}$ as our estimator for $\theta_{\text{opt}}$. We highlight that even though $\hat{\theta}_{k}$ is better than $\hat{\theta}_{c}$ in some data sets, it might be costly to compute, while $\hat{\theta}_{c}$ incurs no additional costs.
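The comparison criterion used here can be sketched in a few lines: threshold the pairwise dot products to obtain $\mathbf{\hat{A}}$, then take the Frobenius norm of $\mathbf{A}-\mathbf{\hat{A}}$. The helper names, the toy embedding, and the two candidate thresholds below are made up for illustration and do not reproduce the experiments of Figure 5.

```python
import math


def reconstruct(emb, theta):
    """Adjacency matrix with an edge wherever the dot product is below theta."""
    n = len(emb)
    A_hat = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(emb[i], emb[j]))
            if dot < theta:
                A_hat[i][j] = A_hat[j][i] = 1
    return A_hat


def frobenius_error(A, A_hat):
    """Frobenius norm of A - A_hat (square root of the sum of squared entries)."""
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(A, A_hat)
                         for a, b in zip(row_a, row_b)))


# Toy graph with edges (0, 1) and (2, 3), and a made-up embedding.
A = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
emb = [[1.0, 0.0], [-1.0, 0.1], [0.0, 1.0], [0.1, -0.9]]
errors = {theta: frobenius_error(A, reconstruct(emb, theta))
          for theta in (-0.5, -0.1)}
```

For an undirected graph each wrongly reconstructed pair contributes two entries to the difference, so a single spurious edge yields an error of $\sqrt{2}$.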

Bibliography

Ahmed et al. (2013) Amr Ahmed, Nino Shervashidze, Shravan M. Narayanamurthy, Vanja Josifovski, and Alexander J. Smola. 2013. Distributed large-scale natural graph factorization. In WWW. 37–48.
Barabási and Albert (1999) Albert-László Barabási and Réka Albert. 1999. Emergence of Scaling in Random Networks. Science 286, 5439 (1999), 509–512.
Belkin and Niyogi (2002) Mikhail Belkin and Partha Niyogi. 2002. Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering. In NIPS 14. 585–591.
Belkin and Niyogi (2003) Mikhail Belkin and Partha Niyogi. 2003. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. Neural Computation 15, 6 (2003), 1373–1396.
Cai et al. (2011) Deng Cai, Xiaofei He, Jiawei Han, and Thomas S. Huang. 2011. Graph Regularized Nonnegative Matrix Factorization for Data Representation. IEEE Trans. Pattern Anal. Mach. Intell. 33, 8 (2011), 1548–1560.
Cao et al. (2016) Shaosheng Cao, Wei Lu, and Qiongkai Xu. 2016. Deep Neural Networks for Learning Graph Representations. In AAAI. AAAI Press, 1145–1152.
Charisopoulos et al. (2019) Vasileios Charisopoulos, Austin R. Benson, and Anil Damle. 2019. Incrementally Updated Spectral Embeddings. CoRR abs/1909.01188 (2019). arXiv:1909.01188 http://arxiv.org/abs/1909.01188