Sublinear Update Time Randomized Algorithms for Dynamic Graph Regression

Mostafa Haghir Chehreghani

arXiv:1905.11963·cs.LG·October 10, 2022


TL;DR

This paper introduces the first sublinear update time randomized algorithms for dynamic graph regression, enabling faster updates in data science applications by leveraging sketching techniques.

Contribution

It proposes novel sublinear update time algorithms for dynamic graph regression using subsampled randomized Hadamard transform and CountSketch, improving efficiency over existing methods.

Findings

01

Supports edge insertion and deletion with $O(rd)$ update time

02

Supports node insertion and deletion, in addition to edge operations, with $O(qd)$ update time

03

Achieves $1\pm\epsilon$ approximate solutions with sublinear update time




Department of Computer Engineering

Amirkabir University of Technology (Tehran Polytechnic), Iran


Abstract

A well-known problem in data science and machine learning is linear regression, which has recently been extended to dynamic graphs. Existing exact algorithms for updating the solution of dynamic graph regression require at least linear time (in terms of $n$, the size of the graph). However, this time complexity might be intractable in practice.

In the current paper, we utilize subsampled randomized Hadamard transform and CountSketch to propose the first sublinear update time randomized algorithms for regression of general dynamic graphs. Suppose that we are given an $n\times d$ matrix embedding $\mathbold M$ of the graph, where $d\ll n$ and $\mathbold M$ has certain properties. Let $r$ be the number of samples required by subsampled randomized Hadamard transform for a $1\pm\epsilon$ approximation, which is sublinear in $n$. Our first algorithm supports edge insertion and edge deletion and updates the approximate solution in $O(rd)$ time. Our second algorithm is based on CountSketch and supports edge insertion, edge deletion, node insertion and node deletion. It updates the approximate solution in $O(qd)$ time, where $q=O\left(\frac{d^{2}}{\epsilon^{2}}\log^{6}(d/\epsilon)\right)$.

Keywords. Dynamic networks, dynamic graph regression, least squares regression, sublinear update time, subsampled randomized Hadamard transform, CountSketch

1 Introduction

One of the well-studied machine learning problems is linear regression, which is traditionally defined as follows. We receive $n$ data points, where for each $i\in[1,n]$, the data point consists of a row of a matrix $\mathbold{A}$ and a single element of a vector $\mathbold{b}$. Matrix $\mathbold{A}$ holds the predictor values and $\mathbold{b}$ holds the measured values. The goal is to find a vector $\mathbold{x}$ such that $\mathbold{A}\cdot\mathbold{x}$ is the closest point to $\mathbold{b}$ in the column span of $\mathbold{A}$, under some distance measure, e.g., the Euclidean distance (also called the least squares distance or the $L_{2}$ norm). In other words, we want to solve the following problem:

$$\arg\min_{\mathbold{x}}\|\mathbold{A}\cdot\mathbold{x}-\mathbold{b}\|_{2}, \tag{1}$$

or the equivalent problem:

$$\arg\min_{\mathbold{x}}\|\mathbold{A}\cdot\mathbold{x}-\mathbold{b}\|_{2}^{2}.$$

There is a long history of research on the regression problem for static matrix data and graph data [2]. All current best bounds for graph regression are obtained by using a matrix embedding (representation) of the graph. As an example motivating (linear) graph regression with a matrix embedding, assume that we are given the graph of a friendship network in which each node is assigned a score reflecting its reputation, weight, importance, etc. Suppose that we want to find a function, linear in the structural properties of the nodes, that predicts their scores with minimum error. To do so, we first need to find a matrix embedding $\mathbold M$ of the nodes, in which each row $i$ represents the structural properties of node $i$. Then we need to find a function for the scores which is linear in terms of the values in the rows of $\mathbold M$.

Since most real-world graphs are dynamic, the problem was recently extended to dynamic graphs [8, 22]. Dynamic graphs are graphs that change over time through a sequence of update operations. They arise in many domains such as the world wide web, social and information networks, technology networks and communication networks. An update operation is an edge insertion, an edge deletion, a node insertion or a node deletion.

Given an $n\times d$ (update-efficient) matrix embedding¹ of a graph $G$, the author of [8] proposed an exact algorithm for dynamic graph regression, wherein first an $O\left(\min\left\{nd^{2},n^{2}d\right\}\right)$ time pre-processing is performed. Then, after any update operation in the graph, the solution is updated in $O(nd)$ time. However, since $n$ is very large in most applications, this time complexity might be too high in practice. Therefore, we are interested in developing algorithms that are considerably faster than the exact algorithm, at the expense of producing an approximate solution. In particular, we want to develop algorithms whose update time is sublinear in $n$.

¹ Note that this notion of embedding is different from the notion of embedding used in graph pattern mining [10, 11, 12].

To do so, in the current paper we utilize two sketching techniques, namely subsampled randomized Hadamard transform [1] and CountSketch [17], to develop sublinear update time randomized algorithms for the dynamic graph regression problem:

•

(Theorem 3 and Corollary 1). Let $r$ be the number of samples required for a $1\pm\epsilon$ approximation, as defined in Equations 6 and 9 of Theorem 1. Our first randomized algorithm is based on subsampled randomized Hadamard transform and supports edge insertion and edge deletion. It updates the approximate solution in $O(rd)$ time. Since $d\ll n$, and in particular when $d$ is treated as a constant, this yields a sublinear update time.

•

(Theorems 5, 6 and 7, and Corollary 2). Let $q=O\left(\frac{d^{2}}{\epsilon^{2}}\log^{6}(d/\epsilon)\right)$ be the number of samples required for a $1\pm\epsilon$ approximation using CountSketch. Our second randomized algorithm uses CountSketch and updates the approximate solution in $O(qd)$ time. Therefore, if $d$ and $\epsilon$ are treated as constants, it yields a constant update time randomized algorithm. Unlike our first algorithm, our second algorithm supports all four update operations: edge insertion, edge deletion, node insertion and node deletion.

Note that subsampled randomized Hadamard transform and CountSketch have already been used to improve regression on static data [1, 20, 5, 17]. However, in this paper we show for the first time how they can be used to improve the update time in a dynamic setting, where the sketches and the approximate solution must be updated after each update operation in the data.

While both randomized algorithms considerably improve the update time over the exact algorithm, we also analyze their relative performance. We show that, under some assumptions, if $\ln n<\epsilon^{-1}$ our first algorithm outperforms our second algorithm, and if $\ln n\geq\epsilon^{-1}$ our second algorithm has a better update time.

The rest of this paper is organized as follows. In Section 2, we present preliminaries, background and definitions used in the paper. In Section 3, we provide an overview of related work. In Section 4, we briefly introduce subsampled randomized Hadamard transform and CountSketch. In Section 5, we present our first randomized algorithm for the dynamic graph regression problem, which is based on subsampled randomized Hadamard transform². In Section 6, we introduce our second randomized algorithm, which is based on CountSketch. We discuss and compare our proposed algorithms in Section 7. Finally, the paper is concluded in Section 8.

² Parts of the results discussed in Section 5 were presented in Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM 2020), pp. 2045-2048 [9].

2 Preliminaries

In this paper, we use the following notation: lowercase letters for scalars, uppercase letters for constants and graphs, bold lowercase letters for vectors and bold uppercase letters for matrices. By $G$ we refer to a graph which is simple and unweighted. We use $n$ to denote the number of nodes of $G$. We define a dynamic graph as a graph that changes over time through a sequence of update operations. The adjacency matrix of $G$ is a square $n\times n$ matrix whose element in row $i$ and column $j$ is $1$ iff there exists an edge from node $i$ to node $j$ (and $0$ if there is no such edge). We define the distance between node $u$ and node $v$, denoted by $dist(u,v)$, as the size, i.e., the number of edges, of a shortest path connecting $u$ to $v$.

Let $\mathbold{A}\in\mathbb{R}^{n\times d}$. The rank of $\mathbold{A}$ is the maximum number of its linearly independent column vectors. The transpose of $\mathbold{A}$, denoted by $\mathbold{A}^{*}$, is obtained by switching the row and column indices of $\mathbold{A}$. The Singular Value Decomposition (SVD) of an $n\times d$ matrix $\mathbold{A}$ is defined as $\mathbold{U}\cdot\mathbold{\Sigma}\cdot\mathbold{V}^{*}$, where $\mathbold{U}$ is an $n\times d$ matrix with orthonormal columns, $\mathbold{\Sigma}$ is a $d\times d$ diagonal matrix with non-negative non-increasing entries down the diagonal, and $\mathbold{V}^{*}$ is a $d\times d$ matrix with orthonormal rows. The Euclidean norm or $L_{2}$ norm of a vector $\mathbold{x}$ of size $n$, denoted by $||\mathbold{x}||_{2}$, is defined as $\sqrt{\mathbold{x}_{1}^{2}+\cdots+\mathbold{x}_{n}^{2}}$.

The Moore-Penrose pseudoinverse of matrix $\mathbold{A}=\mathbold{U}\cdot\mathbold{\Sigma}\cdot\mathbold{V}^{*}$, denoted by $\mathbold{A}^{\dagger}$, is the $d\times n$ matrix $\mathbold{V}\cdot\mathbold{\Sigma}^{{\dagger}}\cdot\mathbold{U}^{*}$, where $\mathbold{\Sigma}^{{\dagger}}$ is a $d\times d$ diagonal matrix defined as follows: $\mathbold{\Sigma}^{{\dagger}}[i,i]=1/\mathbold{\Sigma}[i,i]$ if $\mathbold{\Sigma}[i,i]>0$, and $0$ otherwise. It is well-known that the solution

$$\mathbold{x}=\mathbold{A}^{\dagger}\cdot\mathbold{b} \tag{2}$$

is an optimal solution for Equation 1 and that it has the minimum $L_{2}$ norm among all optimal solutions [36].
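The definition of $\mathbold{A}^{\dagger}$ above translates almost line by line into code. The following is a minimal NumPy sketch; the helper name `pinv_via_svd` and the relative cutoff `rtol` are our own illustrative choices, the cutoff standing in for the exact test $\mathbold{\Sigma}[i,i]>0$, which is unreliable in floating point. It builds $\mathbold{V}\cdot\mathbold{\Sigma}^{\dagger}\cdot\mathbold{U}^{*}$ from a thin SVD and checks, on a rank-deficient example, that $\mathbold{A}^{\dagger}\cdot\mathbold{b}$ agrees with NumPy's minimum-norm least-squares solution.

```python
import numpy as np

def pinv_via_svd(A, rtol=1e-10):
    """Moore-Penrose pseudoinverse V * Sigma^+ * U^T built from the thin SVD of A.

    Singular values below rtol * (largest singular value) are treated as zero,
    a floating-point stand-in for the exact test Sigma[i, i] > 0.
    """
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)   # A = U diag(sigma) Vt
    cutoff = rtol * sigma.max()
    sigma_plus = np.array([1.0 / s if s > cutoff else 0.0 for s in sigma])
    return Vt.T @ np.diag(sigma_plus) @ U.T

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))
A[:, 2] = A[:, 0] + A[:, 1]              # third column is dependent: rank(A) = 2
b = rng.standard_normal(8)

x = pinv_via_svd(A) @ b                  # Equation 2: x = A^+ b
x_ref = np.linalg.lstsq(A, b, rcond=1e-10)[0]   # minimum-norm least-squares solution
residual_gradient = A.T @ (A @ x - b)    # vanishes at any optimum of Equation 1
```

Because $\mathbold{A}$ is rank-deficient, Equation 1 has infinitely many optimal solutions; both computations return the one with minimum $L_{2}$ norm.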

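The approximate version of the regression problem asks for a vector $\mathbold{x}^{\prime}$ such that

$$\|\mathbold{A}\cdot\mathbold{x}^{\prime}-\mathbold{b}\|_{2}^{2}=(1\pm\epsilon)\,\|\mathbold{A}\cdot\mathbold{x}-\mathbold{b}\|_{2}^{2}, \tag{3}$$

where $\mathbold{x}$ is the optimal solution, defined in Equation 2, and $\epsilon\in(0,1)$ defines the desired accuracy. As we will see in Section 4, sketching techniques can be used to solve this approximate version.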

3 Related work

In recent years, a number of algorithms have been proposed for different learning problems over entire graphs [14, 37, 13] or over the nodes of a graph [31, 29, 8]. Kleinberg and Tardos [31] studied the classification problem for nodes of a static graph and showed the connection of their general formulation to Markov random fields. Herbster and Pontil [29] studied the problem of online label prediction of a graph with the perceptron. The key difference between the online setting [30, 27, 26, 28] and the dynamic setting is that the online setting is used when it is computationally infeasible to solve the learning problem over the entire dataset. In the dynamic setting, in contrast, the learning problem can be solved over the entire dataset, and the challenge is to efficiently update the solution when the dataset changes. Culp, Michailidis and Johnson [18] presented representative multi-dimensional view smoothers on graphs that are based on graph-based transductive learning [40]. The authors of [4] proposed a family of learning algorithms based on a new form of regularization, so that some transductive graph learning algorithms can be obtained as special cases. Kovac and Smith [2] extended a model for nonparametric regression of nodes of a static graph, where the distance between estimate and observation is measured by the $L_{2}$ norm. Chehreghani [8] studied regression over dynamic graphs. He proposed an exact algorithm for updating the optimal solution of the problem, whose update time is at least linear in the number of nodes. In the current paper, we present randomized algorithms with sublinear update times.

A research problem that may have some connection to our studied problem is learning embeddings or representations for nodes of a graph [23], [39], [34]. While this problem has become more attractive in recent years, it dates back several decades. For example, Parsons and Pisanski [35] presented vector embeddings for nodes of a graph such that the inner product of the vector embeddings of any two nodes $i$ and $j$ is negative iff $i$ and $j$ are connected by an edge, and it is $0$ otherwise.

In the literature, there also exist several updating algorithms for different problems over dynamic graphs. Durfee et al. [21] presented a sublinear update time randomized algorithm to approximate effective resistances. The effective resistance between two nodes is the electrical resistance between them in a resistor network whose edge weights are conductances. Their algorithm supports edge insertion/deletion and yields a $1\pm\epsilon$ approximation with probability at least $1-1/poly(n)$. Durfee et al. [22] gave algorithms for updating Schur complements of general graphs that support edge insertion/deletion and node insertion. Their algorithms maintain at any time a $1\pm\epsilon$ approximation to the Schur complement. The authors also presented a sublinear update time algorithm for least squares regression on bounded-degree graphs with gradual changes in $\mathbold{b}$. In contrast, in the current paper we present the first sublinear update time algorithms for the general class of graphs with unbounded degrees, which includes most important real-world networks. Chen et al. [16] developed a technique that reduces optimization problems on undirected graphs to finding a data-structure notion of node sparsifiers. Using this technique, they presented a sublinear update time algorithm for flows. An overview of a number of update algorithms for different machine learning problems can be found in [24].

4 Sketching techniques

In this section, we briefly describe subsampled randomized Hadamard transform and CountSketch. Let $\mathbold A$ be an $n\times d$ matrix. A subsampled randomized Hadamard transform for $\mathbold A$ is defined as $\mathbold P\cdot\mathbold H\cdot\mathbold D$, where

•

matrix $\mathbold D$ is an $n\times n$ diagonal matrix whose diagonal entries are independently $+1$ or $-1$ with equal probability,

•

matrix $\mathbold H$ is an $n\times n$ (normalized) Hadamard matrix, and

•

matrix $\mathbold P$ is an $r\times n$ matrix that samples $r$ rows of $\mathbold H\cdot\mathbold D$ uniformly with replacement. If in the $j^{th}$ sample row $i$ is selected, $\mathbold P[j,i]=\frac{\sqrt{n}}{\sqrt{r}}$; otherwise, $\mathbold P[j,i]=0$.

For $n=2^{k}$ , the $n\times n$ Hadamard matrix $\mathbold H$ is defined as follows: $\mathbold H[i,j]=\frac{(-1)^{\langle i,j\rangle}}{\sqrt{n}},$ where $\langle i,j\rangle$ is the dot product of the binary representations of $i$ and $j$ over the field $\mathbb{F}_{2}$ .
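The definitions above can be turned into a small dense construction. The NumPy sketch below is an illustration only (the function names `hadamard` and `srht` are hypothetical): it evaluates $\mathbold H[i,j]=(-1)^{\langle i,j\rangle}/\sqrt{n}$ by taking the parity of the bits shared by $i$ and $j$, and assembles $\mathbold S=\mathbold P\cdot\mathbold H\cdot\mathbold D$ exactly as in the three items above.

```python
import numpy as np

def hadamard(n):
    """Normalized n x n Hadamard matrix with H[i, j] = (-1)^<i, j> / sqrt(n), n = 2^k."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # <i, j> over F_2 is the parity of the 1-bits shared by i and j
            H[i, j] = -1.0 if bin(i & j).count("1") % 2 else 1.0
    return H / np.sqrt(n)

def srht(n, r, rng):
    """Dense r x n subsampled randomized Hadamard transform S = P * H * D."""
    D = np.diag(rng.choice([-1.0, 1.0], size=n))      # random signs on the diagonal
    H = hadamard(n)
    P = np.zeros((r, n))
    sampled = rng.integers(0, n, size=r)              # r rows, uniformly with replacement
    P[np.arange(r), sampled] = np.sqrt(n) / np.sqrt(r)
    return P @ H @ D

rng = np.random.default_rng(1)
H8 = hadamard(8)
S = srht(16, 5, rng)
```

Every entry of such an $\mathbold S$ has magnitude $1/\sqrt{r}$. Forming $\mathbold H$ densely costs $O(n^{2})$ time and memory; in practice $\mathbold H\cdot\mathbold D\cdot\mathbold A$ is applied with the fast Walsh-Hadamard transform in $O(nd\log n)$ time.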

A CountSketch for the $n\times d$ matrix $\mathbold A$ is a $q\times n$ matrix $\mathbold S$ ($q$ is defined in Theorem 2), constructed as follows: in every column of $\mathbold S$, a single entry, chosen uniformly at random, is nonzero, and it takes value $+1$ or $-1$ with equal probability [17]. Therefore, $\mathbold S$ is a sparse matrix with only $n$ nonzero elements. Moreover, $\mathbold S\cdot\mathbold A$ can be computed in time proportional to the number of nonzero elements of $\mathbold A$ [17].
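The reason $\mathbold S\cdot\mathbold A$ is cheap is that $\mathbold S$ never needs to be formed: each row of $\mathbold A$ is added, with a random sign, to one random row of the result. A minimal NumPy sketch under that reading (the function name `countsketch_apply` is hypothetical; the dense $\mathbold S$ is built only to check the result):

```python
import numpy as np

def countsketch_apply(A, q, rng):
    """Compute S A for a random q x n CountSketch S without materializing S.

    Column i of S has a single nonzero entry sign[i] in {+1, -1} at row h[i], so
    row i of A is added, with that sign, to row h[i] of S A.  Each row of A is
    touched once; with A stored sparsely this is O(nnz(A)) work.
    """
    n, d = A.shape
    h = rng.integers(0, q, size=n)               # row index of the nonzero in column i
    sign = rng.choice([-1.0, 1.0], size=n)       # its value, +1 or -1 with equal probability
    SA = np.zeros((q, d))
    np.add.at(SA, h, sign[:, None] * A)          # unbuffered scatter-add handles repeated h
    return SA, h, sign

rng = np.random.default_rng(3)
A = rng.standard_normal((50, 4))
SA, h, sign = countsketch_apply(A, q=10, rng=rng)

S = np.zeros((10, 50))                           # dense S, built only for checking
S[h, np.arange(50)] = sign
```

`np.add.at` is used instead of `SA[h] += ...` because the latter would silently drop contributions when two rows of $\mathbold A$ hash to the same row of $\mathbold S\cdot\mathbold A$.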

The high level procedure of solving regression using sketching (either subsampled randomized Hadamard transform or CountSketch) is as follows:

•

Compute a sketching matrix $\mathbold S$ (either a $\mathbold P\cdot\mathbold H\cdot\mathbold D$ matrix or a CountSketch matrix),

•

Compute matrices $\mathbold S\cdot\mathbold A$ and $\mathbold S\cdot\mathbold b$ ,

•

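Compute and output the solution of the problem

$$\arg\min_{\mathbold{x}^{\prime}}\|(\mathbold{S}\cdot\mathbold{A})\cdot\mathbold{x}^{\prime}-\mathbold{S}\cdot\mathbold{b}\|_{2}^{2}. \tag{4}$$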

The solution of Equation 4 is

$$(\mathbold{S}\cdot\mathbold{A})^{\dagger}\cdot\mathbold{S}\cdot\mathbold{b}, \tag{5}$$

which we call the approximate solution. When $\mathbold S$ is defined as a $\mathbold P\cdot\mathbold H\cdot\mathbold D$ matrix, Theorem 1 states the number of samples (the number of rows of $\mathbold P$ ) that are sufficient for producing a $1\pm\epsilon$ approximation to the optimal solution.
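The three-step procedure above can be sketched end to end. The NumPy snippet below rests on assumptions of our own: a Gaussian test matrix with $n=1024$ and $d=5$, and an illustrative $r=200$ that is far below the bound of Theorem 1, so the theorem's guarantee is not formally in force and the computed residual ratio is only an empirical check. It builds $\mathbold S=\mathbold P\cdot\mathbold H\cdot\mathbold D$ using the Sylvester construction of $\mathbold H$, solves Equation 4 through Equation 5, and compares the residual with that of the exact solution. The helper names are hypothetical.

```python
import numpy as np

def srht_matrix(n, r, rng):
    """Dense r x n SRHT S = P * H * D; n must be a power of 2."""
    H = np.array([[1.0]])
    while H.shape[0] < n:                        # Sylvester construction of the Hadamard matrix
        H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
    H /= np.sqrt(n)
    signs = rng.choice([-1.0, 1.0], size=n)      # diagonal of D
    sampled = rng.integers(0, n, size=r)         # rows selected by P (with replacement)
    # Row j of P H D is sqrt(n / r) times row sampled[j] of H D.
    return np.sqrt(n / r) * H[sampled] * signs

def sketch_and_solve(A, b, S):
    """Approximate solution (S A)^+ (S b) of Equation 5."""
    return np.linalg.pinv(S @ A) @ (S @ b)

rng = np.random.default_rng(42)
n, d, r = 1024, 5, 200
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.5 * rng.standard_normal(n)

x_opt = np.linalg.lstsq(A, b, rcond=None)[0]          # exact solution (Equation 2)
x_apx = sketch_and_solve(A, b, srht_matrix(n, r, rng))
ratio = np.linalg.norm(A @ x_apx - b) / np.linalg.norm(A @ x_opt - b)
```

Since $\mathbold x$ is optimal, `ratio` can never fall below $1$; how far above $1$ it lands measures the quality of the sketch.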

Theorem 1 (Theorem 2 (and the remark afterwards) of [20]).

Suppose $\mathbold A\in\mathbb{R}^{n\times d}$ , $\mathbold b\in\mathbb{R}^{n}$ , and let $\epsilon\in(0,1)$ . If

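$$r=\max\left\{48^{2}d\ln(40nd)\ln\left(100^{2}d\ln(40nd)\right),\ \frac{40d\ln(40nd)}{\epsilon}\right\}, \tag{6}$$

then, with probability at least $0.8$, we have:

$$\|\mathbold{A}\cdot\mathbold{x}^{\prime}-\mathbold{b}\|_{2}\leq(1+\epsilon)\,\|\mathbold{A}\cdot\mathbold{x}-\mathbold{b}\|_{2}, \tag{7}$$

where $\mathbold{x}^{\prime}$ is the approximate solution of Equation 5 (with $\mathbold S=\mathbold P\cdot\mathbold H\cdot\mathbold D$) and $\mathbold{x}$ is the optimal solution of Equation 2.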

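The time complexity of computing $\mathbold{x}^{\prime}$, i.e., the approximate solution, is

$$n(d+1)+2n(d+1)\log_{2}(r+1)+O(rd^{2}). \tag{8}$$

In particular, assuming that $d\leq n\leq e^{d}$, we get:

$$r=O\left(d\ln d\ln n+\frac{d\ln n}{\epsilon}\right), \tag{9}$$

and the time complexity becomes:

$$O\left(nd\ln\frac{d}{\epsilon}+d^{3}\ln d\ln n+\frac{d^{3}\ln n}{\epsilon}\right). \tag{10}$$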
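To get a feel for the constants hidden in Equation 9, the exact sample count of Equation 6 can be evaluated directly. The small Python function below is our own illustration (the name `srht_samples` and the example values $d=10$, $\epsilon=0.1$ are arbitrary choices); it rounds $r$ up to an integer.

```python
import math

def srht_samples(n, d, eps):
    """Number of SRHT samples r from Equation 6 (Theorem 1), rounded up."""
    log_term = math.log(40 * n * d)
    first = 48 ** 2 * d * log_term * math.log(100 ** 2 * d * log_term)
    second = 40 * d * log_term / eps
    return math.ceil(max(first, second))

for n in (10**6, 10**9, 10**12):
    r = srht_samples(n, d=10, eps=0.1)
    print(f"n = {n:.0e}: r = {r}, r / n = {r / n:.2e}")
```

With these constants and $d=10$, the first term of the maximum is roughly $6.6\times10^{6}$ at $n=10^{6}$, so $r$ exceeds $n$ there, while $r/n$ drops quickly as $n$ grows. The sublinear behaviour is therefore asymptotic.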

When $\mathbold S$ is defined as a CountSketch matrix, Theorem 2 states the time complexity of computing a $1\pm\epsilon$ approximation to the optimal solution.

Theorem 2 (Theorem 30 of [17]).

Suppose that $\mathbold A\in\mathbb{R}^{n\times d}$ , $\mathbold b\in\mathbb{R}^{n}$ and $\epsilon\in(0,1)$ . Using a $q\times n$ CountSketch with

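$$q=O\left(\frac{d^{2}}{\epsilon^{2}}\log^{6}(d/\epsilon)\right),$$

a $1\pm\epsilon$ approximation to the optimal solution of linear regression over $\mathbold A$ and $\mathbold b$ can be computed with probability at least $2/3$ in

$$O\left(nnz(\mathbold{A})+d^{3}\epsilon^{-2}\log^{7}(d/\epsilon)\right)$$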

time, where $nnz(\mathbold A)$ is the number of nonzero elements of $\mathbold A$ .

5 Dynamic graph regression using subsampled randomized Hadamard transform

In this section, we utilize subsampled randomized Hadamard transform to improve the update time of dynamic graph regression, at the cost of obtaining a $1\pm\epsilon$ approximation to the optimal solution. We restrict ourselves to the following update operations: i) edge deletion, wherein an edge is deleted from the graph, and ii) edge insertion, wherein an edge is inserted between two nodes of the graph. We refer to these operations as edge-related update operations. We do not consider node insertion and node deletion in this section because, as we will see later, they change the size of the Hadamard matrix $\mathbold H$, which requires $\Theta(n)$ time; since we are looking for algorithms with sublinear update time, we exclude these two operations³.

³ Moreover, a property of real-world graphs is densification [32], i.e., their number of edges grows superlinearly in the number of their nodes. Therefore, we may say that most update operations in a dynamic graph are related to edges rather than to nodes. As a result, algorithms that are efficient for edge-related update operations are useful and worthwhile. For node insertions/deletions, we may compute the solution from scratch, whose time complexity is not much worse than linear in $n$ (see Equation 13 of Corollary 1).

Before starting our proofs (and algorithms), we note two intrinsic limitations of randomized Hadamard transform: i) $\mathbold H$ (respectively graph $G$) must have a power of $2$ rows/columns (respectively nodes), and ii) the matrix embedding $\mathbold M$ must have full rank. From now on, we set these two limitations aside; we return to them in Section 5.2.

We assume that the graph $G$ has an edge-update-efficient matrix embedding $\mathbold M$, and we define the regression problem with respect to it. More precisely, we want to compute and update $(\mathbold S\cdot\mathbold M)^{{\dagger}}\cdot\mathbold S\cdot\mathbold b$, where $\mathbold M$ is edge-update-efficient. Edge-update-efficient matrix embeddings are a superset of the update-efficient matrix embeddings presented in [8]. The class of update-efficient embeddings characterizes those matrix embeddings for which the optimal solution of the graph regression problem can be updated efficiently [8]. For example, the adjacency matrix of $G$ belongs to this class. Edge-update-efficient matrix embeddings, defined in Definition 1, characterize those matrix embeddings for which the approximate solution can be updated efficiently when the update operation is edge-related.

Definition 1.

Let $\mathbold{M}$ be an $n\times d$ matrix embedding of a graph $G$ and $f$ be a complexity function. We say $\mathbold{M}$ is edge-update-efficient$_{f}$ if it satisfies the following condition: if $\mathbold{M}$ and $\mathbold{M^{\prime}}$ are the correct matrix embeddings before and after one of the edge-related update operations, there exist at most $K$ pairs of vectors $\mathbold{c^{k}}$ and $\mathbold{d^{k}}$, with $K$ a constant, such that:

$$\mathbold{M^{\prime}}=\mathbold{M}+\sum_{k=1}^{K}\left(\mathbold{c^{k}}\cdot{\mathbold{d^{k}}}^{*}\right).$$

Each vector $\mathbold{d^{k}}$ has size $d$ and each vector $\mathbold{c^{k}}$ has size $n$, wherein only one entry, whose position is known, is nonzero⁴. We refer to each pair $\mathbold{c^{k}}$ and $\mathbold{d^{k}}$ as a pair of update vectors, and to $\sum_{k=1}^{K}\left(\mathbold{c^{k}}\cdot\mathbold{d^{k}}^{*}\right)$ as the update matrix. Moreover, all pairs of update vectors can be computed in $O(f)$ time. When the function $f$ is clear from the context, we drop it.

⁴ So, $\mathbold{c^{k}}$ can be compactly stored by keeping only the position and the value of its nonzero entry.
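As a small illustration of Definition 1, consider the adjacency matrix, which the text above places in the update-efficient class. Inserting the undirected edge $\{u,v\}$ changes exactly two entries, so $K=2$ with $\mathbold{c^{1}}=\mathbold{e}_{u}$, $\mathbold{d^{1}}=\mathbold{e}_{v}$, $\mathbold{c^{2}}=\mathbold{e}_{v}$ and $\mathbold{d^{2}}=\mathbold{e}_{u}$; a deletion uses the value $-1$ instead of $+1$. The NumPy sketch below is ours (the helper names are hypothetical) and stores each $\mathbold{c^{k}}$ compactly as a (position, value) pair, as the footnote suggests. Note that here $d=n$, so this embedding only illustrates the form of the update, not the $d\ll n$ regime.

```python
import numpy as np

def adjacency_update_pairs(n, u, v, value=1.0):
    """Update vectors for inserting (value=+1) or deleting (value=-1) the undirected
    edge {u, v} in an n x n adjacency-matrix embedding; K = 2.  Each c^k is kept as
    (position, value) and each d^k as a dense vector of size d = n."""
    e_u, e_v = np.zeros(n), np.zeros(n)
    e_u[u], e_v[v] = 1.0, 1.0
    return [((u, value), e_v), ((v, value), e_u)]

def apply_update_pairs(M, pairs):
    """Return M + sum_k c^k d^k^T; each one-hot c^k touches a single row in O(d) time."""
    M_new = M.copy()
    for (pos, val), d_vec in pairs:
        M_new[pos, :] += val * d_vec
    return M_new

# Path graph 0-1-2-3; insert the edge {0, 3} to close a 4-cycle.
M = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
M_cycle = apply_update_pairs(M, adjacency_update_pairs(4, 0, 3))
```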

At a high level, our algorithm consists of two phases: the pre-processing phase, wherein we are given a static graph and find an approximate solution for it, and the update phase, wherein after an edge-related update operation in $G$, the already found approximate solution is revised to become valid for the new graph. During pre-processing, we first generate the matrices $\mathbold P$, $\mathbold H$ and $\mathbold D$, as defined in Section 4. Then we calculate $\mathbold M^{\prime}=\mathbold P\cdot\mathbold H\cdot\mathbold D\cdot\mathbold M$ and $\mathbold b^{\prime}=\mathbold P\cdot\mathbold H\cdot\mathbold D\cdot\mathbold b$. Next, we compute ${\mathbold M^{\prime}}^{{\dagger}}$ and, finally, ${\mathbold M^{\prime}}^{{\dagger}}\cdot\mathbold b^{\prime}$. The time complexity of this phase is stated in Theorem 1. In the following, we first discuss in Section 5.1 how the approximate solution can be updated after an edge-related operation. Then, in Section 5.2, we discuss how the limitations of the used technique can be addressed. The presented proofs are constructive.

5.1 The update algorithm

In this section, we assume that the update operation is edge-related and show that the approximate solution, i.e., the value given in Equation 5, can be updated in $O(rd)$ time. Here, we assume the existence of an edge-update-efficient matrix embedding, without committing to any specific one. Later, in Section 5.2, we show that such an embedding exists.

Theorem 3.

Let $\mathbold M$ be an $n\times d$ edge-update-efficient matrix embedding of graph $G$. Suppose that, using an $r\times n$ subsampled randomized Hadamard transform $\mathbold S$, a $1\pm\epsilon$ approximation to the optimal solution of graph regression over $G$ has already been computed. Then, after an edge insertion or an edge deletion, the $1\pm\epsilon$ approximation can be updated in $O(rd)$ time.

Proof.

After one of the above-mentioned update operations, by the edge-update-efficient property of $\mathbold M$, $\mathbold M$ can be updated for the revised graph using at most $K$ pairs of update vectors. Given these pairs of update vectors and $(\mathbold S\cdot\mathbold M)^{{\dagger}}$ of the graph before the update operation, we want to compute $(\mathbold S\cdot\mathbold M)^{{\dagger}}$ of the revised graph. Since the numbers of rows and columns of $\mathbold{M}$ do not change, the sketching matrix $\mathbold S$ does not change either. We have a sequence of at most $K$ rank-$1$ updates $\mathbold{M^{k+1}}=\mathbold{M^{k}}+\mathbold{c^{k}}\cdot{\mathbold{d^{k}}}^{*}$, $1\leq k\leq K$, where $\mathbold{c^{k}}$ and ${\mathbold{d^{k}}}$ are a pair of update vectors, $\mathbold{M^{1}}=\mathbold{M}$ and $\mathbold{M^{K+1}}$ is the correct matrix embedding of $G$ after the update operation. After each rank-$1$ update $\mathbold{M^{k+1}}=\mathbold{M^{k}}+\mathbold{c^{k}}\cdot{\mathbold{d^{k}}}^{*}$,

•

given the matrix $\mathbold{S}\cdot\mathbold{M^{k}}$, we first compute $\mathbold{S}\cdot\mathbold{c^{k}}\cdot\mathbold{d^{k}}^{*}$ and then compute $\mathbold{S}\cdot\mathbold{M^{k+1}}$ as the matrix sum $\mathbold{S}\cdot\mathbold{M^{k}}+\mathbold{S}\cdot\mathbold{c^{k}}\cdot\mathbold{d^{k}}^{*}$. Note that $\mathbold{S}\cdot\mathbold{c^{k}}\cdot\mathbold{d^{k}}^{*}$ can be computed in $O\left(rd\right)$ time, as follows. First, we compute $\mathbold{S}\cdot\mathbold{c^{k}}$ by taking into account only the $i^{th}$ column of $\mathbold S$, scaled by the value of the sole nonzero entry of $\mathbold{c^{k}}$, where $i$ is the position of that entry. The result is a vector $\mathbold{s^{k}}$ of size $r$. Second, we compute the outer product $\mathbold{s^{k}}\cdot\mathbold{d^{k}}^{*}$, which can be done in $O\left(rd\right)$ time.

•

then, we exploit the algorithm of Meyer [7] that, given an $n_{1}\times n_{2}$ matrix $\mathbold{A}$, its Moore-Penrose pseudoinverse $\mathbold{A}^{{\dagger}}$ and a pair of update vectors $\mathbold{c}$ and $\mathbold{d}$, computes the Moore-Penrose pseudoinverse of $(\mathbold{A}+\mathbold{c}\cdot\mathbold{d}^{*})$ in $O(n_{1}n_{2})$ time⁵. Here, our matrix $\mathbold A$ is $\mathbold S\cdot\mathbold M$, which is an $r\times d$ matrix; therefore, updating $(\mathbold S\cdot\mathbold M)^{{\dagger}}$ for a given pair of update vectors takes $O\left(rd\right)$ time.

⁵ Instead of Meyer's algorithm [7], we can also use the general reduction technique of van den Brand [38] that maintains several operations (including pseudoinverse) on dynamic matrices, by maintaining only one specific matrix inverse.

By repeating this procedure at most $K$ times, we can compute the Moore-Penrose pseudoinverse of $\mathbold S\cdot\mathbold M$ for the updated graph in $O\left(Krd\right)=O\left(rd\right)$ time. Finally, multiplying the updated $(\mathbold S\cdot\mathbold M)^{{\dagger}}$ by $(\mathbold S\cdot\mathbold b)$, which can be done in $O(rd)$ time, yields the approximate solution of the updated graph. ∎
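The update step can be sketched in NumPy under extra assumptions of our own. We do not reproduce Meyer's algorithm [7]; instead, assuming $\mathbold S\cdot\mathbold M$ keeps full column rank (limitation ii above), we use the identity $(\mathbold S\cdot\mathbold M)^{\dagger}=((\mathbold S\cdot\mathbold M)^{*}\cdot\mathbold S\cdot\mathbold M)^{-1}\cdot(\mathbold S\cdot\mathbold M)^{*}$ and maintain the $d\times d$ inverse Gram matrix with a rank-2 Woodbury update. Each rank-1 update then costs $O(rd+d^{2})=O(rd)$, in line with Theorem 3, but this is a swapped-in technique rather than the one used in the proof, and the normal-equations form is less numerically robust than working with the pseudoinverse directly. The class `SketchedRegression` is hypothetical, $\mathbold b$ is assumed fixed under edge updates, and a Gaussian matrix stands in for the SRHT because the update algebra does not depend on how $\mathbold S$ was drawn.

```python
import numpy as np

class SketchedRegression:
    """Hypothetical helper maintaining x' = (S M)^+ (S b) under updates M += c d^T,
    where c is one-hot.  Assumes S M (r x d) keeps full column rank."""

    def __init__(self, S, M, b):
        self.S = S                                # fixed r x n sketch (unchanged by edge updates)
        self.B = S @ M                            # S M
        self.y = S @ b                            # S b (b unchanged by edge updates)
        self.G = np.linalg.inv(self.B.T @ self.B) # (B^T B)^{-1}
        self.z = self.B.T @ self.y                # B^T y

    def rank_one_update(self, i, value, dvec):
        s = value * self.S[:, i]                  # S c uses only column i of S: O(r)
        w = self.B.T @ s                          # O(rd), with the old B
        alpha = s @ s
        # New Gram matrix = B^T B + U C U^T with U = [w, d] and C = [[0, 1], [1, alpha]].
        U = np.column_stack((w, dvec))
        C_inv = np.array([[-alpha, 1.0], [1.0, 0.0]])
        GU = self.G @ U                           # O(d^2)
        self.G = self.G - GU @ np.linalg.inv(C_inv + U.T @ GU) @ GU.T   # Woodbury identity
        self.z = self.z + dvec * (s @ self.y)     # (B + s d^T)^T y = B^T y + d (s^T y)
        self.B = self.B + np.outer(s, dvec)       # S M^{k+1} = S M^k + (S c) d^T: O(rd)

    def solution(self):
        return self.G @ self.z                    # = (S M)^+ (S b) under full column rank

rng = np.random.default_rng(7)
n, d, r = 64, 4, 20
S = rng.standard_normal((r, n)) / np.sqrt(r)     # Gaussian stand-in for the SRHT
M = rng.standard_normal((n, d))
b = rng.standard_normal(n)

model = SketchedRegression(S, M, b)
for _ in range(5):                               # five rank-1 updates M += e_i d^T
    i, dvec = int(rng.integers(n)), rng.standard_normal(d)
    model.rank_one_update(i, 1.0, dvec)
    M[i, :] += dvec
x_incremental = model.solution()
x_recomputed = np.linalg.pinv(S @ M) @ (S @ b)   # from-scratch check, O(r d^2)
```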

5.2 Addressing the limitations

The first intrinsic limitation of randomized Hadamard transform is that the number of rows in $\mathbold M$, i.e., $n$, must be a power of $2$. This implies that the graph must always have a power of $2$ nodes. When applying randomized Hadamard transform to matrices, this issue is addressed by concatenating a zero matrix to the main matrix so that its size becomes a power of $2$ [33, 19]. We can follow a similar strategy for graphs. More precisely, if during pre-processing the number of rows of $\mathbold M$ is not a power of $2$, we pad it with zero rows up to the next larger power of $2$. This can be seen as adding isolated nodes with measured value $0$ to the graph to make its size a power of $2$. The second intrinsic limitation of randomized Hadamard transform is that $\mathbold M$ must be a full rank matrix. However, this might not be a serious problem, as many real-world matrices have full rank.
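The padding step can be sketched as follows (a minimal NumPy illustration; the function name is hypothetical). Zero rows of $\mathbold M$ paired with zero entries of $\mathbold b$ add nothing to the residual, so the least-squares solution is unchanged, which the usage lines check.

```python
import numpy as np

def pad_to_power_of_two(M, b):
    """Append zero rows to M and zeros to b until the row count is a power of 2
    (i.e., add isolated nodes whose embedding rows and measured values are 0)."""
    n = M.shape[0]
    target = 1 << (n - 1).bit_length()           # smallest power of 2 that is >= n (n >= 1)
    extra = target - n
    return np.pad(M, ((0, extra), (0, 0))), np.pad(b, (0, extra))

rng = np.random.default_rng(5)
M, b = rng.standard_normal((5, 3)), rng.standard_normal(5)
M_pad, b_pad = pad_to_power_of_two(M, b)
x = np.linalg.lstsq(M, b, rcond=None)[0]
x_pad = np.linalg.lstsq(M_pad, b_pad, rcond=None)[0]
```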

The next restriction is that the $n\times d$ matrix embedding $\mathbold M$ must satisfy two properties. First, $d\ll n$, because otherwise randomized Hadamard transform will not be efficient. Second, it must be edge-update-efficient. In the following, we first present in Definition 2 a matrix embedding based on the $d$ closest nodes of each node, where $d$ can be arbitrarily small (we treat it as a small constant), so it satisfies the first property. Then, in Theorem 4, we prove that it is an edge-update-efficient matrix embedding. For the sake of simplicity, we assume that $G$ is undirected. The extension of the results to directed graphs is straightforward.

Definition 2.

For each node $v$ in a graph $G$, we define its vector embedding as a vector consisting of the $d$ nodes of $G$ that have the smallest distances to $v$, and call it the $d$-nearest neighborhood of $v$. If there are several such subsets of $V(G)$, we choose an arbitrary one. We define the matrix embedding $\mathbold M$ of $G$ as an $n\times d$ matrix whose $i^{th}$ row is the vector embedding of node $i$.
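The definition can be illustrated with the short BFS-based sketch below. It rests on several assumptions that are not fixed by Definition 2, and the helper names are hypothetical:

- the graph is undirected and unweighted, stored as an adjacency list;
- $v$ is excluded from its own neighborhood;
- nodes are encoded by integer ids;
- rows are padded with `-1` when fewer than $d$ nodes are reachable.

BFS discovers nodes in nondecreasing distance from $v$. The first $d$ discovered nodes therefore form a valid $d$-nearest neighborhood, with ties broken by adjacency order.

```python
from collections import deque
import numpy as np

def d_nearest_neighborhood(adj, v, d):
    """Return up to d nodes closest to v (v excluded); ties broken by BFS/adjacency order."""
    seen = {v}
    queue = deque([v])
    result = []
    while queue and len(result) < d:
        x = queue.popleft()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                result.append(y)
                if len(result) == d:
                    break
                queue.append(y)
    return result

def matrix_embedding(adj, d, pad=-1):
    """n x d matrix whose row v lists the d-nearest neighborhood of node v (padded with `pad`)."""
    n = len(adj)
    M = np.full((n, d), pad, dtype=int)
    for v in range(n):
        nbrs = d_nearest_neighborhood(adj, v, d)
        M[v, :len(nbrs)] = nbrs
    return M

adj = [[1], [0, 2], [1, 3], [2, 4], [3]]     # path 0-1-2-3-4
M = matrix_embedding(adj, d=2)
print(M)
```

In this path example node $4$ is at distance $4>d$ from node $0$, so, consistent with Lemma 1, it does not appear in row $0$.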

Lemma 1.

If node $u$ is reachable from node $v$ (i.e., there is a path from $v$ to $u$) but their distance is larger than $d$, then $u$ cannot be in the $d$-nearest neighborhood of $v$.

Proof.

Let $v=x_{0},x_{1},\ldots,x_{k}=u$ be a shortest path from $v$ to $u$, where $k=dist(u,v)>d$. Each of the $k-1\geq d$ intermediate nodes $x_{1},\ldots,x_{k-1}$ has distance less than $dist(u,v)$ to $v$. Hence, there exist at least $d$ nodes in the graph that are closer to $v$ than $u$, and therefore $u$ is not in the $d$-nearest neighborhood of $v$. ∎

Lemma 2.

If an edge is inserted between nodes $u$ and $v$ of graph $G$, the vector embeddings of at most $O(d^{d})$ nodes of $G$ may change. Furthermore, each vector embedding that must be revised can be updated in $O(d^{2})$ time.

Proof.

First, we determine the nodes whose $d$-nearest neighborhood may change after inserting an edge between $u$ and $v$. Let $Q$ denote the set of such nodes. Nodes $u$ and $v$ belong to $Q$. In addition, nodes that already have $u$ (resp. $v$) in their $d$-nearest neighborhood may, after the edge insertion, find $v$ (resp. $u$) and some other nodes in their $d$-nearest neighborhood. Let us focus on finding the nodes that already have $u$ in their $d$-nearest neighborhood and may have $v$ in it after the edge insertion (the nodes that may have $u$ in their $d$-nearest neighborhood after the insertion can be found in a similar way). To do so, we conduct a breadth-first search (BFS) from $v$ on the updated graph, using the following pruning/stopping criteria:

•

at the first level, among all neighbors of $v$, we visit only $u$, because we are interested in finding the nodes that have a shortest path to $v$ passing through $u$.

•

at the other levels, if a node $x$ has degree greater than $d$, then $v$ cannot be in the $d$-nearest neighborhood of any of its adjacent nodes (nor of any node $y$ such that $x$ is on a shortest path between $y$ and $v$), because the adjacent nodes of $x$ already have at least $d$ nodes that are closer to them than $v$.

•

if a node $x$ has distance greater than $d$ from $v$, then, by Lemma 1, $v$ cannot be in its $d$-nearest neighborhood. Furthermore, any node $y$ such that $v$ is on a shortest path from $x$ to $y$ cannot be in the $d$-nearest neighborhood of $x$. Hence, nodes that have distance greater than $d$ from $v$ are not traversed during the BFS.

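Consequently, at the end of the traversal, all the visited nodes have degree at most $d$ and distance at most $d$ to $v$. Since each expanded node has at most $d$ neighbors and the traversal has depth at most $d$, the number of such nodes is at most $1+d+d^{2}+\cdots+d^{d}=O(d^{d})$.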

Second, for each node whose vector embedding may require an update, we run a BFS that stops after visiting $d$ nodes, which yields its updated embedding. This can be done in $O(d^{2})$ time. ∎
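The three pruning rules of the proof above can be sketched in plain Python as follows. This minimal sketch makes three assumptions. First, the graph is undirected and given as an adjacency list of the updated graph. Second, distances are measured along paths through the new edge, as the first rule prescribes. Third, a node of degree greater than $d$ reached after the first level is neither recorded nor expanded. `reach_through` and `affected_nodes` are hypothetical names. The returned set is only a candidate superset $Q$ of the nodes whose embeddings actually change.

```python
from collections import deque

def reach_through(adj, src, via, d):
    """Pruned BFS from `src` whose first level follows only the new edge (src, via).

    Returns the nodes met: candidates that may gain `src` in their d-nearest
    neighborhood. `adj` is the adjacency list of the updated graph.
    """
    dist = {src: 0, via: 1}           # rule 1: the only first-level node is `via`
    met = {via}
    queue = deque([via])
    while queue:
        x = queue.popleft()
        if dist[x] >= d:              # rule 3: never go farther than distance d from src
            continue
        for y in adj[x]:
            if y in dist:
                continue
            if len(adj[y]) > d:       # rule 2: a node of degree > d is neither met nor expanded
                continue
            dist[y] = dist[x] + 1
            met.add(y)
            queue.append(y)
    return met

def affected_nodes(adj, u, v, d):
    """Candidate set Q after inserting the edge (u, v): both endpoints plus both pruned searches."""
    return {u, v} | reach_through(adj, v, u, d) | reach_through(adj, u, v, d)

# Path 0-1-2-3 and path 4-5-6-7, after inserting the edge {3, 4}.
adj = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5, 7], [6]]
print(affected_nodes(adj, 3, 4, d=2))
adj2 = [list(nbrs) for nbrs in adj] + [[2]]   # node 8: a leaf attached to node 2
adj2[2] = [1, 3, 8]
print(affected_nodes(adj2, 3, 4, d=2))
```

In the second example node $2$ has degree $3>d=2$, so the search from node $4$ stops at node $3$. For an edge deletion (Lemma 3), the same search would run on the graph before the deletion.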

Lemma 3.

If the edge between nodes $u$ and $v$ of a graph $G$ is deleted, the vector embeddings of at most $O(d^{d})$ nodes change. Furthermore, each vector embedding that must be revised can be updated in $O(d^{2})$ time.

Proof.

Our proof is similar to the proof of Lemma 2. First, we determine the nodes whose $d$-nearest neighborhood may change after deleting the edge between $u$ and $v$. Let $Q$ denote the set of such nodes. Nodes $u$ and $v$ belong to $Q$. In addition, nodes that already have $u$ (resp. $v$) in their $d$-nearest neighborhood may, after deleting the edge between $u$ and $v$, lose $v$ (resp. $u$) and some other nodes from their $d$-nearest neighborhood. Let us focus on finding the nodes that already have $u$ in their $d$-nearest neighborhood and may lose $v$ and some other nodes from it (the nodes that may lose $u$ from their $d$-nearest neighborhood can be found in a similar way). We conduct a BFS from $v$ on the graph before the edge deletion, using the three pruning/stopping criteria of the proof of Lemma 2. At the end of the traversal, all the visited nodes have degree at most $d$ and distance at most $d$ to $v$, so the number of such nodes is at most $O(d^{d})$.

Second, for each node whose embedding may require an update, we run a BFS on the updated graph that stops after visiting $d$ nodes, which yields its updated embedding. This can be done in $O(d^{2})$ time. ∎

Theorem 4.

Assuming that $d$ is a constant, the matrix embedding $\mathbold M$ defined in Definition 2 is an edge-update-efficient$_{1}$ matrix embedding.

Proof.

We show that $\mathbold M$ satisfies the conditions stated in Definition 1. When an edge is inserted/deleted between nodes $i$ and $j$, by Lemmas 2 and 3, the vector embeddings of at most $O(d^{d})$ nodes change, and it takes $O(d^{2})$ time to update each of them, i.e., $O(d^{d+2})$ time in total. Since $d$ is a constant, $d^{d+2}$ is a constant as well. For each node $v$ whose vector embedding has changed, we define a pair of update vectors $\mathbold{c}$ and $\mathbold d$ as follows: $\mathbold d$ contains the new vector embedding of $v$ minus its old vector embedding, and the position and the value of the only nonzero entry of $\mathbold c$ are set to $v$ and $1$, respectively. Therefore, the conditions of Definition 1 are satisfied and $\mathbold M$ is an edge-update-efficient$_{1}$ matrix embedding. ∎
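The construction of update vectors in the proof above can be sketched with NumPy, assuming node ids are row indices and that the changed rows are already known (for example, from the pruned BFS of Lemmas 2 and 3). `update_pairs`, `apply_to_M` and `apply_to_sketch` are hypothetical helper names. Each $\mathbold c$ is kept as its single nonzero position rather than as a dense length-$n$ vector. The last helper shows why one pair costs $O(rd)$ against a dense $r\times n$ sketch: $\mathbold S\cdot\mathbold c$ is just column $v$ of $\mathbold S$. The example rows are the $2$-nearest neighborhoods of the path 0-1-2-3 before and after inserting the edge $\{0,3\}$, with ties broken by adjacency order (an assumption). A random Gaussian matrix stands in for the sketch.

```python
import numpy as np

def update_pairs(M, new_rows):
    """One pair per changed node v: c = e_v (stored as its position v), d = new row - old row."""
    return [(v, np.asarray(row, dtype=float) - M[v]) for v, row in new_rows.items()]

def apply_to_M(M, pairs):
    """M' = M + sum_k c_k d_k^T; each c_k d_k^T only touches row v."""
    M_new = np.array(M, dtype=float)
    for v, dvec in pairs:
        M_new[v] += dvec
    return M_new

def apply_to_sketch(SM, S, pairs):
    """S M' = S M + sum_k (S e_v) d_k^T, where S e_v is column v of S (O(r d) per pair)."""
    SM_new = np.array(SM, dtype=float)
    for v, dvec in pairs:
        SM_new += np.outer(S[:, v], dvec)
    return SM_new

M = np.array([[1, 2], [0, 2], [1, 3], [2, 1]], dtype=float)   # path 0-1-2-3, d = 2
new_rows = {0: [1, 3], 3: [2, 0]}                              # after inserting edge {0, 3}
pairs = update_pairs(M, new_rows)
M_new = apply_to_M(M, pairs)
S = np.random.default_rng(0).standard_normal((3, 4))           # stand-in r x n sketch, r = 3
SM_new = apply_to_sketch(S @ M, S, pairs)
```

Only nodes $0$ and $3$ change, so $K=2$ pairs suffice here.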

Corollary 1.

Suppose that we are given a graph $G$ whose matrix embedding is defined as in Definition 2, with $d$ a small constant, and that this matrix embedding has full rank. Our first randomized algorithm, which is based on the subsampled randomized Hadamard transform, performs the pre-processing phase in

[TABLE]

time. Then, after an edge insertion or an edge deletion, it updates a $1\pm\epsilon$ approximation to the optimal solution of graph regression in

[TABLE]

time.

Proof.

In Theorem 3, we conditioned on the existence of an edge-update-efficient matrix embedding and showed that it takes $O(rd)$ time to update the approximate solution. Then, in Theorem 4, we showed that such a matrix embedding exists. Therefore, by substituting the value of $r$ given in Theorem 1 and discarding constants (including $d$), we obtain the time complexities stated in the corollary. ∎

We note that when computing embeddings for the nodes of a graph, the objective is to map each node to a vector in a low-dimensional space [15, 6]. Accordingly, many embedding methods, such as those based on random walks [23] and deep graph neural networks [25], consider only a small neighborhood of each node. Therefore, it is reasonable to treat $d$ as a small constant. We also note that if the exact algorithm of [8] used the matrix embedding presented in Definition 2, it would yield a linear-time algorithm (in terms of $n$) for updating the solution, which is considerably worse than the sublinear update time given in Equation 14.

6 Dynamic graph regression using CountSketch

In this section, we use CountSketch to develop our second randomized algorithm for the dynamic graph regression problem. Unlike our first algorithm, it supports all of the following update operations: i) node insertion, wherein a node is inserted into the graph and at most a constant number of edges are drawn between it and the existing nodes of the graph, ii) node deletion, wherein a node that has at most a constant number of edges is deleted from the graph together with its incident edges, iii) edge deletion, wherein an edge is deleted from the graph, and iv) edge insertion, wherein an edge is inserted into the graph.

We assume that an $n\times d$ matrix embedding exists that satisfies the following conditions: i) $d$ is fixed and does not depend on the number of data rows $n$ (so $d$ does not change when the number of data rows changes), and ii) the matrix embedding is a CUE (CountSketch-based Update-Efficient) matrix embedding. CUE characterizes a class of matrix embeddings for which we can efficiently update the approximate solution of graph regression using CountSketch. It is less general than the edge-update-efficient matrix embeddings presented in Section 5, which can be used only for edge-related operations.

Definition 3.

Let $\mathbold{M}$ be an $n\times d$ matrix embedding of a graph $G$ and let $f$ be a (complexity) function of $n$ and $d$. We say that $\mathbold{M}$ is CUE$_{f}$ iff the following conditions are satisfied:

1. If $\mathbold{M}$ and $\mathbold{M^{\prime}}$ are correct matrix embeddings before and after an edge insertion/deletion in the graph, there exist at most $K$ pairs of vectors $\mathbold{c^{k}}$ and $\mathbold{d^{k}}$, with $K$ a constant, such that $\mathbold{M^{\prime}}=\mathbold{M}+\sum_{k=1}^{K}\mathbold{c^{k}}\cdot{\mathbold{d^{k}}}^{*}$. Each vector $\mathbold{d^{k}}$ has size $d$, and each vector $\mathbold{c^{k}}$ has size $n$ and only one nonzero entry, whose position is known.

2. A node insertion in $G$ results in adding one row to $\mathbold{M}$ and (at most) a rank-$K$ update of $\mathbold{M}$.

3. Deleting a node from $G$ results in deleting one row from $\mathbold{M}$ and (at most) a rank-$K$ update of $\mathbold{M}$.

4. After any update operation in $G$, all pairs of update vectors can be computed in $O(f(n,d))$ time.

When $f$ is clear from the context, we drop it.

As in the case of the subsampled randomized Hadamard transform, during the pre-processing phase of our CountSketch-based algorithm and for a given $\epsilon$, we first generate a $q\times n$ CountSketch matrix $\mathbold S$, as defined in Section 4. Then we calculate $\mathbold{M^{\prime}}=\mathbold S\cdot\mathbold M$ and $\mathbold{b^{\prime}}=\mathbold S\cdot\mathbold b$. Finally, we compute $\mathbold{M^{\prime}}^{{\dagger}}$ and $\mathbold{M^{\prime}}^{{\dagger}}\cdot\mathbold{b^{\prime}}$. The time complexity of this procedure is given in Theorem 2. In the following, we first discuss in Section 6.1 how the approximate solution is updated after an update operation. Then, in Section 6.2, we discuss the existence of a CUE matrix embedding.
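The pre-processing phase can be sketched in NumPy as follows. The sketch assumes the standard CountSketch: each of the $n$ columns of $\mathbold S$ has a single $\pm1$ entry in a uniformly random row. $\mathbold S$ is kept implicit as two length-$n$ arrays, so $\mathbold S\cdot\mathbold M$ costs $O(\mathrm{nnz}(\mathbold M))$. The value `q = 200` and the helper names are illustrative assumptions. They are not the value of $q$ from Equation 11.

```python
import numpy as np

def make_countsketch(n, q, rng):
    """Column j of S has one nonzero, signs[j] in {-1, +1}, at row rows[j]."""
    rows = rng.integers(0, q, size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    return rows, signs

def sketch(rows, signs, q, A):
    """Compute S @ A without forming S: row j of A is added, signed, to row rows[j]."""
    A = np.asarray(A, dtype=float)
    SA = np.zeros((q,) + A.shape[1:])
    np.add.at(SA, rows, signs.reshape((-1,) + (1,) * (A.ndim - 1)) * A)
    return SA

def preprocess(M, b, q, rng):
    """Return the sketch, S M, S b, the d x q pseudoinverse of S M, and the approximate solution."""
    rows, signs = make_countsketch(M.shape[0], q, rng)
    SM, Sb = sketch(rows, signs, q, M), sketch(rows, signs, q, b)
    SM_pinv = np.linalg.pinv(SM)
    return rows, signs, SM, Sb, SM_pinv, SM_pinv @ Sb

rng = np.random.default_rng(3)
n, d, q = 2000, 3, 200
M = rng.standard_normal((n, d))
b = M @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(n)
rows, signs, SM, Sb, SM_pinv, x_sketch = preprocess(M, b, q, rng)
x_opt = np.linalg.lstsq(M, b, rcond=None)[0]
ratio = np.linalg.norm(M @ x_sketch - b) / np.linalg.norm(M @ x_opt - b)
S_dense = np.zeros((q, n))
S_dense[rows, np.arange(n)] = signs           # dense S, built only to check the result
```

`ratio` compares the residual of the sketched solution with the optimal residual. With these sizes it is typically close to $1$.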

6.1 The update algorithm

In this section, we assume that we are given a matrix $\mathbold M$ that satisfies the two conditions mentioned above, and show how the approximate solution is efficiently updated after an update operation using CountSketch.

6.1.1 Edge insertion/deletion

In this section, we assume that the update operation is either an edge insertion or an edge deletion, and show that the approximate solution can be updated in $O(qd)$ time.

Theorem 5.

Assume that $\mathbold M$ is a $n\times d$ CUE matrix embedding of graph $G$ . Suppose also that using a $q\times n$ CountSketch $\mathbold S$ with $q$ defined in Equation 11, a $1\pm\epsilon$ approximation to the solution of graph regression of $G$ is already computed. Then, after an edge insertion or an edge deletion, the approximate solution can be updated in $O(qd)$ time.

Proof.

The proof is similar to the proof of Theorem 3. Since $\mathbold M$ is a CUE matrix embedding, after an edge insertion or an edge deletion, $\mathbold M$ is updated by at most $K$ pairs of update vectors. Since the number of rows of $\mathbold{M}$ (i.e., the number of columns of $\mathbold S$) does not change, matrix $\mathbold S$ does not change either. Therefore, we have a sequence of at most $K$ rank-$1$ updates $\mathbold{M^{k+1}}=\mathbold{M^{k}}+\mathbold{c^{k}}\cdot{\mathbold{d^{k}}}^{*}$, $1\leq k\leq K$, where $\mathbold{c^{k}}$ and ${\mathbold{d^{k}}}$ are a pair of update vectors, $\mathbold{M^{1}}=\mathbold{M}$, and $\mathbold{M^{K+1}}$ is the correct matrix embedding of $G$ after the update operation. After each rank-$1$ update, given the matrix $\mathbold{S}\cdot\mathbold{M^{k}}$, we can compute $\mathbold{S}\cdot\mathbold{M^{k+1}}=\mathbold{S}\cdot\mathbold{M^{k}}+(\mathbold{S}\cdot\mathbold{c^{k}})\cdot{\mathbold{d^{k}}}^{*}$ in $O(qd)$ time, similar to the proof of Theorem 3. Then we can use Meyer's algorithm [7] to update $(\mathbold S\cdot\mathbold M)^{{\dagger}}$ for the given pair of update vectors in $O\left(qd\right)$ time.

After repeating this procedure at most $K$ times, we obtain the Moore-Penrose pseudoinverse of $\mathbold S\cdot\mathbold M$ for the updated graph in $O\left(Kqd\right)=O\left(qd\right)$ time. Finally, multiplying the updated $(\mathbold S\cdot\mathbold M)^{{\dagger}}$ by $(\mathbold S\cdot\mathbold b)$ yields the approximate solution in $O(qd)$ time. ∎
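The update loop of the proof above can be sketched with NumPy, with one plainly named substitution. Instead of Meyer's pseudoinverse update, which the proof uses, this sketch swaps in two Sherman-Morrison updates of the inverse Gram matrix $((\mathbold S\cdot\mathbold M)^{*}(\mathbold S\cdot\mathbold M))^{-1}$. That substitution is only valid while $\mathbold S\cdot\mathbold M$ keeps full column rank, in line with the full-rank assumption made for the embedding. Because $\mathbold S\cdot\mathbold c$ has a single nonzero entry for a CountSketch, each pair changes one row of $\mathbold S\cdot\mathbold M$, and this variant costs $O(d^{2})$ per pair. The class and function names are hypothetical.

```python
import numpy as np

def sm_add_row(G, a):
    """Sherman-Morrison: given G = inv(A^T A), return inv(A^T A + a a^T)."""
    Ga = G @ a
    return G - np.outer(Ga, Ga) / (1.0 + a @ Ga)

def sm_remove_row(G, a):
    """Given G = inv(A^T A), return inv(A^T A - a a^T); requires a^T G a < 1."""
    Ga = G @ a
    return G + np.outer(Ga, Ga) / (1.0 - a @ Ga)

class SketchedSolution:
    """Maintains x = argmin ||(S M) x - S b||_2 under updates M += e_v dvec^T.

    S is a CountSketch given by rows[v] (nonzero row of column v) and signs[v].
    Assumes S M has full column rank, so x = inv(SM^T SM) SM^T Sb.
    """

    def __init__(self, SM, Sb, rows, signs):
        self.SM = np.array(SM, dtype=float)
        self.Sb = np.array(Sb, dtype=float)
        self.rows, self.signs = rows, signs
        self.G = np.linalg.inv(self.SM.T @ self.SM)
        self.z = self.SM.T @ self.Sb

    def apply_pair(self, v, dvec):
        i = self.rows[v]
        old = self.SM[i].copy()
        new = old + self.signs[v] * dvec      # S e_v = signs[v] * e_i, so only row i changes
        self.G = sm_remove_row(sm_add_row(self.G, new), old)   # add first to stay invertible
        self.z += (new - old) * self.Sb[i]
        self.SM[i] = new

    def solution(self):
        return self.G @ self.z

rng = np.random.default_rng(1)
n, d, q = 500, 3, 60
M = rng.standard_normal((n, d))
b = rng.standard_normal(n)
rows = rng.integers(0, q, size=n)
signs = rng.choice([-1.0, 1.0], size=n)
S = np.zeros((q, n))
S[rows, np.arange(n)] = signs                 # dense S, used only to check the result
ls = SketchedSolution(S @ M, S @ b, rows, signs)
for v in (7, 42):                             # two update pairs (c = e_v, dvec)
    dvec = rng.standard_normal(d)
    M[v] += dvec
    ls.apply_pair(v, dvec)
x_ref = np.linalg.lstsq(S @ M, S @ b, rcond=None)[0]
```

Adding the new row before removing the old one keeps the intermediate Gram matrix positive definite. Meyer's algorithm, as cited in the proof, does not need the full-rank assumption.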

6.1.2 Node insertion

In this section, we assume that the update operation is a node insertion and show how the approximate solution can be effectively updated.

Theorem 6.

Let $\mathbold M$ be a $n\times d$ CUE matrix embedding of graph $G$ . Suppose that using a $q\times n$ CountSketch $\mathbold S$ with $q$ defined in Equation 11, a $1\pm\epsilon$ approximation to the solution of graph regression of $G$ is already computed. Then, after inserting a node into $G$ , the approximate solution can be updated in $O(qd)$ time.

Proof.

After inserting a node into the graph, we need to revise matrices $\mathbold S$ and $\mathbold M$. Matrix $\mathbold M$ is revised because the row corresponding to the new node must be added to it. Matrix $\mathbold S$ is revised because its number of columns must equal the number of rows of $\mathbold M$. Therefore, upon a node insertion, we add a new column to $\mathbold S$ and choose a row uniformly at random for its nonzero element. Let $i$ be the index of this nonzero row. To update $\mathbold S\cdot\mathbold M$ with respect to this change, we add to each entry $j$ of the $i^{th}$ row of $\mathbold S\cdot\mathbold M$ the $j^{th}$ entry of the last row of $\mathbold M$, which takes $O(d)$ time. Furthermore, by the CUE property of $\mathbold M$, this node insertion changes the vector embeddings of the other nodes by at most $K$ pairs of update vectors. Since $q$ and $d$ do not change, the size of $\mathbold S\cdot\mathbold M$ does not change either. Updating $\mathbold S\cdot\mathbold M$ with respect to these at most $K$ pairs of update vectors takes $O(qd)$ time (as described in the proofs of Theorems 3 and 5).

To update $(\mathbold S\cdot\mathbold M)^{{\dagger}}$ with respect to the changes in $\mathbold S\cdot\mathbold M$, we can exploit the algorithm of Meyer [7]. Since the change in the $i^{th}$ row of $\mathbold S\cdot\mathbold M$ can be expressed by a pair of update vectors, $(\mathbold S\cdot\mathbold M)^{{\dagger}}$ can be updated with respect to it in $O(qd)$ time. Furthermore, for each of the at most $K$ pairs of update vectors, we can use the algorithm of Meyer [7] to update $(\mathbold S\cdot\mathbold M)^{{\dagger}}$ in $O(qd)$ time.

After the node insertion, we also need to append the measured value of the new node to the bottom of $\mathbold b$ and then update $\mathbold S\cdot\mathbold b$ with respect to the revised $\mathbold S$. To do so, it suffices to add the measured value of the new node to the $i^{th}$ entry of $\mathbold S\cdot\mathbold b$ ($i$ is the nonzero row of the new column of the updated $\mathbold S$). Finally, a naive multiplication of the updated $(\mathbold S\cdot\mathbold M)^{{\dagger}}$ with the updated $\mathbold S\cdot\mathbold b$ gives the approximate solution for the updated graph in $O(qd)$ time. ∎
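The sketch bookkeeping for a node insertion can be illustrated with NumPy by storing the CountSketch column by column in a dictionary `{node: (row, sign)}`, so adding a column is $O(1)$. This minimal sketch has the following limits and assumptions:

- It uses the standard CountSketch with random $\pm1$ signs, so the new row of $\mathbold M$ and the new measured value enter with the sign of the new column.
- The helper names are hypothetical.
- It covers only the new row and measured value, not the at most $K$ further update pairs or the pseudoinverse update.
- The tiny `q` is only for checking against a dense $\mathbold S$.

```python
import numpy as np

def dense_sketch(cols, order, q):
    """Materialize S (q x len(order)) from {node: (row, sign)}; used here only for checking."""
    S = np.zeros((q, len(order)))
    for j, node in enumerate(order):
        i, s = cols[node]
        S[i, j] = s
    return S

def insert_node(SM, Sb, cols, node, m_new, b_new, rng):
    """Add a CountSketch column for `node` and fold in its row of M and entry of b, O(d) time."""
    q = SM.shape[0]
    i = int(rng.integers(0, q))             # uniformly random nonzero row of the new column
    s = float(rng.choice([-1.0, 1.0]))      # its random sign
    cols[node] = (i, s)
    SM[i] += s * np.asarray(m_new, dtype=float)
    Sb[i] += s * b_new

rng = np.random.default_rng(2)
n, d, q = 20, 3, 8
M, b = rng.standard_normal((n, d)), rng.standard_normal(n)
cols = {v: (int(rng.integers(0, q)), float(rng.choice([-1.0, 1.0]))) for v in range(n)}
S = dense_sketch(cols, range(n), q)
SM, Sb = S @ M, S @ b
m_new, b_new = rng.standard_normal(d), 0.5
insert_node(SM, Sb, cols, n, m_new, b_new, rng)     # the new node gets id n
M2, b2 = np.vstack([M, m_new]), np.append(b, b_new)
S2 = dense_sketch(cols, range(n + 1), q)
```

Recomputing the $q\times d$ least-squares solution from scratch at this point would cost $O(qd^{2})$, still independent of $n$. The proof instead updates $(\mathbold S\cdot\mathbold M)^{{\dagger}}$ with Meyer's algorithm in $O(qd)$.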

6.1.3 Node deletion

In this section, we assume that the update operation is node deletion, and show, in Theorem 7, how the approximate solution is effectively updated.

Theorem 7.

Let $\mathbold M$ be a $n\times d$ CUE matrix embedding of graph $G$ . Suppose that using a $q\times n$ CountSketch $\mathbold S$ with $q$ defined in Equation 11, a $1\pm\epsilon$ approximation to the solution of graph regression of $G$ is already computed. Then, after deleting a node from $G$ , the approximate solution can be updated in $O(qd)$ time.

Proof.

After deleting a node $v$ from the graph, we need to revise matrices $\mathbold S$ and $\mathbold M$. Matrix $\mathbold M$ is revised because the row corresponding to $v$ must be deleted from it. Matrix $\mathbold S$ is revised because the column corresponding to $v$ must be deleted from it. Let $i$ be the index of the nonzero row of this column. To update $\mathbold S\cdot\mathbold M$ with respect to these changes, we subtract from each entry $j$ of the $i^{th}$ row of $\mathbold S\cdot\mathbold M$ the value $\mathbold M[v,j]$, which takes $O(d)$ time. Furthermore, by the CUE property of $\mathbold M$, this node deletion may change the vector embeddings of the other nodes by at most $K$ pairs of update vectors, and $\mathbold S\cdot\mathbold M$ can be updated with respect to these changes in $O(qd)$ time. Since $q$ and $d$ do not change, the size of $\mathbold S\cdot\mathbold M$ does not change either.

To update $(\mathbold S\cdot\mathbold M)^{{\dagger}}$ with respect to these changes in $\mathbold S\cdot\mathbold M$, we can again exploit the algorithm of Meyer [7]. Since the change in the $i^{th}$ row of $\mathbold S\cdot\mathbold M$ can be expressed by a pair of update vectors, $(\mathbold S\cdot\mathbold M)^{{\dagger}}$ can be updated with respect to it in $O(qd)$ time. Also, for each of the at most $K$ pairs of update vectors, we can use the algorithm of Meyer [7] to update $(\mathbold S\cdot\mathbold M)^{{\dagger}}$ in $O(qd)$ time.

After the node deletion, we also need to delete the measured value of the deleted node from $\mathbold b$ and then update $\mathbold S\cdot\mathbold b$. To do so, it suffices to subtract the measured value of the deleted node from the $i^{th}$ entry of $\mathbold S\cdot\mathbold b$, where $i$ is the row of the nonzero entry of the deleted column. Finally, a naive multiplication of the updated $(\mathbold S\cdot\mathbold M)^{{\dagger}}$ with the updated $\mathbold S\cdot\mathbold b$ yields the approximate solution of the updated graph in $O(qd)$ time. ∎
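The deletion counterpart can be sketched the same way, assuming the CountSketch is stored as a dictionary `{node: (row, sign)}` with standard $\pm1$ signs, so removing a column is $O(1)$. The helper names are hypothetical. Only the deleted node's own contribution is handled. The at most $K$ further update pairs and the pseudoinverse update are omitted.

```python
import numpy as np

def dense_sketch(cols, order, q):
    """Materialize S (q x len(order)) from {node: (row, sign)}; used here only for checking."""
    S = np.zeros((q, len(order)))
    for j, node in enumerate(order):
        i, s = cols[node]
        S[i, j] = s
    return S

def delete_node(SM, Sb, cols, node, m_old, b_old):
    """Drop the CountSketch column of `node` and subtract its contribution, O(d) time."""
    i, s = cols.pop(node)
    SM[i] -= s * np.asarray(m_old, dtype=float)
    Sb[i] -= s * b_old

rng = np.random.default_rng(4)
n, d, q = 20, 3, 8
M, b = rng.standard_normal((n, d)), rng.standard_normal(n)
cols = {v: (int(rng.integers(0, q)), float(rng.choice([-1.0, 1.0]))) for v in range(n)}
S = dense_sketch(cols, range(n), q)
SM, Sb = S @ M, S @ b
delete_node(SM, Sb, cols, 5, M[5], b[5])           # delete node 5
order = [v for v in range(n) if v != 5]
S2 = dense_sketch(cols, order, q)
M2, b2 = np.delete(M, 5, axis=0), np.delete(b, 5)
```

The remaining columns keep their rows and signs, so the updated sketch is exactly the CountSketch of the reduced problem.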

6.2 Existence of a CUE matrix embedding

In this section, we show that the $d$-nearest neighborhood matrix embedding presented in Section 5.2 satisfies all the conditions we are looking for. First, in this embedding $d$ is a small constant that does not depend on $n$. Second, in Theorem 8 we show that it is CUE. (Beyond these two conditions, and as with our first randomized algorithm, the matrix embedding must also have full rank.)

Theorem 8.

Assuming that $d$ is a constant, the matrix embedding $\mathbold M$ defined in Definition 2 of Section 5.2 is CUE$_{1}$.

Proof.

We shall show that $\mathbold M$ satisfies all the conditions stated in Definition 3 (for $f=1$ ).

1. When an edge is inserted/deleted between nodes $i$ and $j$, in a way similar to the proof of Theorem 4, we can show that condition (1) of Definition 3 is satisfied.

2. When a new node $i$ is added to $G$, we add a new row for it to $\mathbold M$, which contains its $d$ closest neighbors. Furthermore, since at most a constant number $C$ of edges are added between $i$ and the existing nodes of $G$, and each edge insertion may change the vector embeddings of at most $O(d^{d})$ nodes, the vector embeddings of at most $O(Cd^{d})$ nodes change, which is a constant $K$. Therefore, similar to the previous case, condition (2) of Definition 3 is satisfied.

3. When we delete a node from $G$, we delete its corresponding row from $\mathbold M$. Furthermore, since the deleted node has at most a constant number $C$ of edges (which are deleted too), and each edge deletion may change the vector embeddings of at most $O(d^{d})$ nodes, the vector embeddings of at most $O(Cd^{d})$ nodes change, which is a constant $K$. Hence, similar to the previous case, condition (3) of Definition 3 is satisfied.

4. For all the update operations, each pair of update vectors $\mathbold c$ and $\mathbold d$ can be computed in $O(d^{2})$ time, which is $O(1)$ since $d$ is a constant. As a result, condition (4) of Definition 3 is satisfied.

∎

Corollary 2.

Suppose that we are given a graph $G$ whose matrix embedding is defined as in Definition 2, with $d$ a constant, and that this matrix embedding has full rank. Using a CountSketch as the sketching matrix, we can perform the pre-processing phase in

[TABLE]

time. Then, after a node insertion, node deletion, edge insertion or edge deletion, we can update the $1\pm\epsilon$ approximation to the solution of graph regression in $O\left(\frac{1}{\epsilon^{2}}\log^{6}(1/\epsilon)\right)$ time.

Proof.

In Theorems 5, 6 and 7, we conditioned on the existence of a CUE matrix embedding and showed that it takes $O(qd)$ time to update the approximate solution. Then, in Theorem 8, we showed that such a matrix embedding exists. As a result, by replacing $q$ with its value defined in Equation 11 and discarding all constants (including $d$), we obtain the time complexities stated in the corollary. ∎

7 Discussion

When $d\ll n$, both of our randomized algorithms outperform the exact algorithm of [8] in terms of pre-processing and update times. It is also worth comparing the two randomized algorithms against each other.

•

Suppose that our randomized algorithms use the $d$-nearest neighborhood matrix embedding, and that we discard the terms $\ln\ln n$ and $\log^{6}(1/\epsilon)$ from the update time complexities (as they grow more slowly than the accompanying terms $\ln n$ and $\epsilon^{-2}$). Under these assumptions, the update time complexities of the first and second algorithms become $O\left(\frac{\ln n}{\epsilon}\right)$ and $O\left(\epsilon^{-2}\right)$, respectively. Hence, if $\ln n\geq\epsilon^{-1}$, the second algorithm has a smaller update time; otherwise, the first algorithm outperforms the second in terms of update time.

Note that in the general form, without relying on any specific matrix embedding, our first algorithm updates the $1\pm\epsilon$ approximation in time sublinear in $n$ (Theorem 3 of Section 5.1). However, when we use CountSketch, the update time becomes independent of $n$ (Theorems 5, 6 and 7 of Section 6.1). In particular, if we consider $d$ and $\epsilon$ as constants, the update time of our first algorithm is sublinear in $n$ (it still depends on $n$), whereas our second algorithm updates the $1\pm\epsilon$ approximation in constant time. As a result, in addition to the useful sparsity property of CountSketch [17], another interesting property of it discussed in this paper is its constant update time for all the update operations: node insertion, node deletion, edge insertion and edge deletion.

•

Similar to the case of update times, we may simplify the pre-processing times by assuming that the algorithms use the $d$-nearest neighborhood matrix embedding and by discarding the terms $\log\log n$ and $\log^{7}(1/\epsilon)$ from Equations 13 and 15. Then, the pre-processing time complexities of the first and second algorithms become $O\left(n+\frac{\ln n}{\epsilon}\right)$ and $O\left(n+\epsilon^{-2}\right)$, respectively. Therefore, if $\epsilon^{-1}>\sqrt{n}$, the first algorithm has a smaller pre-processing time than the second algorithm.
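The comparisons in the two items above can be made concrete with a small worked example. It evaluates only the simplified expressions with all hidden constants set to $1$, which is an assumption. The numbers therefore indicate which term dominates, not actual running times. The function names are hypothetical.

```python
import math

def update_times(n, eps):
    """Simplified update costs (first, second) = (ln n / eps, eps^-2), constants set to 1."""
    return math.log(n) / eps, eps ** -2

def preprocessing_times(n, eps):
    """Simplified pre-processing costs (first, second) = (n + ln n / eps, n + eps^-2)."""
    return n + math.log(n) / eps, n + eps ** -2

n = 10 ** 6                                   # ln n is about 13.8
for eps in (0.1, 0.01):
    first, second = update_times(n, eps)
    winner = "second" if second < first else "first"
    print(f"eps={eps}: update first={first:.0f}, second={second:.0f} -> {winner}")

first, second = preprocessing_times(n, 1e-4)  # 1/eps = 10^4 > sqrt(n) = 10^3
print(f"pre-processing: first={first:.0f}, second={second:.0f}")
```

With $n=10^{6}$, the second algorithm wins on update time for $\epsilon=0.1$ because $\ln n\approx 13.8\geq 10$. The first wins for $\epsilon=0.01$. For $\epsilon=10^{-4}$ we have $\epsilon^{-1}>\sqrt{n}$, and the first algorithm has the smaller pre-processing cost.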

8 Conclusion

In this paper, we presented sublinear update time randomized algorithms for dynamic graph regression. For an $n\times d$ efficiently updatable matrix embedding $\mathbold M$ with $d\ll n$, our first algorithm is based on the subsampled randomized Hadamard transform and supports edge insertion and edge deletion. It updates a $1\pm\epsilon$ approximation of the optimal solution in $O(rd)$ time, where $r$ is sublinear in $n$. Our second algorithm is based on CountSketch and supports edge insertion, edge deletion, node insertion and node deletion. It updates a $1\pm\epsilon$ approximation of the optimal solution in $O(qd)$ time, where $q=O\left(\frac{d^{2}}{\epsilon^{2}}\log^{6}(d/\epsilon)\right)$.

Bibliography


[1] Nir Ailon and Edo Liberty. Fast dimension reduction using Rademacher series on dual BCH codes. In Shang-Hua Teng, editor, Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2008, San Francisco, California, USA, January 20-22, 2008, pages 1–9. SIAM, 2008.
[2] Arne Kovac and Andrew D. A. C. Smith. Nonparametric regression on a graph. Journal of Computational and Graphical Statistics, 20(2):432–447, 2011.
[3] Maria-Florina Balcan and Kilian Q. Weinberger, editors. Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, volume 48 of JMLR Workshop and Conference Proceedings. JMLR.org, 2016.
[4] Mikhail Belkin, Partha Niyogi, and Vikas Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. J. Mach. Learn. Res., 7:2399–2434, 2006.
[5] C. Boutsidis and A. Gittens. Improved matrix algorithms via the subsampled randomized Hadamard transform. SIAM Journal on Matrix Analysis and Applications, 34(3):1301–1340, 2013.
[6] Hong Yun Cai, Vincent W. Zheng, and Kevin Chen-Chuan Chang. A comprehensive survey of graph embedding: Problems, techniques, and applications. IEEE Trans. Knowl. Data Eng., 30(9):1616–1637, 2018.
[7] Carl D. Meyer, Jr. Generalized inversion of modified matrices. SIAM Journal on Applied Mathematics, 24(3):315–323, 1973.
[8] Mostafa Haghir Chehreghani. On the theory of dynamic graph regression problem. CoRR, abs/1903.10699, 2019.
