Using Colors and Sketches to Count Subgraphs in a Streaming Graph

Shirin Handjani; Douglas Jungreis; Mark Tiefenbruck

arXiv:2302.12210·cs.DS·February 24, 2023


TL;DR

This paper modifies the [KMSS] algorithm for estimating subgraph counts in a streaming graph in three ways. The modifications reduce storage and per-edge update time, with the largest gains when the pattern graph has no leaves and the streaming graph has bounded maximum degree.

Contribution

The authors introduce three modifications to the subgraph-counting algorithm of Kane, Mehlhorn, Sauerwald, and Sun. Under certain conditions, these significantly reduce storage and update time for counting subgraphs in streaming graphs.

Findings

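1. Update time per edge is reduced to $O(1)$.

2. Storage is reduced by a factor of roughly $C^{2k-t-2}$, where $C=\min(m^{2\alpha},m^{1/3})$ (notation as in the abstract).

3. Applies when $H$ has no leaves and $G$ has maximum degree at most $m^{1/2-\alpha}$ for some $\alpha>0$.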




Shirin Handjani, IDA Center for Communications Research, La Jolla

Douglas Jungreis, IDA Center for Communications Research, La Jolla

Mark Tiefenbruck, IDA Center for Communications Research, La Jolla

Abstract

Suppose we wish to estimate $\#H$ , the number of copies of some small graph $H$ in a large streaming graph $G$ . There are many algorithms for this task when $H$ is a triangle, but just a few that apply to arbitrary $H$ . Here we focus on one such algorithm, which was introduced by Kane, Mehlhorn, Sauerwald, and Sun. The storage and update time per edge for their algorithm are both $O(m^{k}/(\#H)^{2})$ , where $m$ is the number of edges in $G$ , and $k$ is the number of edges in $H$ . Here, we propose three modifications to their algorithm that can dramatically reduce both the storage and update time. Suppose that $H$ has no leaves and that $G$ has maximum degree $\leq m^{1/2-\alpha}$ , where $\alpha>0$ . Define $C=\min(m^{2\alpha},m^{1/3})$ . Then in our version of the algorithm, the update time per edge is $O(1)$ , and the storage is approximately reduced by a factor of $C^{2k-t-2}$ , where $t$ is the number of vertices in $H$ ; in particular, the storage is $O(C^{2}+m^{k}/(C^{2k-t-2}(\#H)^{2}))$ .

1 Introduction

Suppose that a large simple graph $G$ is presented as a stream of edge insertions and deletions, and suppose that $H$ is a very small graph (e.g., a small clique or cycle). Our goal is to estimate $\#H$ , the number of copies of $H$ that appear in $G$ , where we are permitted a single pass through the stream. This problem has received a great deal of attention, particularly in the case where $H$ is a triangle; however, there are only a few known techniques that apply to arbitrary $H$ . Here we focus on the technique that was developed in [22, 27], which we refer to as the [KMSS]-algorithm.

The [KMSS]-algorithm, which uses complex-valued linear sketches, has many strengths: it applies to arbitrary $H$ ; it can be used in distributed settings; it allows edge deletions; and it is extremely efficient in a variety of situations, such as when $H$ is a star graph. However, there are many situations where the algorithm is not practical. Suppose $G$ has $m$ edges, and suppose $H$ has $k$ edges and $t$ vertices. When the [KMSS]-algorithm produces a single estimate of $\#H$ , that estimate has variance $\Theta(m^{k})$ , so it is necessary to produce $O(m^{k}/(\#H)^{2})$ estimates and average them. The storage and update time per edge are proportional to the number of estimates produced, and are therefore both $O(m^{k}/(\#H)^{2})$ .

In this paper, we describe three modifications to the [KMSS]-algorithm that greatly reduce both the storage and update time per edge. Suppose that $H$ is a connected graph with no leaves. Suppose also that the maximum degree of any vertex in $G$ is $\Delta\leq m^{1/2-\alpha}$ , where $\alpha>0$ , and define $C=\min(m^{1/3},m^{2\alpha})$ . Then the storage required by our algorithm is $O(C^{2}+m^{k}/(C^{2k-t-2}(\#H)^{2}))$ , i.e., it has been reduced approximately by a factor of $C^{2k-t-2}$ . The update time per edge is $O(1)$ .
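The following rough numerical illustration evaluates the two storage expressions, with every constant hidden by the $O(\cdot)$ notation set to 1. The function names and the example values of $m$, $\#H$, and $\alpha$ are hypothetical. They were chosen only to show how the exponent $2k-t-2$ drives the savings.

```python
def kmss_storage(m, k, num_H):
    """Order of the storage of the original [KMSS]-algorithm: m^k / (#H)^2."""
    return m ** k / num_H ** 2


def colored_storage(m, k, t, num_H, alpha):
    """Order of the storage of the modified algorithm:
    C^2 + m^k / (C^(2k-t-2) (#H)^2) with C = min(m^(1/3), m^(2 alpha)).
    Constants hidden by the O(.) notation are taken to be 1."""
    C = min(m ** (1 / 3), m ** (2 * alpha))
    return C ** 2 + m ** k / (C ** (2 * k - t - 2) * num_H ** 2)


# Hypothetical example: H = K4 (k = 6 edges, t = 4 vertices), m = 10^6, #H = 10^7,
# alpha = 1/4, so C = min(100, 1000) = 100.
before = kmss_storage(10**6, 6, 10**7)
after = colored_storage(10**6, 6, 4, 10**7, 0.25)
print(f"{before:.2e} -> {after:.2e}")  # roughly 1e22 -> 1e10
```

For $H=K_4$ the exponent $2k-t-2$ is 6, so the second term shrinks by about $C^{6}$. For a triangle the exponent is only 1.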

The problem of counting copies of a small graph $H$ in a large graph $G$ has been studied extensively. It has many applications, as diverse as community detection, information retrieval, and motifs in bioinformatics; see for instance [5, 13, 15, 26, 32]. Here we restrict to the case where $G$ is given as a data stream, and our goal is merely to estimate $\#H$, as opposed to computing $\#H$ exactly. Most work on this problem has addressed the case where $H$ is a triangle [4, 7, 9, 10, 11, 14, 16, 17, 18, 19, 20, 23, 24, 25, 28, 29, 30]. A few authors have addressed other specific subgraphs, such as butterflies [31] and cycles [27]. We are aware of only a few algorithms that apply to arbitrary subgraphs [6, 8, 21, 22]. Two of these, [8] and [6], require multiple passes through the stream, which we do not allow here. The third, [22], presents the [KMSS]-algorithm, which is the focus of this paper. The last, [21], presents a vertex-sampling algorithm which, in some situations, is extremely efficient, requiring storage $O(m/(\#H)^{1/\tau})$, where $\tau$ is the fractional vertex cover number of $H$. However, this bound requires a strong assumption on $G$. Either $G$ must have bounded degree, or the maximum degree in $G$ must be $(\#H)^{1/(2\tau)}$ and some optimal fractional vertex cover of $H$ must place non-zero weight on every vertex.

In order to explain our contribution to this problem, we first need to briefly review the [KMSS]-algorithm. Consider a fixed $H$ . Many independent estimates are made for $\#H$ , and they are then averaged. To get a single estimate, the first step is to arbitrarily assign directions to the edges of $H$ . We refer to the resulting digraph as $\vec{H}$ and its edges as $\overrightarrow{a_{1}a_{2}},\overrightarrow{a_{3}a_{4}},\dots,\overrightarrow{a_{2k-1}a_{2k}}$ . Also each edge $vw$ of $G$ is replaced by two directed edges $\overrightarrow{vw}$ and $\overrightarrow{wv}$ . We refer to the resulting directed version of $G$ as $\vec{G}$ . For any graph or digraph $X$ , we refer to its vertices and edges as ${\cal{V}}(X)$ and ${\cal{E}}(X)$ . Now we define $k$ functions ${\cal{M}}_{i}\colon{\cal{E}}(\vec{G})\rightarrow{\bf C}$ , one for each edge $\overrightarrow{a_{2i-1}a_{2i}}\in{\cal{E}}(\vec{H})$ ; each ${\cal{M}}_{i}$ maps edges of ${\cal{E}}(\vec{G})$ to complex roots of unity. These functions are defined in such a way that they can “recognize” whether a $k$ -tuple of edges $\vec{T}=(\overrightarrow{v_{1}v_{2}},\dots,\overrightarrow{v_{2k-1}v_{2k}})$ in $\vec{G}$ forms a copy of $\vec{H}$ with each $\overrightarrow{a_{2i-1}a_{2i}}$ mapping to $\overrightarrow{v_{2i-1}v_{2i}}$ . In particular, if $\vec{T}$ does form such a copy, then the expected value (over all permissible choices of the maps ${\cal{M}}_{i}$ ) of $\prod_{i=1}^{k}{\cal{M}}_{i}(\overrightarrow{v_{2i-1}v_{2i}})$ is a non-zero constant; otherwise, the expected value is zero. Then, as the edges stream by, the $k$ values ${\cal{Z}}_{i}=\sum_{\overrightarrow{vw}\in{\cal{E}}(\vec{G})}{\cal{M}}_{i}(\overrightarrow{vw})$ are computed. Finally, when the stream ends, the estimate of $\#H$ is given by $\prod_{i=1}^{k}{\cal{Z}}_{i}$ multiplied by an appropriate constant.

The key to the algorithm is how to define the functions ${\cal{M}}_{i}$ so that they can recognize when $\vec{T}$ forms a copy of $\vec{H}$ . Each of these functions ${\cal{M}}_{i}$ has two parts: one part is meant to recognize when $\vec{T}$ forms a homomorphic image of $\vec{H}$ , and the other part is meant to recognize when the $t$ vertices of this homomorphic image are distinct. In this paper, we do not use the second part; we use a different method to ensure that the $t$ vertices are distinct. We therefore omit the second part from our description, keeping in mind that this description differs somewhat from the one in [22]. For each vertex $b\in H$ , we define a hash function ${\cal{X}}_{b}\colon{\cal{V}}(G)\rightarrow{\bf C}$ , which maps vertices of $G$ to complex $\deg(b)^{\rm th}$ roots of unity, where $\deg(b)$ is the degree of $b$ in $H$ . Then ${\cal{M}}_{i}(\overrightarrow{vw})$ is defined to be ${\cal{X}}_{a_{2i-1}}(v){\cal{X}}_{a_{2i}}(w)$ . It is not difficult to see that $\prod_{i=1}^{k}{\cal{M}}_{i}(\overrightarrow{v_{2i-1}v_{2i}})$ has expected value 1 if $\vec{T}$ forms a homomorphic image of $\vec{H}$ with each $\overrightarrow{a_{2i-1}a_{2i}}$ mapping to $\overrightarrow{v_{2i-1}v_{2i}}$ ; otherwise, it has expected value 0.
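The sketch below makes the "recognition" property concrete for the simplified construction just described. It computes the exact expectation of $\prod_{i}{\cal{M}}_{i}(\overrightarrow{v_{2i-1}v_{2i}})$ by averaging over every assignment of roots of unity to the values ${\cal{X}}_b(v)$ that the product actually touches. Full independence stands in for the hash families of [22]. The helper name `expected_test_value` is illustrative only.

```python
import cmath
import itertools


def expected_test_value(a, deg, T):
    """Exact E[prod_i M_i(v_{2i-1} v_{2i})], where M_i(vw) = X_{a_{2i-1}}(v) X_{a_{2i}}(w) and
    each X_b(v) is an independent, uniformly random deg(b)-th root of unity.
    a:   [a_1, ..., a_2k]; edge i of H is a_{2i-1} -> a_{2i}
    deg: degree of each vertex of H
    T:   k directed edges (v, w) of G; edge i of H is sent to T[i-1]
    """
    v = [end for edge in T for end in edge]      # v_1, ..., v_2k
    # The product equals prod_j X_{a_j}(v_j), so only the pairs (a_j, v_j) matter.
    pairs = sorted(set(zip(a, v)))
    total, count = 0, 0
    for exps in itertools.product(*(range(deg[b]) for b, _ in pairs)):
        X = {p: cmath.exp(2j * cmath.pi * s / deg[p[0]]) for p, s in zip(pairs, exps)}
        prod = 1
        for b, x in zip(a, v):
            prod *= X[(b, x)]
        total += prod
        count += 1
    return total / count


# Triangle H: edges 1->2, 2->3, 3->1; every vertex has degree 2, so each X_b(v) is +1 or -1.
tri, tri_deg = [1, 2, 2, 3, 3, 1], {1: 2, 2: 2, 3: 2}
print(round(abs(expected_test_value(tri, tri_deg, [("x", "y"), ("y", "z"), ("z", "x")])), 9))  # 1.0
print(round(abs(expected_test_value(tri, tri_deg, [("x", "y"), ("y", "z"), ("z", "u")])), 9))  # 0.0

# A 4-cycle folded onto the single edge xy is a non-injective homomorphism: still 1.
c4, c4_deg = [1, 2, 2, 3, 3, 4, 4, 1], {1: 2, 2: 2, 3: 2, 4: 2}
print(round(abs(expected_test_value(c4, c4_deg, [("x", "y"), ("y", "x")] * 2)), 9))  # 1.0
```

The last case shows why the homomorphism test alone cannot enforce $t$ distinct vertices.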

We can now describe our contributions to this problem. We present three modifications to the [KMSS]-algorithm, which can be used separately or together to reduce the storage and update time per edge. First, we introduce a different method for ensuring that we count only those homomorphic images of $\vec{H}$ that have $t$ distinct vertices. We do this by assigning colors to the vertices of $G$ . Assuming there are $C$ colors, we subdivide each sum ${\cal{Z}}_{i}$ into $C^{2}$ different sums, one for each pair of colors. For instance, there might be a red-blue sum

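$${\cal{Z}}_{i}^{\rm red,blue}=\sum_{\overrightarrow{vw}:\ v\ {\rm red},\ w\ {\rm blue}}{\cal{M}}_{i}(\overrightarrow{vw}).$$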

There might also be analogous blue-green sums and green-red sums, and if we were counting triangles, then

$${\cal{Z}}_{1}^{\rm red,blue}\,{\cal{Z}}_{2}^{\rm blue,green}\,{\cal{Z}}_{3}^{\rm green,red}$$

would give an estimate for the number of triangles whose three vertices were respectively red, blue, and green. This allows us to count only homomorphic images whose vertices all have different colors, which in turn ensures that the vertices are all distinct. However, making sure the vertices are distinct is not the primary reason we use colors. The primary reason is that it dramatically reduces the variance.
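The red-blue-green product can be checked exactly on a toy example. The sketch below makes these choices, all of which are assumptions for illustration:

- It fixes a coloring of $K_4$ with one red, one blue, and two green vertices.
- It uses one $\pm 1$ hash per vertex of the triangle (the per-vertex scheme described above, with $d=1$).
- It averages ${\cal{Z}}_{1}^{\rm red,blue}{\cal{Z}}_{2}^{\rm blue,green}{\cal{Z}}_{3}^{\rm green,red}$ over all $2^{12}$ hash assignments.

The average should equal the number of triangles with one vertex of each color. Here that is 2, namely $\{0,1,2\}$ and $\{0,1,3\}$.

```python
import itertools

# Toy graph G = K4 on vertices 0..3; each undirected edge is used in both directions.
undirected = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
directed = undirected + [(w, v) for v, w in undirected]
color = {0: "red", 1: "blue", 2: "green", 3: "green"}
vertices = sorted(color)


def colored_product(X1, X2, X3):
    """Z_1^{red,blue} Z_2^{blue,green} Z_3^{green,red} for the triangle 1->2->3->1,
    where M_1(vw) = X1(v)X2(w), M_2(vw) = X2(v)X3(w), M_3(vw) = X3(v)X1(w)."""
    def Z(Xa, Xb, c1, c2):
        return sum(Xa[v] * Xb[w] for v, w in directed if color[v] == c1 and color[w] == c2)
    return Z(X1, X2, "red", "blue") * Z(X2, X3, "blue", "green") * Z(X3, X1, "green", "red")


# Exact expectation: average over all 2^12 ways to assign +-1 to X1, X2, X3 on the 4 vertices.
n = len(vertices)
total = count = 0
for signs in itertools.product((1, -1), repeat=3 * n):
    X1 = dict(zip(vertices, signs[:n]))
    X2 = dict(zip(vertices, signs[n:2 * n]))
    X3 = dict(zip(vertices, signs[2 * n:]))
    total += colored_product(X1, X2, X3)
    count += 1
print(total / count)  # 2.0: the triangles {0,1,2} and {0,1,3}
```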

For our second modification, rather than defining one hash function ${\cal{X}}$ for each vertex of $H$ , we define one for each half-edge of $H$ , with the condition that for any vertex $v$ of $G$ and $b$ of $H$ , the product $\prod_{h}{\cal{X}}_{h}(v)=1$ , where the product is taken over all half-edges $h$ in $H$ that are incident to $b$ . This too reduces the variance of each estimate.

For the third modification, rather than using hash functions ${\cal{X}}$ that map vertices to roots of unity, we use hash functions that map vertices to diagonal $d$ -by- $d$ matrices. Each position along the diagonal of the matrix more-or-less gives a separate estimate of $\#H$ , so in some sense, this is almost equivalent to making $d$ independent estimates. The difference is that, when an edge streams by, instead of updating each ${\cal{Z}}_{i}$ for $d$ different estimates, we only have to update each ${\cal{Z}}_{i}$ for one matrix of estimates. This lets us reduce the update time per edge approximately by a factor of $d$ .

This paper is organized as follows. In Section 2, we describe our modified version of the [KMSS]-algorithm and prove that it gives an unbiased estimate of $\#H$ . In Section 3, we bound the variance of our estimate. In Section 4, we compare the storage and update time of our algorithm to that of the original algorithm.

The authors would like to thank Kyle Hofmann, Anthony Gamst, and Eric Price for many helpful conversations.

2 Description of Algorithm

In this section, we describe our algorithm and show that it gives an unbiased estimate of $\#H$ . We only explain how to use the algorithm to produce a single estimate of $\#H$ , but in order to get a more accurate estimate of $\#H$ , we would compute many such estimates and take their average.

Fix some small graph $H$ . We assume throughout the paper that $H$ is connected and has no leaves. Let $t$ and $k$ respectively denote the number of vertices and edges in $H$ . Arbitrarily assign directions to the edges of $H$ , and call the resulting directed graph $\vec{H}$ . We assume that the $t$ vertices of $H$ are labeled $1,\dots,t$ , and the $k$ edges are $\overrightarrow{a_{1}a_{2}},\dots,\overrightarrow{a_{2k-1}a_{2k}}$ , where each $a_{i}\in\{1,\dots,t\}$ . $\vec{H}$ has $2k$ half-edges, which we call $h_{1},\dots,h_{2k}$ , where $h_{2i-1}$ and $h_{2i}$ are respectively the two halves of $\overrightarrow{a_{2i-1}a_{2i}}$ . In particular, each $h_{j}$ is incident to $a_{j}$ . For $b\in{\cal{V}}(\vec{H})$ define $\Gamma(b)=\{i:a_{i}=b\}$ . In other words, $\Gamma(b)$ tells which half-edges are incident to $b$ . Figure 1 illustrates an example where $t=4$ and $k=5$ .

For each vertex $b\in{\cal{V}}(\vec{H})$ , select an arbitrary element $i\in\Gamma(b)$ , and call $h_{i}$ the distinguished half-edge at $b$ . Observe that there are $2k$ half-edges in $\vec{H}$ , of which $t$ are distinguished and $2k-t$ are not.
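The half-edge bookkeeping can be sketched as follows. Figure 1 is not reproduced here, so the example assumes $H$ is the diamond ($K_4$ minus an edge). The diamond also has $t=4$ and $k=5$, and its orientation here is chosen arbitrarily. The rule "distinguish the largest index in $\Gamma(b)$" is just one admissible choice.

```python
def half_edge_structure(edges):
    """edges: directed edges of H as (tail, head) pairs, listed as a_1a_2, a_3a_4, ...
    Returns a = [a_1, ..., a_2k], Gamma(b) for every vertex b, and a distinguished
    half-edge per vertex. Half-edges use 1-based indices, as in the text."""
    a = [end for edge in edges for end in edge]      # h_j is incident to a_j
    gamma = {}
    for j, b in enumerate(a, start=1):
        gamma.setdefault(b, []).append(j)            # Gamma(b) = {j : a_j = b}
    # Any element of Gamma(b) may be distinguished; this sketch picks the largest.
    distinguished = {b: max(js) for b, js in gamma.items()}
    return a, gamma, distinguished


# Hypothetical orientation of the diamond: triangle 1,2,3 plus vertex 4 joined to 1 and 3.
a, gamma, dist = half_edge_structure([(1, 2), (2, 3), (3, 1), (1, 4), (4, 3)])
print(gamma)                          # {1: [1, 6, 7], 2: [2, 3], 3: [4, 5, 10], 4: [8, 9]}
print(len(dist), len(a) - len(dist))  # t = 4 distinguished, 2k - t = 6 not
```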

2.1 The Functions ${\cal{X}}_{i}$

The [KMSS]-algorithm uses hash functions ${\cal{X}}$ that map vertices of $G$ to complex roots of unity. Here we define similar functions, but there are two differences. First, instead of having one function ${\cal{X}}$ for each vertex of $H$ , we have one for each half-edge of $H$ . Second, we allow the more general setting where the co-domain of each ${\cal{X}}$ is a group of diagonal matrices.

Let ${\cal{G}}$ be any finite group of diagonal matrices with the property that the average of the elements of ${\cal{G}}$ (i.e., $\sum_{g\in{\cal{G}}}g/|{\cal{G}}|$ ) is the zero matrix. Note that since ${\cal{G}}$ consists of diagonal matrices, it is abelian. We use $d$ to denote the dimension of the matrices in ${\cal{G}}$ . We are primarily interested in two types of groups ${\cal{G}}$ . In the first type, $d=1$ , and the elements of ${\cal{G}}$ are the complex $r^{\rm th}$ roots of unity, for some $r\geq 2$ . In that case, the matrices can be viewed as complex numbers and are therefore equivalent to what’s used in the [KMSS]-algorithm. For the second type of ${\cal{G}}$ , $d\geq 2$ . Let $\omega=e^{2\pi i/d}$ , and let $M$ be the square diagonal matrix that has $1,\omega,\omega^{2},\dots,\omega^{d-1}$ along the diagonal. Then ${\cal{G}}$ is the group generated by $M$ and $-I$ (where $I$ is the $d$ -dimensional identity matrix); thus ${\cal{G}}$ has $2d$ elements: $\pm I,\pm M,\pm M^{2},\dots,\pm M^{d-1}$ . In this paper, we focus on those two types of ${\cal{G}}$ , but we remark that there are other ${\cal{G}}$ that satisfy the given conditions; e.g., diagonal matrices whose diagonal entries are all $\pm 1$ . The entire discussion in this section applies to any such ${\cal{G}}$ ; in particular, our algorithm gives an unbiased estimate of $\#H$ for any such ${\cal{G}}$ . However, the discussion of the variance in the next section applies only to these two specific choices of ${\cal{G}}$ .
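The second type of ${\cal{G}}$ is easy to build explicitly. This sketch stores each diagonal matrix as the tuple of its diagonal entries. It checks the properties used later: the group has $2d$ elements, and their average is the zero matrix (the $-I$ generator alone forces this). It also checks that ${\rm tr}(M^{j})$ vanishes unless $j=0$. The function names are illustrative.

```python
import cmath


def signed_power_group(d):
    """The 2d elements +-M^j (j = 0, ..., d-1) of the group generated by -I and
    M = diag(1, w, ..., w^(d-1)), w = exp(2 pi i / d); each matrix is stored as
    the tuple of its diagonal entries."""
    w = cmath.exp(2j * cmath.pi / d)
    group = []
    for j in range(d):
        Mj = tuple(w ** (j * s) for s in range(d))   # diagonal of M^j
        group.append(Mj)
        group.append(tuple(-x for x in Mj))          # diagonal of -M^j
    return group


def mean_matrix(group):
    """Entrywise average of a list of diagonal matrices."""
    return tuple(sum(entries) / len(group) for entries in zip(*group))


def trace(g):
    return sum(g)


G4 = signed_power_group(4)
print(len(G4))                                               # 8, i.e. 2d
print(max(abs(x) for x in mean_matrix(G4)) < 1e-12)          # True: the average is 0
print([round(abs(trace(G4[2 * j])), 9) for j in range(4)])   # [4.0, 0.0, 0.0, 0.0]
```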

Fix any such group ${\cal{G}}$ , and for each $1\leq i\leq 2k$ , define a hash function ${\cal{X}}_{i}\colon{\cal{V}}(G)\rightarrow{\cal{G}}$ . If $h_{i}$ is a non-distinguished half-edge of $\vec{H}$ , then for each $v\in{\cal{V}}(G)$ , the value ${\cal{X}}_{i}(v)$ is a random element of ${\cal{G}}$ , and the functions ${\cal{X}}_{i}$ for non-distinguished $h_{i}$ are chosen independently and uniformly from a family of $4k$ -wise independent hash functions. If $h_{i}$ is the distinguished half-edge at $b$ , then ${\cal{X}}_{i}(v)$ is defined by

$${\cal{X}}_{i}(v)=\prod_{j\in\Gamma(b),\,j\neq i}{\cal{X}}_{j}(v)^{-1}.$$

If $i$ is the only element of $\Gamma(b)$ , then ${\cal{X}}_{i}(v)=I$ . Observe that this definition of ${\cal{X}}_{i}$ ensures that for any vertex $b$ of $H$ and any $v\in{\cal{V}}(G)$ ,

$$\prod_{j\in\Gamma(b)}{\cal{X}}_{j}(v)=I.$$

Lemma 1.

Let $b\in\{1,\dots,t\}$ be any vertex of $\vec{H}$ , and suppose its degree is $\delta$ . Suppose $\Gamma(b)=\{i_{1},\dots,i_{\delta}\}$ ; i.e., $h_{i_{1}},\dots,h_{i_{\delta}}$ are the half-edges of $\vec{H}$ incident to $b$ . Let $v_{1},\dots,v_{\delta}$ be any $\delta$ not-necessarily-distinct vertices of $G$ . Then ${\cal{X}}_{i_{1}}(v_{1})\cdots{\cal{X}}_{i_{\delta}}(v_{\delta})$ is equal to $I$ if $v_{1}=\dots=v_{\delta}$ , and otherwise it is a uniformly random element of ${\cal{G}}$ .

**Proof:** If $\delta=1$, then the result is clearly true, so assume $\delta>1$. Assume without loss of generality that the distinguished half-edge at $b$ is $h_{i_{\delta}}$. Then by definition,

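$${\cal{X}}_{i_{\delta}}(v_{\delta})={\cal{X}}_{i_{1}}(v_{\delta})^{-1}\cdots{\cal{X}}_{i_{\delta-1}}(v_{\delta})^{-1},$$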

so

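$$\prod_{j=1}^{\delta}{\cal{X}}_{i_{j}}(v_{j})=\prod_{j=1}^{\delta-1}{\cal{X}}_{i_{j}}(v_{j})\,{\cal{X}}_{i_{j}}(v_{\delta})^{-1}.\tag{1}$$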

If $v_{1}=\dots=v_{\delta}$ , then (1) is equal to $I$ . Now assume that some $v_{j}\neq v_{\delta}$ . Then for that $j$ , ${\cal{X}}_{i_{j}}(v_{j}){\cal{X}}_{i_{j}}(v_{\delta})^{-1}$ is the quotient of two independent uniformly random elements of ${\cal{G}}$ , and is thus a uniformly random element of ${\cal{G}}$ . Also, none of $h_{i_{1}},\dots,h_{i_{{\delta}-1}}$ are distinguished, so

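$$\left({\cal{X}}_{i_{1}}(v_{1}){\cal{X}}_{i_{1}}(v_{\delta})^{-1}\right),\dots,\left({\cal{X}}_{i_{\delta-1}}(v_{\delta-1}){\cal{X}}_{i_{\delta-1}}(v_{\delta})^{-1}\right)$$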

are independent for all $v_{j}\neq v_{\delta}$ , and the rest are $I$ . Since at least one is uniformly random, their product is as well.
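Lemma 1 can be checked exhaustively in the smallest interesting case. The sketch makes the following choices:

- It takes a vertex $b$ of degree $\delta=3$ whose distinguished half-edge is $h_{i_3}$.
- It lets ${\cal{G}}$ be the $r^{\rm th}$ roots of unity, written additively as exponents mod $r$, so products become sums and inverses become negations.
- It enumerates every value of the two free hash functions on the vertices involved.

Treating those values as fully independent is an assumption of the sketch; the paper only requires $4k$-wise independence.

```python
import itertools
from collections import Counter


def lemma1_distribution(r, vs):
    """Exact distribution of X_{i1}(v1) X_{i2}(v2) X_{i3}(v3) for a vertex b of H with
    Gamma(b) = {i1, i2, i3} and distinguished half-edge i3. Group elements are r-th roots
    of unity written as exponents mod r, so products are sums and inverses are negations.
    vs = (v1, v2, v3): not necessarily distinct vertices of G."""
    v1, v2, v3 = vs
    # The free hashes X_{i1}, X_{i2}, evaluated at every vertex that occurs (keys (1, v), (2, v)).
    keys = sorted({(1, v) for v in vs} | {(2, v) for v in vs})
    dist = Counter()
    for values in itertools.product(range(r), repeat=len(keys)):
        X = dict(zip(keys, values))
        # Distinguished half-edge: X_{i3}(v) = X_{i1}(v)^{-1} X_{i2}(v)^{-1}.
        x3 = -(X[(1, v3)] + X[(2, v3)]) % r
        dist[(X[(1, v1)] + X[(2, v2)] + x3) % r] += 1
    total = sum(dist.values())
    return {s: dist[s] / total for s in range(r)}


print(lemma1_distribution(3, ("x", "x", "x")))  # {0: 1.0, 1: 0.0, 2: 0.0}
print(lemma1_distribution(3, ("x", "y", "y")))  # each exponent has probability 1/3
```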

2.2 The Functions ${\cal{M}}_{i}$

Let $\vec{G}$ be the directed graph obtained by replacing each edge $vw$ of $G$ by two directed edges, $\overrightarrow{vw}$ and $\overrightarrow{wv}$. Each time an edge of $G$ streams by, treat it as two directed edges of $\vec{G}$. From now on, we use $m$ to refer to the number of edges in $\vec{G}$. Strictly speaking, $\vec{G}$ has twice as many edges as $G$. However, $m$ will be more convenient, and the factor of 2 is irrelevant to all of our conclusions, which use $O(\cdot)$ notation.

For each edge $\overrightarrow{a_{2i-1}a_{2i}}$ of $\vec{H}$ , define a function ${\cal{M}}_{i}\colon{\cal{E}}(\vec{G})\rightarrow{\cal{G}}$ by

$${\cal{M}}_{i}(\overrightarrow{vw})={\cal{X}}_{2i-1}(v)\,{\cal{X}}_{2i}(w).$$

For any $k$ -tuple $\vec{T}=(\overrightarrow{v_{1}v_{2}},\dots,\overrightarrow{v_{2k-1}v_{2k}})$ of (not necessarily distinct) edges in ${\cal{E}}(\vec{G})$ , define

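$${\cal{Q}}(\vec{T})=\prod_{i=1}^{k}{\cal{M}}_{i}(\overrightarrow{v_{2i-1}v_{2i}})=\prod_{j=1}^{2k}{\cal{X}}_{j}(v_{j}),$$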

and for each vertex $b\in{\cal{V}}(\vec{H})$ , define

$${\cal{P}}_{b}(\vec{T})=\prod_{j\in\Gamma(b)}{\cal{X}}_{j}(v_{j}).$$

Since every half-edge of $\vec{H}$ is in exactly one of the sets $\Gamma(b),$ we have

$${\cal{Q}}(\vec{T})=\prod_{b\in{\cal{V}}(\vec{H})}{\cal{P}}_{b}(\vec{T}).$$

The function ${\cal{Q}}$ will in a sense “test” whether $\vec{T}$ forms a copy of $\vec{H}$ .

Lemma 2.

Let $\vec{T}=(\overrightarrow{v_{1}v_{2}},\dots,\overrightarrow{v_{2k-1}v_{2k}})$ be any $k$ -tuple of edges of $\vec{G}$ . Suppose $f\colon{\cal{E}}(\vec{H})\rightarrow{\cal{E}}(\vec{G})$ sends $\overrightarrow{a_{2i-1}a_{2i}}$ to $\overrightarrow{v_{2i-1}v_{2i}}$ for each $i$ . If $f$ induces a homomorphism from $\vec{H}$ to $\vec{G}$ , then ${\cal{Q}}(\vec{T})=I$ . If $f$ does not induce such a homomorphism, then ${\cal{Q}}(\vec{T})$ is a uniformly random element of ${\cal{G}}$ .

**Proof:** Suppose $f$ induces a homomorphism from $\vec{H}$ to $\vec{G}$. Let $b\in\{1,\dots,t\}$ be any vertex of $H$, and suppose the homomorphism sends $b$ to $w$. Suppose $\Gamma(b)=\{j_{1},\dots,j_{\delta}\}$; i.e., $h_{j_{1}},\dots,h_{j_{\delta}}$ are the half-edges of $\vec{H}$ that are incident to $b$. Then $v_{j_{1}},\dots,v_{j_{\delta}}$ must all be equal to $w$. By Lemma 1, ${\cal{X}}_{j_{1}}(v_{j_{1}})\cdots{\cal{X}}_{j_{\delta}}(v_{j_{\delta}})=I$. Equivalently, ${\cal{P}}_{b}(\vec{T})=I$. This is true for every $b\in{\cal{V}}(\vec{H})$, so

$${\cal{Q}}(\vec{T})=\prod_{b\in{\cal{V}}(\vec{H})}{\cal{P}}_{b}(\vec{T})=I.$$

Now suppose $f$ does not induce such a homomorphism. Then there must be some vertex $b$ of $H$ such that, if $\Gamma(b)=\{j_{1},\dots,j_{\delta}\}$, the vertices $v_{j_{1}},\dots,v_{j_{\delta}}$ are not all equal. Thus by Lemma 1, ${\cal{X}}_{j_{1}}(v_{j_{1}})\cdots{\cal{X}}_{j_{\delta}}(v_{j_{\delta}})$ is a uniformly random element of ${\cal{G}}$; i.e., ${\cal{P}}_{b}(\vec{T})$ is a uniformly random element. ${\cal{P}}_{b}(\vec{T})$ is independent of ${\cal{P}}_{c}(\vec{T})$ for any other $c\in{\cal{V}}(\vec{H})$, since $\Gamma(b)$ and $\Gamma(c)$ are disjoint and the two products therefore use different hash functions. So $\prod_{c\in{\cal{V}}(\vec{H})}{\cal{P}}_{c}(\vec{T})$ is also a uniformly random element; i.e., ${\cal{Q}}(\vec{T})$ is a uniformly random element.
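The following sketch builds the half-edge hashes ${\cal{X}}_j$ and evaluates ${\cal{Q}}(\vec T)$. As before, ${\cal{G}}$ is the $r^{\rm th}$ roots of unity, written as exponents mod $r$. Python's `random` module stands in for a $4k$-wise independent family, and `make_hashes` and `Q` are hypothetical helper names. For the triangle, a homomorphic image always gives the identity (exponent 0), as Lemma 2 predicts. The non-homomorphic tuple varies from draw to draw.

```python
import random


def make_hashes(a, r, vertices, rng):
    """Half-edge hashes X_j: V(G) -> Z_r (exponents of r-th roots of unity), j = 1..2k.
    Non-distinguished X_j are fully random (standing in for a 4k-wise independent family);
    the distinguished half-edge at b is taken to be the last element of Gamma(b)."""
    gamma = {}
    for j, b in enumerate(a, start=1):
        gamma.setdefault(b, []).append(j)
    X = {}
    for js in gamma.values():
        *free, dist = js
        for j in free:
            X[j] = {v: rng.randrange(r) for v in vertices}
        # X_dist(v) = prod_{j in Gamma(b), j != dist} X_j(v)^{-1}: a negated sum of exponents.
        X[dist] = {v: -sum(X[j][v] for j in free) % r for v in vertices}
    return X


def Q(T, a, X, r):
    """Q(T) = prod_{j=1}^{2k} X_j(v_j) as an exponent mod r; 0 stands for the identity I."""
    v = [end for edge in T for end in edge]
    return sum(X[j][v[j - 1]] for j in range(1, len(a) + 1)) % r


a = [1, 2, 2, 3, 3, 1]                          # triangle 1->2->3->1
V = ["x", "y", "z", "u"]
T_hom = [("x", "y"), ("y", "z"), ("z", "x")]    # a homomorphic image
T_bad = [("x", "y"), ("y", "z"), ("z", "u")]    # not a homomorphic image
draws = [make_hashes(a, 5, V, random.Random(seed)) for seed in range(20)]
print({Q(T_hom, a, X, 5) for X in draws})           # {0}
print(sorted({Q(T_bad, a, X, 5) for X in draws}))   # varies with the draw
```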

2.3 Coloring Vertices

Fix some number of colors $C\geq t$ . For the purposes of bounding the variance, we will later assume that the maximum degree of any vertex of $G$ is $\leq m^{1/2-\alpha}$ and then set $C=\min(m^{1/3},m^{2\alpha})$ ; however, here $C$ may take any value $\geq t$ . Define a hash function ${\cal{C}}\colon{\cal{V}}(G)\rightarrow\{1,\dots,C\}$ that assigns a color to each vertex of $G$ . For each vertex $v$ , ${\cal{C}}(v)$ is a uniformly random color, and ${\cal{C}}$ is chosen uniformly at random from a family of $4k$ -wise independent hash functions.

Consider functions $f\colon{\cal{E}}(\vec{H})\rightarrow{\cal{E}}(\vec{G})$ . There are $m^{k}$ such functions, but we want to find only the ones that map $\vec{H}$ isomorphically onto its image. Suppose that $f$ maps the edges $\overrightarrow{a_{1}a_{2}},\dots,\overrightarrow{a_{2k-1}a_{2k}}$ to the edges $\overrightarrow{v_{1}v_{2}},\dots,\overrightarrow{v_{2k-1}v_{2k}}$ respectively. Then for any vertex $b\in{\cal{V}}(\vec{H})$ , all of the vertices $\{a_{i}:i\in\Gamma(b)\}$ are equal to $b$ ; i.e., they’re all the same vertex. Therefore, a necessary condition for $f$ to induce an isomorphism is that all the vertices $\{v_{i}:i\in\Gamma(b)\}$ are the same vertex. In particular, a necessary condition is that all the vertices $\{v_{i}:i\in\Gamma(b)\}$ have the same color. Thus we say that either the map $f$ or the $k$ -tuple of edges $\vec{T}=(\overrightarrow{v_{1}v_{2}},\dots,\overrightarrow{v_{2k-1}v_{2k}})$ is color-compatible if for every $b\in{\cal{V}}(\vec{H})$ , all the vertices $\{v_{i}:i\in\Gamma(b)\}$ have the same color. More specifically, for any ordered $t$ -tuple of colors $(c_{1},\dots,c_{t})$ , we say that $\vec{T}$ is $(c_{1},\dots,c_{t})$ -compatible if for every $b\in{\cal{V}}(\vec{H})$ , all the vertices $\{v_{i}:i\in\Gamma(b)\}$ have color $c_{b}$ , or equivalently, if ${\cal{C}}(v_{i})=c_{a_{i}}$ for every $1\leq i\leq 2k$ . Thus $\vec{T}$ is color-compatible if there exists a $t$ -tuple $(c_{1},\dots,c_{t})$ such that $\vec{T}$ is $(c_{1},\dots,c_{t})$ -compatible. Furthermore, if $\vec{T}$ is $(c_{1},\dots,c_{t})$ -compatible and the $t$ colors $c_{1},\dots,c_{t}$ are distinct, then we will say that $\vec{T}$ is distinctly color-compatible.
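Color-compatibility is a purely local check on the $2k$ endpoints of $\vec T$, as the sketch below illustrates. The function names are illustrative. The inputs are the edge tuple, the list $a_1,\dots,a_{2k}$, and a coloring supplied as a dictionary.

```python
def compatible_colors(T, a, color):
    """Return (c_1, ..., c_t) if the k-tuple T is (c_1, ..., c_t)-compatible, else None.
    T: k directed edges (v, w) of G; a: [a_1, ..., a_2k] with H's vertices labeled 1..t;
    color: vertex of G -> color."""
    v = [end for edge in T for end in edge]
    c = {}
    for b, vj in zip(a, v):
        # All half-edges at b must land on vertices of one common color c_b.
        if c.setdefault(b, color[vj]) != color[vj]:
            return None
    return tuple(c[b] for b in range(1, max(a) + 1))


def distinctly_compatible(T, a, color):
    cs = compatible_colors(T, a, color)
    return cs is not None and len(set(cs)) == len(cs)


a = [1, 2, 2, 3, 3, 1]                     # triangle 1->2->3->1
color = {"x": 0, "y": 1, "z": 2, "u": 0}
print(compatible_colors([("x", "y"), ("y", "z"), ("z", "x")], a, color))  # (0, 1, 2)
print(compatible_colors([("x", "y"), ("y", "z"), ("z", "u")], a, color))  # (0, 1, 2) as well
```

The second tuple is not a homomorphic image, since $x\neq u$, yet it is compatible. Color-compatibility is necessary but not sufficient, and such terms are left to the hash functions, as in Lemma 2.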

As we saw in Lemma 2, ${\cal{Q}}(\vec{T})$ is equal to $I$ if $\vec{T}$ forms a homomorphic image of $\vec{H}$ , and otherwise is a uniformly random element of ${\cal{G}}$ . The strategy in [22] is basically to compute the sum of ${\cal{Q}}(\vec{T})$ over all $\vec{T}$ . The sum then has $m^{k}$ terms and therefore tends to have high variance. Here, rather than summing over all $\vec{T}$ , we will only sum over distinctly color-compatible $\vec{T}$ . The resulting sum will then have far fewer terms and therefore tend to have far lower variance.

For colors $c_{1},c_{2}\in\{1,\dots,C\}$ and $1\leq i\leq k$ , define

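$${\cal{Z}}^{c_{1},c_{2}}_{i}=\sum_{\overrightarrow{vw}\in{\cal{E}}(\vec{G})\,:\,{\cal{C}}(v)=c_{1},\ {\cal{C}}(w)=c_{2}}{\cal{M}}_{i}(\overrightarrow{vw}).$$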

Thus there are $C^{2}k$ such sums, and ${\cal{Z}}^{c_{1},c_{2}}_{i}$ is the sum of ${\cal{M}}_{i}(\overrightarrow{vw})$ over all edges $\overrightarrow{vw}$ for which the color of $v$ is $c_{1}$ and the color of $w$ is $c_{2}$ . Also, define

$${\cal{S}}_{(c_{1},\dots,c_{t})}=\prod_{i=1}^{k}{\cal{Z}}_{i}^{c_{a_{2i-1}},\,c_{a_{2i}}}.\tag{3}$$

We use $E(\,)$ to denote expected value (not to be confused with ${\cal{E}}(\,)$ , which refers to the edge-set). We use ${\rm tr}(\,)$ to denote the trace of a matrix.

Lemma 3.

For $c_{1},\dots,c_{t}$ distinct, $E({\rm tr}({\cal{S}}_{(c_{1},\dots,c_{t})})/d)$ is equal to the number of $(c_{1},\dots,c_{t})$ -compatible maps $f\colon{\cal{E}}(\vec{H})\rightarrow{\cal{E}}(\vec{G})$ that induce injective homomorphisms from $\vec{H}$ to $\vec{G}$ .

**Proof:** From the definitions of ${\cal{S}}_{(c_{1},\dots,c_{t})}$ and ${\cal{Z}}^{c_{1},c_{2}}_{i}$, we have

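$${\cal{S}}_{(c_{1},\dots,c_{t})}=\prod_{i=1}^{k}\ \sum_{\substack{\overrightarrow{v_{2i-1}v_{2i}}\in{\cal{E}}(\vec{G}):\\ {\cal{C}}(v_{2i-1})=c_{a_{2i-1}},\ {\cal{C}}(v_{2i})=c_{a_{2i}}}}{\cal{M}}_{i}(\overrightarrow{v_{2i-1}v_{2i}})=\sum_{\vec{T}=(\overrightarrow{v_{1}v_{2}},\dots,\overrightarrow{v_{2k-1}v_{2k}})\ (c_{1},\dots,c_{t})\text{-compatible}}{\cal{Q}}(\vec{T}).$$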

In that last sum, there is one term for every $(c_{1},\dots,c_{t})$ -compatible map $f\colon{\cal{E}}(\vec{H})\rightarrow{\cal{E}}(\vec{G})$ . Consider any one such term. By Lemma 2, if $f$ does not induce a homomorphism from $\vec{H}$ to $\vec{G}$ , then that term is a uniformly random element of ${\cal{G}}$ , and, by our assumption on ${\cal{G}}$ , its trace therefore has expected value 0. Thus those terms do not contribute to $E({\rm tr}({\cal{S}}_{(c_{1},\dots,c_{t})}))$ . If $f$ does induce such a homomorphism, then by Lemma 2, that term is equal to $I$ , so it contributes $d$ to the trace of ${\cal{S}}_{(c_{1},\dots,c_{t})}$ . Thus $E({\rm tr}({\cal{S}}_{(c_{1},\dots,c_{t})})/d)$ is equal to the number of $(c_{1},\dots,c_{t})$ -compatible maps $f$ that induce homomorphisms from $\vec{H}$ to $\vec{G}$ . Since the colors $c_{1},\dots,c_{t}$ were assumed to be distinct, any such homomorphism sends the vertices of $\vec{H}$ to vertices of $\vec{G}$ with different colors and is therefore injective.

Define

$${\cal{S}}=\sum_{(c_{1},\dots,c_{t})\ \text{distinct}}{\cal{S}}_{(c_{1},\dots,c_{t})}.\tag{4}$$

Theorem 1.

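$$E\left(\frac{C^{t}}{C(C-1)\cdots(C-t+1)}\cdot\frac{{\rm tr}({\cal{S}})}{d\cdot{\rm auto}(H)}\right)=\#H,$$

where ${\rm auto}(H)$ is the number of automorphisms of $H$.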

**Proof:** By Lemma 3, if $c_{1},\dots,c_{t}$ are distinct colors, then ${\rm tr}({\cal{S}}_{(c_{1},\dots,c_{t})})/d$ gives an unbiased estimate of the number of $(c_{1},\dots,c_{t})$-compatible maps $f\colon{\cal{E}}(\vec{H})\rightarrow{\cal{E}}(\vec{G})$ that induce injective homomorphisms from $\vec{H}$ to $\vec{G}$. That is, it estimates the number of injective homomorphic images of $\vec{H}$ in $\vec{G}$ whose vertices have colors $c_{1},\dots,c_{t}$ respectively. Summing over distinct $c_{1},\dots,c_{t}$, we see that ${\rm tr}({\cal{S}})/d$ gives an unbiased estimate of the number of injective homomorphic images whose vertices have distinct colors. The probability that a randomly colored injective homomorphic image of $\vec{H}$ has distinct colors is

$$\frac{C(C-1)\cdots(C-t+1)}{C^{t}},$$

so we divide by this expression. Finally, each copy of $H$ gets counted as ${\rm auto}(H)$ different injective homomorphic images, so we divide by ${\rm auto}(H)$ .
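The correction factor in Theorem 1 can be computed exactly with rational arithmetic and cross-checked by brute force over all $C^{t}$ colorings, as in the sketch below. The function names are illustrative. The triangle example uses ${\rm auto}(K_3)=6$ and $d=1$.

```python
from fractions import Fraction
from itertools import product


def distinct_color_probability(C, t):
    """C(C-1)...(C-t+1) / C^t: the chance that t independently and uniformly
    colored vertices receive t distinct colors."""
    p = Fraction(1)
    for s in range(t):
        p *= Fraction(C - s, C)
    return p


def brute_force_probability(C, t):
    """The same probability, by enumerating all C^t colorings."""
    colorings = list(product(range(C), repeat=t))
    good = sum(1 for cs in colorings if len(set(cs)) == t)
    return Fraction(good, len(colorings))


def scale_factor(C, t, d, auto_H):
    """The constant C^t / (C(C-1)...(C-t+1) d auto(H)) that multiplies tr(S)."""
    return 1 / (distinct_color_probability(C, t) * d * auto_H)


print(distinct_color_probability(5, 3))   # 12/25
print(scale_factor(5, 3, 1, 6))           # 25/72 for a triangle (auto = 6) with d = 1
```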

Theorem 1 provides the method for counting copies of $H$ . As the edges stream by, we compute the sums ${\cal{Z}}^{c_{1},c_{2}}_{i}$ . In particular, if the edge $\overrightarrow{vw}$ streams by, then for each $1\leq i\leq k$ , we compute ${\cal{M}}_{i}(\overrightarrow{vw})$ and add it to the sum ${\cal{Z}}_{i}^{{\cal{C}}(v),{\cal{C}}(w)}$ . (For an edge-deletion, we subtract ${\cal{M}}_{i}(\overrightarrow{vw})$ from ${\cal{Z}}_{i}^{{\cal{C}}(v),{\cal{C}}(w)}$ .) Once the data-stream has ended, for every $t$ -tuple of distinct colors $(c_{1},\dots,c_{t})$ , we compute the product ${\cal{S}}_{(c_{1},\dots,c_{t})}$ using Equation (3). Finally, we sum these values to get $S$ , take the trace, and multiply by

$$\frac{C^{t}}{C(C-1)\cdots(C-t+1)\cdot d\cdot{\rm auto}(H)}$$

to get the final estimate. We refer to this as Algorithm 1 and summarize the steps in Table 1. Observe that after the data-stream ends, we do a potentially large computation, which could involve computing roughly $C^{t}$ values ${\cal{S}}_{(c_{1},\dots,c_{t})}$ . There are often, but not always, ways to do this computation with less than $C^{t}$ work. This is discussed further in Section 4.
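The sketch below puts the pieces together into a compact version of Algorithm 1 for the case $d=1$, with ${\cal{G}}$ the $r^{\rm th}$ roots of unity. It rests on these assumptions, and its function name and signature are likewise assumptions:

- The hash values and coloring are passed in as plain dictionaries. In practice they would come from $4k$-wise independent families.
- The post-stream step naively loops over all ordered $t$-tuples of distinct colors, so it costs about $C^{t}$ products. Section 4 discusses cheaper alternatives.

Each streamed edge touches $2k$ sums, i.e., $O(1)$ work for fixed $H$.

```python
import cmath
from itertools import permutations


def root(s, r):
    """The r-th root of unity exp(2 pi i s / r), indexed by its exponent s."""
    return cmath.exp(2j * cmath.pi * s / r)


def algorithm1_estimate(stream, a, C, r, color, X, auto_H):
    """One estimate of #H (Algorithm 1 with d = 1 and G = the r-th roots of unity).
    stream: (sign, v, w) items; sign = +1 inserts and -1 deletes the undirected edge vw
    a:      [a_1, ..., a_2k]; edge i of H is a_{2i-1} -> a_{2i}, vertices labeled 1..t
    C:      number of colors (C >= t); color: vertex of G -> color in range(C)
    X:      half-edge index j (1..2k) -> {vertex of G: exponent mod r}
    auto_H: number of automorphisms of H
    """
    k, t = len(a) // 2, max(a)
    assert C >= t, "need at least t colors"
    Z = {i: {} for i in range(1, k + 1)}          # Z[i][(c1, c2)] holds Z_i^{c1,c2}
    for sign, v, w in stream:
        for x, y in ((v, w), (w, v)):             # edge vw of G = two directed edges
            key = (color[x], color[y])
            for i in range(1, k + 1):             # 2k updates per edge: O(1) for fixed H
                m_i = root(X[2 * i - 1][x] + X[2 * i][y], r)   # M_i(xy) = X_{2i-1}(x) X_{2i}(y)
                Z[i][key] = Z[i].get(key, 0) + sign * m_i
    # After the stream: S = sum over distinct (c_1, ..., c_t) of prod_i Z_i^{c_{a_{2i-1}}, c_{a_{2i}}}.
    S = 0
    for cs in permutations(range(C), t):          # about C^t tuples, computed naively
        term = 1
        for i in range(1, k + 1):
            term *= Z[i].get((cs[a[2 * i - 2] - 1], cs[a[2 * i - 1] - 1]), 0)
        S += term
    p_distinct = 1.0
    for s in range(t):
        p_distinct *= (C - s) / C                 # C(C-1)...(C-t+1) / C^t
    # tr(S)/d with d = 1 is S itself; its real part is also unbiased for #H.
    return S.real / (p_distinct * auto_H)
```

Averaging the returned value over every choice of hash values and colorings recovers $\#H$ exactly. This makes a convenient sanity check on tiny graphs, for example a single triangle with $C=3$ and $r=2$.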

In the case where ${\cal{G}}=\{\pm I,\pm M,\pm M^{2},\dots,\pm M^{d-1}\}$ with $d>1$ , a very slight modification to Algorithm 1 reduces the update time per edge by roughly a factor of $d$ . In this modified algorithm, which we call Algorithm 2, we do not compute the sums ${\cal{Z}}_{i}^{c_{1},c_{2}}$ until after the data stream has ended. Instead, we keep counts of how many times each $M^{j}$ would have contributed to ${\cal{Z}}_{i}^{c_{1},c_{2}}$ . Thus we have a count for each $i,j,c_{1},c_{2}$ , which we call ${\rm Count}_{c_{1},c_{2}}(i,j)$ . Suppose that when some edge $\overrightarrow{vw}$ streams by, we compute ${\cal{M}}_{i}(\overrightarrow{vw})$ and find that it is equal to $M^{j}$ . Rather than immediately adding $M^{j}$ to ${\cal{Z}}_{i}^{{\cal{C}}(v),{\cal{C}}(w)}$ , we add 1 to ${\rm Count}_{{\cal{C}}(v),{\cal{C}}(w)}(i,j)$ . (If $\overrightarrow{vw}$ is an edge-deletion or if ${\cal{M}}_{i}(\overrightarrow{vw})$ is equal to $-M^{j}$ , then we instead subtract 1 from the count.) Thus, rather than updating $d$ diagonal entries, we update one count, saving a factor of $d$ in update time. The storage does not change much: for each ${\cal{Z}}_{i}^{c_{1},c_{2}}$ , rather than storing the values of $d$ diagonal entries, we store $d$ counts. After the data stream ends, we compute each

[TABLE]

Note that Equation (5) can be evaluated using a fast Fourier transform, though this is unlikely to have much effect on the overall run time. The steps of Algorithm 2 are summarized in Table 2.
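Under the reading that Equation (5) sets ${\cal{Z}}_{i}^{c_{1},c_{2}}=\sum_{j=0}^{d-1}{\rm Count}_{c_{1},c_{2}}(i,j)\,M^{j}$ (an assumption, since the equation itself is not reproduced here), the $\ell^{\rm th}$ diagonal entry of ${\cal{Z}}_{i}^{c_{1},c_{2}}$ is $\sum_{j}{\rm Count}_{c_{1},c_{2}}(i,j)\,\omega^{j\ell}$ with $\omega=e^{2\pi i/d}$, which is a length-$d$ discrete Fourier transform of the counts. The sketch below, with illustrative function names, checks for a single $(i,c_{1},c_{2})$ that Algorithm 2's counts reproduce the diagonal Algorithm 1 would have accumulated directly.

```python
import cmath
import random
from typing import List, Tuple


def diag_of_power(p: int, d: int) -> List[complex]:
    """Diagonal of M^p, where M = diag(1, w, ..., w^(d-1)) and w = e^(2*pi*i/d)."""
    omega = cmath.exp(2j * cmath.pi / d)
    return [omega ** (p * l) for l in range(d)]


def z_from_counts(counts: List[int]) -> List[complex]:
    """Diagonal of sum_p counts[p] * M^p: an O(d^2) DFT of the count vector.

    d * numpy.fft.ifft(counts) computes the same vector in O(d log d).
    """
    d = len(counts)
    omega = cmath.exp(2j * cmath.pi / d)
    return [sum(c * omega ** (p * l) for p, c in enumerate(counts)) for l in range(d)]


def simulate(d: int, updates: int, seed: int = 0) -> Tuple[List[complex], List[complex]]:
    """Apply random signed updates +/- M^p to one accumulator, both ways."""
    rng = random.Random(seed)
    counts = [0] * d    # Algorithm 2: one integer count per power p
    direct = [0j] * d   # Algorithm 1: all d diagonal entries touched per update
    for _ in range(updates):
        p, s = rng.randrange(d), rng.choice((1, -1))
        counts[p] += s  # O(1) work per update
        direct = [z + s * m for z, m in zip(direct, diag_of_power(p, d))]  # O(d) work
    return z_from_counts(counts), direct
```

The per-update cost drops from $d$ entry updates to one integer increment, while the reconstruction is deferred to a single transform per accumulator after the stream ends.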

3 The Variance

In this section, we bound the variance of the estimate given by our algorithm. The variance is the same whether we use Algorithm 1 or Algorithm 2, since they produce the same estimate, so we do not distinguish between the two. The variance does, however, depend on the choice of ${\cal{G}}$, and our proof applies only when ${\cal{G}}$ is either the group of $r^{\rm th}$ roots of unity or the group $\{\pm I,\pm M,\pm M^{2},\dots,\pm M^{d-1}\}$. In either case, the variance is a large sum, but most terms in the sum are zero. In Section 3.1, we give conditions that classify which terms contribute non-trivially to the sum when ${\cal{G}}$ is the group of $r^{\rm th}$ roots of unity. In Section 3.2, we do the same when ${\cal{G}}$ is the group $\{\pm I,\dots,\pm M^{d-1}\}$. In Section 3.3, we bound the number of terms that satisfy those conditions, obtaining our bound.

Our estimate of $\#H$ (which is given in Theorem 1) has variance

[TABLE]

where $\overline{{\cal{S}}}$ denotes the complex conjugate of ${\cal{S}}$ . We thus wish to understand the term $E\left({\rm tr}({\cal{S}}){\rm tr}(\overline{{\cal{S}}})\right)$ .

From Equation (4),

[TABLE]

so

[TABLE]

Thus ${\rm tr}({\cal{S}}){\rm tr}(\overline{{\cal{S}}})$ is a sum of terms of the form

$${\rm tr}({\cal{Q}}(\vec{T_{1}}))\,{\rm tr}(\overline{{\cal{Q}}(\vec{T_{2}})}).\tag{8}$$

In particular, there is one term for every $2k$ -tuple of edges $(\vec{T_{1}},\vec{T_{2}})$ for which $\vec{T_{1}}=\overrightarrow{v_{1}v_{2}},\dots,\overrightarrow{v_{2k-1}v_{2k}}$ is distinctly color-compatible and $\vec{T_{2}}=\overrightarrow{w_{1}w_{2}},\dots,\overrightarrow{w_{2k-1}w_{2k}}$ is distinctly color-compatible. In contrast, for the [KMSS]-algorithm, the analogous expression for the variance has a term for each $2k$ -tuple of edges regardless of color-compatibility.

For most $2k$ -tuples of edges $(\vec{T_{1}},\vec{T_{2}})$ , the product (8) has expected value 0 and therefore does not contribute to the variance. Here we classify the $2k$ -tuples that do contribute to the variance. Consider some $2k$ -tuple of edges $(\vec{T_{1}},\vec{T_{2}})$ , and consider any vertex $b\in{\cal{V}}(H)$ . We consider three conditions that the $2k$ -tuple may or may not satisfy at $b$ :

Condition 1:

The vertices $\{v_{i}:i\in\Gamma(b)\}$ are all the same, and the vertices $\{w_{i}:i\in\Gamma(b)\}$ are all the same.

Condition 2:

$v_{i}=w_{i}$ for all $i\in\Gamma(b)$ .

Condition 3:

There are vertices $x,y\in{\cal{V}}(\vec{G})$ such that for every $i\in\Gamma(b)$ , either $v_{i}=x$ and $w_{i}=y$ , or $v_{i}=y$ and $w_{i}=x$ .

Note that Condition 1 is a special case of Condition 3 (take $x=v_{i}$ and $y=w_{i}$ for any $i\in\Gamma(b)$). If Condition 1 is satisfied at every vertex of $\vec{H}$, then each of $\vec{T_{1}}$ and $\vec{T_{2}}$ forms a homomorphic image of $\vec{H}$. If Condition 2 is satisfied at every vertex of $\vec{H}$, then $\vec{T_{2}}=\vec{T_{1}}$, and $\vec{T_{1}}$ can be an arbitrary collection of $k$ edges.
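The three conditions are purely combinatorial, so they can be checked mechanically. The sketch below tests them at a single vertex $b$ of $\vec{H}$; the argument `gamma_b` stands for $\Gamma(b)$, and the dictionaries `v` and `w` (an encoding of our own, not the paper's) map each half-edge index $i$ to the vertices $v_{i}$ and $w_{i}$ of $\vec{G}$.

```python
from typing import Dict, Hashable, List, Set


def conditions_at(gamma_b: List[int], v: Dict[int, Hashable], w: Dict[int, Hashable]) -> Set[int]:
    """Return the subset of {1, 2, 3} of Conditions that hold at a vertex b of H.

    gamma_b lists Gamma(b), the half-edge indices i with a_i = b;
    v[i] and w[i] are the vertices of G that T1 and T2 place at index i.
    """
    held: Set[int] = set()
    # Condition 1: all v_i over b coincide, and all w_i over b coincide.
    if len({v[i] for i in gamma_b}) == 1 and len({w[i] for i in gamma_b}) == 1:
        held.add(1)
    # Condition 2: v_i = w_i for every i in Gamma(b).
    if all(v[i] == w[i] for i in gamma_b):
        held.add(2)
    # Condition 3: every pair (v_i, w_i) is (x, y) or (y, x) for one fixed {x, y};
    # the first index already determines the unordered pair {x, y}.
    x, y = v[gamma_b[0]], w[gamma_b[0]]
    if all((v[i], w[i]) in {(x, y), (y, x)} for i in gamma_b):
        held.add(3)
    return held
```

For example, `conditions_at([1, 6], {1: "x", 6: "x"}, {1: "y", 6: "y"})` returns `{1, 3}`, reflecting that Condition 1 is a special case of Condition 3.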

The following lemma turns Conditions 1–3 into conditions on ${\cal{P}}_{b}(\vec{T_{1}})$ and ${\cal{P}}_{b}(\vec{T_{2}})$ . Those conditions will later let us characterize which ${\rm tr}({\cal{Q}}(\vec{T_{1}})){\rm tr}(\overline{{\cal{Q}}(\vec{T_{2}})})$ contribute to the variance.

Lemma 4.

Suppose $\vec{T}=(\vec{T_{1}},\vec{T_{2}})$ is any $2k$-tuple of edges of $\vec{G}$, and let $b$ be any vertex of $\vec{H}$.

A.

If $\vec{T}$ satisfies Condition 1 at $b$, then ${\cal{P}}_{b}(\vec{T_{1}})={\cal{P}}_{b}(\vec{T_{2}})=I$.

B.

If $\vec{T}$ satisfies Condition 2 at $b$ but not Condition 1, then ${\cal{P}}_{b}(\vec{T_{1}})={\cal{P}}_{b}(\vec{T_{2}})$, and each is a uniformly random element of ${\cal{G}}$.

C.

If $\vec{T}$ satisfies Condition 3 at $b$ but not Condition 1, then ${\cal{P}}_{b}(\vec{T_{1}})={\cal{P}}_{b}(\vec{T_{2}})^{-1}$, and each is a uniformly random element of ${\cal{G}}$.

D.

If $\vec{T}$ does not satisfy Condition 1, 2, or 3 at $b$, then either ${\cal{P}}_{b}(\vec{T_{1}})$ or ${\cal{P}}_{b}(\vec{T_{2}})$ is a uniformly random element of ${\cal{G}}$ and is independent of the other.

**Proof:** Suppose that $\vec{T_{1}}=(\overrightarrow{v_{1}v_{2}},\dots,\overrightarrow{v_{2k-1}v_{2k}})$ and $\vec{T_{2}}=(\overrightarrow{w_{1}w_{2}},\dots,\overrightarrow{w_{2k-1}w_{2k}})$. If $\vec{T}$ satisfies Condition 1 at $b$, then by Lemma 1, ${\cal{P}}_{b}(\vec{T_{1}})=I$ and ${\cal{P}}_{b}(\vec{T_{2}})=I$.

Now suppose that Condition 1 is not satisfied at $b$ . Let $h_{\delta}$ be the distinguished half-edge at $b$ . Then

[TABLE]

so

[TABLE]

Similarly,

[TABLE]

Since Condition 1 is not satisfied, either some $v_{i}\neq v_{\delta}$ or some $w_{i}\neq w_{\delta}$ . Assume it is the former. Then ${\cal{X}}_{i}(v_{i}){\cal{X}}_{i}(v_{\delta})^{-1}$ is a uniformly random element of ${\cal{G}}$ , and it is independent of ${\cal{X}}_{j}(v_{j}){\cal{X}}_{j}(v_{\delta})^{-1}$ for all $j\notin\{i,\delta\}$ , since then neither $h_{i}$ nor $h_{j}$ is distinguished. Thus ${\cal{P}}_{b}(\vec{T_{1}})$ is a uniformly random element of ${\cal{G}}$ . Similarly, if $w_{i}\neq w_{\delta}$ , then ${\cal{P}}_{b}(\vec{T_{2}})$ is a uniformly random element of ${\cal{G}}$ .

If $\vec{T}$ satisfies Condition 2, then for each $i\in\Gamma(b)$ , ${\cal{X}}_{i}(v_{i})={\cal{X}}_{i}(w_{i})$ , so ${\cal{P}}_{b}(\vec{T_{1}})={\cal{P}}_{b}(\vec{T_{2}})$ .

If $\vec{T}$ satisfies Condition 3, then for each $i\in\Gamma(b)$ , either $v_{i}=v_{\delta}$ and $w_{i}=w_{\delta}$ , or $v_{i}=w_{\delta}$ and $w_{i}=v_{\delta}$ . Either way, ${\cal{X}}_{i}(v_{i}){\cal{X}}_{i}(v_{\delta})^{-1}$ is the inverse of ${\cal{X}}_{i}(w_{i}){\cal{X}}_{i}(w_{\delta})^{-1}$ , so ${\cal{P}}_{b}(\vec{T_{1}})={\cal{P}}_{b}(\vec{T_{2}})^{-1}$ .

Suppose then that $\vec{T}$ does not satisfy any of the three conditions. Suppose also that for some $i\in\Gamma(b)$, one of $v_{i}$, $w_{i}$, $v_{\delta}$, and $w_{\delta}$ differs from the other three. Suppose the one that differs is either $v_{i}$ or $v_{\delta}$. Then ${\cal{X}}_{i}(v_{i}){\cal{X}}_{i}(v_{\delta})^{-1}$ is a uniformly random element of ${\cal{G}}$, and it is independent of ${\cal{X}}_{i}(w_{i}){\cal{X}}_{i}(w_{\delta})^{-1}$. It is also independent of ${\cal{X}}_{j}(v_{j}){\cal{X}}_{j}(v_{\delta})^{-1}$ and ${\cal{X}}_{j}(w_{j}){\cal{X}}_{j}(w_{\delta})^{-1}$ for all $j\notin\{i,\delta\}$. Thus ${\cal{P}}_{b}(\vec{T_{1}})$ is a uniformly random element of ${\cal{G}}$ and is independent of ${\cal{P}}_{b}(\vec{T_{2}})$. Similarly, if $w_{i}$ or $w_{\delta}$ were the one that differed from the other three, then ${\cal{P}}_{b}(\vec{T_{2}})$ would be uniformly random and independent of ${\cal{P}}_{b}(\vec{T_{1}})$. Suppose then that for each $i$, none of $v_{i}$, $w_{i}$, $v_{\delta}$, and $w_{\delta}$ is different from the other three. If $v_{\delta}=w_{\delta}$, then Condition 2 must hold; whereas if $v_{\delta}\neq w_{\delta}$, then Condition 3 must hold.
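The proof manipulates ${\cal{P}}_{b}(\vec{T_{1}})$ through the factors ${\cal{X}}_{i}(v_{i}){\cal{X}}_{i}(v_{\delta})^{-1}$ with $i\in\Gamma(b)\setminus\{\delta\}$. Assuming (our reading, since the defining equations are not reproduced here) that ${\cal{P}}_{b}$ is exactly the product of these factors, and writing ${\cal{G}}$ as the $r^{\rm th}$ roots of unity encoded additively by exponents mod $r$, the sketch below checks parts A and C of Lemma 4 for random labels. Part B holds by construction, since Condition 2 feeds identical vertices into both products. The function names and encoding are hypothetical.

```python
import random
from typing import Dict, Hashable, List

# X[i][u] = a encodes X_i(u) = exp(2*pi*sqrt(-1)*a/r), so products become sums mod r.
Labels = Dict[int, Dict[Hashable, int]]


def p_b(gamma_b: List[int], delta: int, verts: Dict[int, Hashable], X: Labels, r: int) -> int:
    """Exponent mod r of P_b(T), under the ASSUMED form
    P_b(T) = product over i in Gamma(b), i != delta, of X_i(u_i) * X_i(u_delta)^(-1),
    where u_i = verts[i] is the vertex of G placed at half-edge index i."""
    u_delta = verts[delta]
    return sum(X[i][verts[i]] - X[i][u_delta] for i in gamma_b if i != delta) % r


def check_lemma4(r: int = 7, trials: int = 100, seed: int = 0) -> bool:
    """Check Lemma 4-A and 4-C on a vertex b with Gamma(b) = {1, 4, 6}, delta = 1."""
    gamma_b, delta = [1, 4, 6], 1
    rng = random.Random(seed)
    for _ in range(trials):
        # Fresh independent uniform labels X_i(u) for every index i and vertex u.
        X = {i: {u: rng.randrange(r) for u in ("x", "y")} for i in gamma_b}
        # Condition 1 (all v_i equal, all w_i equal): both products are the identity.
        v1, w1 = {1: "x", 4: "x", 6: "x"}, {1: "y", 4: "y", 6: "y"}
        if p_b(gamma_b, delta, v1, X, r) or p_b(gamma_b, delta, w1, X, r):
            return False
        # Condition 3 but not 1 (pairs (x, y) or (y, x)): the products are inverses.
        v3, w3 = {1: "x", 4: "y", 6: "x"}, {1: "y", 4: "x", 6: "y"}
        if (p_b(gamma_b, delta, v3, X, r) + p_b(gamma_b, delta, w3, X, r)) % r:
            return False
    return True
```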

3.1 Variance When ${\cal{G}}$ Consists of Roots of Unity

At this point, the discussion splits into two cases depending on whether ${\cal{G}}$ is a group of roots of unity or a group of matrices. Here we consider the former. We therefore fix some integer $r\geq 2$ and let ${\cal{G}}$ be the group of 1-by-1 matrices whose entries are $r^{\rm th}$ roots of unity. Since these matrices are 1-by-1, we treat them as complex numbers. Also, since the trace of a 1-by-1 matrix is equal to its entry, we simply drop “${\rm tr}$” from all equations. Thus the expression (6) for the variance becomes

[TABLE]

Since ${\cal{S}}\overline{{\cal{S}}}$ is a sum of terms of the form ${\cal{Q}}(\vec{T_{1}})\overline{{\cal{Q}}(\vec{T_{2}})}$ , the next theorem classifies which pairs $(\vec{T_{1}},\vec{T_{2}})$ contribute to $E({\cal{S}}\overline{{\cal{S}}})$ .

Theorem 2.

Let $\vec{T}=(\vec{T_{1}},\vec{T_{2}})$ be a $2k$-tuple of edges of $\vec{G}$. If either of the following holds:

•

$\vec{T}$ satisfies Condition 1 or 2 for every $b\in{\cal{V}}(\vec{H})$, or

•

$r=2$ , and $\vec{T}$ satisfies Condition 1, 2, or 3 for every $b\in{\cal{V}}(\vec{H})$ ,

then ${\cal{Q}}(\vec{T_{1}})\overline{{\cal{Q}}(\vec{T_{2}})}=1$ . Otherwise,

$$E\big({\cal{Q}}(\vec{T_{1}})\overline{{\cal{Q}}(\vec{T_{2}})}\big)=0.$$

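**Proof:** Since ${\cal{Q}}(\vec{T_{j}})=\prod_{b\in{\cal{V}}(\vec{H})}{\cal{P}}_{b}(\vec{T_{j}})$ for $j=1,2$, we can write

$${\cal{Q}}(\vec{T_{1}})\overline{{\cal{Q}}(\vec{T_{2}})}=\prod_{b\in{\cal{V}}(\vec{H})}{\cal{P}}_{b}(\vec{T_{1}})\overline{{\cal{P}}_{b}(\vec{T_{2}})}.$$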

If $\vec{T}$ satisfies Condition 1 or 2 at some $b$ , then by Lemma 4, ${\cal{P}}_{b}(\vec{T_{1}})={\cal{P}}_{b}(\vec{T_{2}})$ , so

$${\cal{P}}_{b}(\vec{T_{1}})\overline{{\cal{P}}_{b}(\vec{T_{2}})}=|{\cal{P}}_{b}(\vec{T_{1}})|^{2}=1.$$

If $r=2$ and $\vec{T}$ satisfies Condition 3 at some $b$ , then ${\cal{P}}_{b}(\vec{T_{1}})={\cal{P}}_{b}(\vec{T_{2}})^{-1}$ , so

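$${\cal{P}}_{b}(\vec{T_{1}})\overline{{\cal{P}}_{b}(\vec{T_{2}})}={\cal{P}}_{b}(\vec{T_{2}})^{-1}\,\overline{{\cal{P}}_{b}(\vec{T_{2}})}={\cal{P}}_{b}(\vec{T_{2}})^{-2}=1,$$

since ${\cal{P}}_{b}(\vec{T_{2}})=\pm 1$.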

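Thus if, at every $b$, either Condition 1 or 2 holds, or $r=2$ and Condition 3 holds, then every factor in the product is 1, and

$${\cal{Q}}(\vec{T_{1}})\overline{{\cal{Q}}(\vec{T_{2}})}=\prod_{b\in{\cal{V}}(\vec{H})}{\cal{P}}_{b}(\vec{T_{1}})\overline{{\cal{P}}_{b}(\vec{T_{2}})}=1.$$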

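Suppose now that at some $b$, neither Condition 1 nor Condition 2 holds. If Condition 3 holds and $r>2$, then by Lemma 4, ${\cal{P}}_{b}(\vec{T_{1}})={\cal{P}}_{b}(\vec{T_{2}})^{-1}$, and each is a uniformly random element of ${\cal{G}}$. Then ${\cal{P}}_{b}(\vec{T_{1}})\overline{{\cal{P}}_{b}(\vec{T_{2}})}={\cal{P}}_{b}(\vec{T_{1}})^{2}$, and since $r>2$, we have

$$E\big({\cal{P}}_{b}(\vec{T_{1}})^{2}\big)=\frac{1}{r}\sum_{j=0}^{r-1}\omega^{2j}=\frac{1}{r}\cdot\frac{1-\omega^{2r}}{1-\omega^{2}}=0,\qquad\omega=e^{2\pi i/r},$$

where $\omega^{2}\neq 1$ because $r>2$.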

Thus

$$E\big({\cal{P}}_{b}(\vec{T_{1}})\overline{{\cal{P}}_{b}(\vec{T_{2}})}\big)=0.$$

If instead Condition 3 does not hold, then by Lemma 4, one of ${\cal{P}}_{b}(\vec{T_{1}})$ and ${\cal{P}}_{b}(\vec{T_{2}})$ is a uniformly random $r^{\rm th}$ root of unity and is independent of the other, so again

$$E\big({\cal{P}}_{b}(\vec{T_{1}})\overline{{\cal{P}}_{b}(\vec{T_{2}})}\big)=0.$$

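In either case, ${\cal{P}}_{b}(\vec{T_{1}})\overline{{\cal{P}}_{b}(\vec{T_{2}})}$ is independent of the factors ${\cal{P}}_{c}(\vec{T_{1}})\overline{{\cal{P}}_{c}(\vec{T_{2}})}$ for $c\in{\cal{V}}(\vec{H})\setminus\{b\}$, so

$$E\big({\cal{Q}}(\vec{T_{1}})\overline{{\cal{Q}}(\vec{T_{2}})}\big)=E\big({\cal{P}}_{b}(\vec{T_{1}})\overline{{\cal{P}}_{b}(\vec{T_{2}})}\big)\,E\Big(\prod_{c\neq b}{\cal{P}}_{c}(\vec{T_{1}})\overline{{\cal{P}}_{c}(\vec{T_{2}})}\Big)=0.$$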
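The case analysis in this proof rests on two elementary averages over a uniformly random $r^{\rm th}$ root of unity $z$: $E(z^{2})$ is 1 when $r=2$ and 0 when $r>2$, and $E(z\overline{w})=0$ when $w$ is another root of unity chosen independently. The short Python check below computes both averages by enumeration (up to floating-point error); it is only a sanity check of these facts, not part of the algorithm.

```python
import cmath
from typing import List


def roots_of_unity(r: int) -> List[complex]:
    """All r-th roots of unity exp(2*pi*i*a/r), a = 0..r-1."""
    return [cmath.exp(2j * cmath.pi * a / r) for a in range(r)]


def mean_square(r: int) -> complex:
    """E(z^2) for z uniform over the r-th roots of unity (the Condition 3 case)."""
    return sum(z * z for z in roots_of_unity(r)) / r


def mean_independent_product(r: int) -> complex:
    """E(z * conj(w)) for independent uniform r-th roots of unity z, w (the Lemma 4-D case)."""
    roots = roots_of_unity(r)
    return sum(z * w.conjugate() for z in roots for w in roots) / r ** 2
```

The $r=2$ exception is exactly why Condition 3 contributes to the variance only when ${\cal{G}}=\{\pm 1\}$.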

3.2 Variance When ${\cal{G}}=\{\pm I,\pm M,\dots,\pm M^{d-1}\}$

Now we consider the variance of our estimate in the case where ${\cal{G}}$ is a group of matrices. In particular, fix a dimension $d\geq 2$ and let ${\cal{G}}$ consist of the matrices $\{\pm I,\pm M,\dots,\pm M^{d-1}\}$ , where $M$ is the diagonal matrix with entries $1,\omega,\omega^{2},\dots,\omega^{d-1}$ , and $\omega=e^{2\pi i/d}$ .

The variance of our estimate for $\#H$ is given by Expression (6). Note, however, that the trace of every element of ${\cal{G}}$ is real, so we can dispense with complex conjugation. Thus the variance becomes

[TABLE]

We thus wish to understand the term $E({\rm tr}({\cal{S}})^{2})$ . Since ${\rm tr}({\cal{S}})^{2}$ is a sum of terms of the form ${\rm tr}({\cal{Q}}(\vec{T_{1}})){\rm tr}({\cal{Q}}(\vec{T_{2}}))$ , the next theorem classifies how much each pair $(\vec{T_{1}},\vec{T_{2}})$ contributes to $E({\rm tr}({\cal{S}})^{2})$ .

Theorem 3.

Suppose $\vec{T}=(\vec{T_{1}},\vec{T_{2}})$ is a $2k$ -tuple of edges of $\vec{G}$ .

•

If $\vec{T}$ satisfies Condition 1 for every $b\in{\cal{V}}(\vec{H})$ , then ${\rm tr}({\cal{Q}}(\vec{T_{1}})){\rm tr}({\cal{Q}}(\vec{T_{2}}))=d^{2}$ .

•

If $\vec{T}$ satisfies Condition 1, 2, or 3 at every $b\in{\cal{V}}(\vec{H})$, but does not satisfy Condition 1 at every $b$, then

[TABLE]

•

Otherwise,

$$E\big({\rm tr}({\cal{Q}}(\vec{T_{1}})){\rm tr}({\cal{Q}}(\vec{T_{2}}))\big)=0.$$

**Proof:** Suppose that Condition 1, 2, or 3 holds at every $b\in{\cal{V}}(\vec{H})$. Recall that ${\cal{Q}}(\vec{T_{1}})=\prod_{b\in{\cal{V}}(\vec{H})}{\cal{P}}_{b}(\vec{T_{1}})$ and ${\cal{Q}}(\vec{T_{2}})=\prod_{b\in{\cal{V}}(\vec{H})}{\cal{P}}_{b}(\vec{T_{2}})$. Let $R_{1}$ denote the product of ${\cal{P}}_{b}(\vec{T_{1}})$ over all $b\in{\cal{V}}(\vec{H})$ where Condition 1 holds. Let $R_{2}$ denote the same product over all $b\in{\cal{V}}(\vec{H})$ where Condition 2 holds but Condition 1 does not. Let $R_{3}$ denote the same product over all $b\in{\cal{V}}(\vec{H})$ where Condition 3 holds but Condition 1 does not. (In each case, if the given conditions are not satisfied at any $b$, then define $R_{i}$ to be $I$.) Thus ${\cal{Q}}(\vec{T_{1}})=R_{1}R_{2}R_{3}$. By Lemma 4-A, $R_{1}=I$, so ${\cal{Q}}(\vec{T_{1}})=R_{2}R_{3}$. By Lemmas 4-B and 4-C, and since the elements of ${\cal{G}}$ are diagonal and therefore commute, ${\cal{Q}}(\vec{T_{2}})=R_{2}R_{3}^{-1}$. Furthermore, if there is at least one $b$ where Condition 2 (resp. 3) holds but not Condition 1, then $R_{2}$ (resp. $R_{3}$) is uniformly random. Finally, since $R_{2}$ and $R_{3}$ involve different vertices of $H$, they are independent.

If $\vec{T}$ satisfies Condition 1 at every $b\in{\cal{V}}(\vec{H})$, then $R_{2}=R_{3}=I$, so ${\cal{Q}}(\vec{T_{1}})={\cal{Q}}(\vec{T_{2}})=I$, and ${\rm tr}({\cal{Q}}(\vec{T_{1}})){\rm tr}({\cal{Q}}(\vec{T_{2}}))=d^{2}$.

If $\vec{T}$ satisfies Condition 1 or 2 at every $b\in{\cal{V}}(\vec{H})$ but not always Condition 1, then $R_{3}=I$, and ${\cal{Q}}(\vec{T_{1}})={\cal{Q}}(\vec{T_{2}})=R_{2}$. With probability $1/d$ (two of the $2d$ elements of ${\cal{G}}$), $R_{2}=\pm I$, in which case ${\rm tr}({\cal{Q}}(\vec{T_{1}})){\rm tr}({\cal{Q}}(\vec{T_{2}}))=d^{2}$. If $R_{2}$ is not $\pm I$, then ${\rm tr}(R_{2})=0$, so ${\rm tr}({\cal{Q}}(\vec{T_{1}})){\rm tr}({\cal{Q}}(\vec{T_{2}}))=0$. Thus

$$E\big({\rm tr}({\cal{Q}}(\vec{T_{1}})){\rm tr}({\cal{Q}}(\vec{T_{2}}))\big)=\frac{1}{d}\cdot d^{2}=d.$$

If $\vec{T}$ satisfies Condition 1 or 3 at every $b\in{\cal{V}}(\vec{H})$ but not always Condition 1, then ${\cal{Q}}(\vec{T_{1}})=R_{3}$ and ${\cal{Q}}(\vec{T_{2}})=R_{3}^{-1}$. With probability $1/d$, $R_{3}=\pm I$, in which case ${\rm tr}({\cal{Q}}(\vec{T_{1}})){\rm tr}({\cal{Q}}(\vec{T_{2}}))=d^{2}$. If $R_{3}$ is not $\pm I$, then ${\rm tr}(R_{3})=0$, so ${\rm tr}({\cal{Q}}(\vec{T_{1}})){\rm tr}({\cal{Q}}(\vec{T_{2}}))=0$. Thus

$$E\big({\rm tr}({\cal{Q}}(\vec{T_{1}})){\rm tr}({\cal{Q}}(\vec{T_{2}}))\big)=\frac{1}{d}\cdot d^{2}=d.$$

Next, suppose $\vec{T}$ satisfies Condition 1, 2, or 3 at every $b\in{\cal{V}}(\vec{H})$, but not always Condition 1 or 2, and not always Condition 1 or 3. Then ${\cal{Q}}(\vec{T_{1}})=R_{2}R_{3}$ and ${\cal{Q}}(\vec{T_{2}})=R_{2}R_{3}^{-1}$, where $R_{2}$ and $R_{3}$ are independent and uniformly random. If either $R_{2}R_{3}$ or $R_{2}R_{3}^{-1}$ is not $\pm I$, then it has trace 0, in which case ${\rm tr}({\cal{Q}}(\vec{T_{1}})){\rm tr}({\cal{Q}}(\vec{T_{2}}))=0$. Thus we need only consider the case where $R_{2}R_{3}$ and $R_{2}R_{3}^{-1}$ are both $\pm I$, or equivalently, where $R_{2}=\pm R_{3}$ and $R_{2}^{2}=I$; in that case ${\rm tr}({\cal{Q}}(\vec{T_{1}})){\rm tr}({\cal{Q}}(\vec{T_{2}}))=d^{2}$. This happens with probability $1/d^{2}$ if $d$ is odd and $2/d^{2}$ if $d$ is even. Thus

$$E\big({\rm tr}({\cal{Q}}(\vec{T_{1}})){\rm tr}({\cal{Q}}(\vec{T_{2}}))\big)=d^{2}\cdot\Pr\big(R_{2}=\pm R_{3}\text{ and }R_{2}^{2}=I\big),$$

which is equal to 1 if $d$ is odd, and 2 if $d$ is even.

Finally, suppose that $\vec{T}$ does not satisfy any of Conditions 1, 2, or 3 at some vertex $b\in{\cal{V}}(\vec{H})$ . Then by Lemma 4-D, one of ${\cal{Q}}(\vec{T_{1}})$ and ${\cal{Q}}(\vec{T_{2}})$ is a uniformly random element of ${\cal{G}}$ , and is independent of the other. Thus

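$$E\big({\rm tr}({\cal{Q}}(\vec{T_{1}})){\rm tr}({\cal{Q}}(\vec{T_{2}}))\big)=0,$$

since ${\rm tr}(-g)=-{\rm tr}(g)$ for every $g\in{\cal{G}}$, so a uniformly random element of ${\cal{G}}$ has expected trace 0.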
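The expected values in this proof can be confirmed by brute-force enumeration over ${\cal{G}}=\{\pm I,\pm M,\dots,\pm M^{d-1}\}$. In the sketch below an element $sM^{j}$ is encoded as the pair `(s, j)`, an encoding of our own; traces are computed numerically from the diagonal of $M^{j}$ and rounded. It returns $E({\rm tr}(R_{2})^{2})$, $E({\rm tr}(R_{3}){\rm tr}(R_{3}^{-1}))$, $E({\rm tr}(R_{2}R_{3}){\rm tr}(R_{2}R_{3}^{-1}))$, and the mean trace of a uniformly random element.

```python
import cmath
from itertools import product
from typing import List, Tuple

Elem = Tuple[int, int]  # (s, j) encodes s * M^j with s in {+1, -1}, 0 <= j < d


def group(d: int) -> List[Elem]:
    """All 2d elements of G = {+-I, +-M, ..., +-M^(d-1)}."""
    return [(s, j) for s in (1, -1) for j in range(d)]


def mul(a: Elem, b: Elem, d: int) -> Elem:
    return (a[0] * b[0], (a[1] + b[1]) % d)


def inv(a: Elem, d: int) -> Elem:
    return (a[0], (-a[1]) % d)


def tr(a: Elem, d: int) -> float:
    """Trace of s * M^j = s * sum_l omega^(j*l), computed numerically and rounded."""
    omega = cmath.exp(2j * cmath.pi / d)
    return round((a[0] * sum(omega ** (a[1] * l) for l in range(d))).real, 9)


def expectations(d: int) -> Tuple[float, float, float, float]:
    G = group(d)
    n = len(G)
    # Condition 1 or 2 everywhere: Q1 = Q2 = R2 with R2 uniform.
    e2 = sum(tr(R, d) ** 2 for R in G) / n
    # Condition 1 or 3 everywhere: Q1 = R3 and Q2 = R3^(-1).
    e3 = sum(tr(R, d) * tr(inv(R, d), d) for R in G) / n
    # Mixed case: Q1 = R2 R3 and Q2 = R2 R3^(-1), with R2, R3 independent and uniform.
    emix = sum(tr(mul(R2, R3, d), d) * tr(mul(R2, inv(R3, d), d), d)
               for R2, R3 in product(G, G)) / n ** 2
    # Lemma 4-D case: expected trace of a uniformly random element.
    etr = sum(tr(R, d) for R in G) / n
    return e2, e3, emix, etr
```

For instance, `expectations(4)` returns values equal to 4, 4, 2, and 0, consistent with the case analysis for even $d$.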

3.3 Bounding the Variance

As we saw in Sections 3.1 and 3.2, a $2k$-tuple of edges contributes to the variance only if it is distinctly color-compatible and satisfies Condition 1, 2, or 3 at every vertex of $H$. Now we bound the number of $2k$-tuples with these properties to obtain a bound on the variance.

Throughout this section, $\vec{T}=(\vec{T_{1}},\vec{T_{2}})$ will denote a $2k$-tuple of edges of $\vec{G}$, where $\vec{T_{1}}=\overrightarrow{v_{1}v_{2}},\dots,\overrightarrow{v_{2k-1}v_{2k}}$ and $\vec{T_{2}}=\overrightarrow{w_{1}w_{2}},\dots,\overrightarrow{w_{2k-1}w_{2k}}$. We continue to refer to the edges of $\vec{H}$ as $\overrightarrow{a_{1}a_{2}},\dots,\overrightarrow{a_{2k-1}a_{2k}}$. In much of this section, edge directions are irrelevant and will often be ignored. We refer to the two halves of the edge $\overline{a_{2i-1}a_{2i}}$ as the “half-edge at $a_{2i-1}$” and the “half-edge at $a_{2i}$,” and similarly for the two halves of $\overline{v_{2i-1}v_{2i}}$ and $\overline{w_{2i-1}w_{2i}}$. Let $K$ denote the undirected subgraph of $G$ consisting of the $2k$ edges of $\vec{T}$, ignoring edge directions. If $i\in\Gamma(b)$ (so $a_{i}=b$), then we say that $v_{i}$ and $w_{i}$ lie over $b$. Thus, for instance, Condition 1 is satisfied at a vertex $b$ of $H$ if and only if all $v_{i}$ that lie over $b$ are equal and all $w_{i}$ that lie over $b$ are equal. If $i\in\Gamma(b)$ and $\vec{T}$ satisfies Condition 1 (resp. 2 or 3) at $b$, then we also say that $\vec{T}$ satisfies Condition 1 (resp. 2 or 3) at $v_{i}$ and at $w_{i}$.

Suppose $b$ and $c$ are distinct vertices of $H$, and suppose $i\in\Gamma(b)$ and $j\in\Gamma(c)$. The vertices $v_{i}$ and $v_{j}$ need not be distinct; however, if they are not distinct, then $\vec{T_{1}}$ cannot be distinctly color-compatible (because that would require $v_{i}$ and $v_{j}$ to get different colors). The same holds for $w_{i}$, $w_{j}$, and $\vec{T_{2}}$. Thus if we are given $\vec{T}$ but not yet the colors of the vertices, then we say that $\vec{T}$ is distinctly colorable if for all distinct vertices $b,c\in H$, and for all $i\in\Gamma(b)$ and $j\in\Gamma(c)$, the vertices $v_{i}$ and $v_{j}$ are distinct, as are the vertices $w_{i}$ and $w_{j}$ (though $v_{i}$ and $w_{j}$ are not required to be distinct). If $\vec{T}$ is not distinctly colorable, then no matter how colors are assigned to vertices, $\vec{T}$ will not be distinctly color-compatible and therefore will not contribute to the variance.
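Distinct colorability depends only on which vertices of $\vec{T}$ coincide, so it can be tested directly from the tuple. In the sketch below (0-based lists, with names of our own choosing), `a[i]` is the vertex of $H$ at half-edge index $i$, and `v[i]`, `w[i]` are the corresponding vertices of $\vec{T_{1}}$ and $\vec{T_{2}}$. The test checks that no vertex of $\vec{T_{1}}$, and no vertex of $\vec{T_{2}}$, lies over two different vertices of $H$, while allowing $v_{i}=w_{j}$.

```python
from typing import Dict, Hashable, List


def distinctly_colorable(a: List[Hashable], v: List[Hashable], w: List[Hashable]) -> bool:
    """True iff no vertex of T1, and no vertex of T2, lies over two different vertices of H.

    a[i] is the vertex of H at half-edge index i (0-based here); v[i] and w[i]
    are the vertices of G that T1 and T2 place at that index. A vertex may
    appear in both v and w (v_i = w_j is allowed).
    """
    for lifts in (v, w):
        lies_over: Dict[Hashable, Hashable] = {}
        for h_vertex, g_vertex in zip(a, lifts):
            # Record the first H-vertex seen under g_vertex; any other one is a conflict.
            if lies_over.setdefault(g_vertex, h_vertex) != h_vertex:
                return False
    return True
```

With $H$ a triangle encoded as `a = ["a", "b", "b", "c", "c", "a"]`, the pair `v = ["x", "y", "y", "z", "z", "x"]`, `w = ["y", "z", "z", "x", "x", "y"]` is distinctly colorable even though the same three vertices appear in both, because each of $\vec{T_{1}}$ and $\vec{T_{2}}$ individually keeps distinct vertices of $H$ apart.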

We begin with some lemmas.

Lemma 5.

Suppose $i\in\Gamma(b)$ and $j\notin\Gamma(b)$ , where $b$ is some vertex of $H$ . If $\vec{T}$ is distinctly colorable and Condition 2 or 3 is satisfied at $b$ , but not Condition 1, then neither $v_{i}$ nor $w_{i}$ can be equal to either $v_{j}$ or $w_{j}$ .

**Proof:** If Condition 2 holds at $b$, then $v_{i}=w_{i}$. By the definition of “distinctly colorable,” $v_{i}\neq v_{j}$ and $w_{i}\neq w_{j}$, so neither $v_{j}$ nor $w_{j}$ can equal $v_{i}=w_{i}$.

If instead Condition 3 holds at $b$ , then there are two vertices $x$ and $y$ that lie over $b$ in $K$ such that for all $i^{\prime}\in\Gamma(b)$ , either $v_{i^{\prime}}=x$ and $w_{i^{\prime}}=y$ or vice versa. We may assume that $v_{i}=x$ and $w_{i}=y$ . But since Condition 1 does not hold at $b$ , there must also be some $i^{\prime}\in\Gamma(b)$ such that $v_{i^{\prime}}=y$ and $w_{i^{\prime}}=x$ . Since $v_{i}=x=w_{i^{\prime}}$ , it follows from the definition of “distinctly colorable” that $v_{i}$ cannot be equal to either $v_{j}$ or $w_{j}$ , and similarly for $w_{i}$ .

Normally, we refer to the edges of $\vec{H}$ as $\overrightarrow{a_{1}a_{2}},\dots,\overrightarrow{a_{2k-1}a_{2k}}$ ; however, in the next lemma, we will not be concerned with the directions of the edges, so we will refer to the edges as $\overline{a_{\alpha}a_{\beta}}$ , with the understanding that for some $1\leq r\leq k$ , either $\alpha=2r-1$ and $\beta=2r$ , or vice versa.

Lemma 6.

Suppose $W=\overline{a_{\alpha_{1}}a_{\beta_{1}}}\dots\overline{a_{\alpha_{s}}a_{\beta_{s}}}$ is a walk in the undirected graph $H$ , and suppose that Condition 1 or Condition 3 holds at every internal vertex of the walk (i.e., Condition 1 or 3 holds at each vertex $a_{\beta_{i}}=a_{\alpha_{i+1}}$ for $1\leq i<s$ ). Then there is a walk in $K$ from $v_{\alpha_{1}}$ to either $v_{\beta_{s}}$ or $w_{\beta_{s}}$ , and similarly for $w_{\alpha_{1}}$ .

**Proof: **Let $e_{i}$ denote the $i^{\rm th}$ edge of $W$ ; in other words, $e_{i}=\overline{a_{\alpha_{i}}a_{\beta_{i}}}$ . Then $e_{i}$ has two “lifts” in $K$ , namely, $\overline{v_{\alpha_{i}}v_{\beta_{i}}}$ and $\overline{w_{\alpha_{i}}w_{\beta_{i}}}$ . We will show that each lift of $e_{i}$ is adjacent to a lift of $e_{i+1}$ , so we will be able to piece together lifts of the $e_{i}$ ’s to get a lift of the entire walk.

We use induction on the length $s$ of $W$ . The proof is the same for $v_{\alpha_{1}}$ and $w_{\alpha_{1}}$ , so we present the proof just for $v_{\alpha_{1}}$ . If $s=1$ , then $\overline{v_{\alpha_{1}}v_{\beta_{1}}}$ is the required walk. If $s>1$ , then by induction, there is a walk $U$ in $K$ from $v_{\alpha_{1}}$ to either $v_{\beta_{s-1}}$ or $w_{\beta_{s-1}}$ . Since $W$ is a walk, the edges $e_{s-1}$ and $e_{s}$ are adjacent; in particular, the vertices $a_{\beta_{s-1}}$ and $a_{\alpha_{s}}$ are equal. Equivalently, there is some vertex $b$ of $H$ such that $\beta_{s-1},\alpha_{s}\in\Gamma(b)$ . By assumption, Condition 1 or 3 holds at $b$ , so either $v_{\beta_{s-1}}=v_{\alpha_{s}}$ and $w_{\beta_{s-1}}=w_{\alpha_{s}}$ , or $v_{\beta_{s-1}}=w_{\alpha_{s}}$ and $w_{\beta_{s-1}}=v_{\alpha_{s}}$ . Either way, we can append either the edge $\overline{v_{\alpha_{s}}v_{\beta_{s}}}$ or the edge $\overline{w_{\alpha_{s}}w_{\beta_{s}}}$ to $U$ , obtaining a walk from $v_{\alpha_{1}}$ to either $v_{\beta_{s}}$ or $w_{\beta_{s}}$ .

Although $H$ is connected, $K$ need not be. For instance, $K$ might consist of two isomorphic copies of $H$. In that case, each connected component of $K$ contains a lift of every edge of $H$. However, it can also happen that a connected component of $K$ contains lifts of only some edges of $H$. The next lemmas involve the connected components of $K$. We generally use $J$ to denote a connected component of $K$ and $J^{\prime}$ to denote the subgraph of $H$ that lies “below” $J$. Note that $H$ is connected, so $J^{\prime}$ is not generally a connected component of $H$.

Lemma 7.

Suppose that $\vec{T}$ satisfies Condition 1, 2, or 3 at each vertex of $H$ , and suppose it satisfies Condition 2 at some vertex. Then for every $i\in\{1,\dots,2k\}$ , the two vertices $v_{i}$ and $w_{i}$ are in the same connected component of $K$ . Furthermore, that component also contains some vertex at which Condition 2 is satisfied.

Proof: $H$ is connected, so there is a walk that starts with the half-edge at $a_{i}$ and ends at a vertex where Condition 2 holds. We can choose a minimal such walk, in which case it has no internal vertices where Condition 2 holds. Suppose the walk ends with the half-edge at $a_{j}$ . By Lemma 6, there is a walk in $K$ from $v_{i}$ to either $v_{j}$ or $w_{j}$ , and also a walk from $w_{i}$ to either $v_{j}$ or $w_{j}$ . But Condition 2 holds at $a_{j}$ , so $v_{j}=w_{j}$ . Thus there is a walk from $v_{i}$ to $v_{j}$ , and one from $w_{i}$ to $v_{j}$ . Concatenating them gives a walk from $v_{i}$ to $w_{i}$ . Thus $v_{i}$ and $w_{i}$ are in the same component, and are also in the same component as the vertex $v_{j}$ , at which Condition 2 holds.

Lemma 8.

Suppose $\vec{T}$ satisfies Condition 1, 2, or 3 at every vertex of $H$ and satisfies Condition 2 at some vertex of $H$ . Let $J$ be any connected component of $K$ , and define $J^{\prime}$ to be the subgraph of $H$ consisting of all edges $\overline{a_{2i-1}a_{2i}}$ for which either $\overline{v_{2i-1}v_{2i}}$ or $\overline{w_{2i-1}w_{2i}}$ is in $J$ . Then $J^{\prime}$ must contain either

•

at least two vertices where $\vec{T}$ satisfies Condition 2;

•

a vertex with degree at least 2 in $J^{\prime}$ and where $\vec{T}$ satisfies Condition 2;

•

a vertex with degree at least 3 in $H$ and where $\vec{T}$ satisfies Condition 1 or 3.

**Proof:** We assumed there is some vertex of $H$ where Condition 2 is satisfied, so by Lemma 7, $J$ contains such a vertex, and hence so does $J^{\prime}$. If $J^{\prime}$ contains two such vertices, then we are done, so assume there is just one. If that one vertex has degree at least 2 in $J^{\prime}$, then again we are done, so assume it has degree 1. By the Handshaking Lemma, $J^{\prime}$ must have another vertex of odd degree, and since Condition 2 holds at only one vertex of $J^{\prime}$, $\vec{T}$ must satisfy Condition 1 or 3 at that vertex.

If $b$ is any vertex in $J^{\prime}$ , then for some $i\in\Gamma(b)$ , the half-edge at $a_{i}$ is in $J^{\prime}$ , so either the half-edge at $v_{i}$ or the half-edge at $w_{i}$ is in $J$ . If in addition $\vec{T}$ satisfies either Condition 1 or 3 at $b$ , then (by the definition of Conditions 1 and 3), for every $i\in\Gamma(b)$ , either the half-edge at $v_{i}$ or the half-edge at $w_{i}$ is in $J$ . Therefore, for every $i\in\Gamma(b)$ , the half-edge at $a_{i}$ is in $J^{\prime}$ . In other words, $b$ has the same degree in $J^{\prime}$ as in $H$ . We saw in the previous paragraph that some vertex satisfies either Condition 1 or 3 and has odd degree in $J^{\prime}$ . It has the same degree in $H$ . But we assumed that $H$ has no leaves, so it must have degree at least 3 in $H$ .

Lemma 9.

Suppose $\vec{T}$ satisfies Condition 1, 2, or 3 at every vertex of $H$ and satisfies Condition 2 at some vertex $b$ of $H$ . Let $\delta$ denote the degree of $b$ in $H$ . For each connected component $J$ of $K$ , define $J^{\prime}$ as in the previous lemma. Suppose there are $\kappa$ components $J_{1},\dots,J_{\kappa}$ of $K$ that satisfy:

•

$J_{i}$ has at most one vertex where Condition 2 holds, and

•

$b$ has degree at least two in $J_{i}^{\prime}$.

Then at most $\delta-\kappa$ distinct vertices of $K$ lie over $b$ .

**Proof:** Suppose $\Gamma(b)=\{s_{1},\dots,s_{\delta}\}$, so the vertices $a_{s_{1}},\dots,a_{s_{\delta}}$ are all equal to $b$. The vertices that lie above $b$ in $K$ are $v_{s_{1}},\dots,v_{s_{\delta}}$ and $w_{s_{1}},\dots,w_{s_{\delta}}$. Since Condition 2 holds at $b$, $v_{s_{j}}=w_{s_{j}}$ for each $j$, so in fact the vertices that lie above $b$ in $K$ are just $v_{s_{1}},\dots,v_{s_{\delta}}$. Consider any $J_{i}^{\prime}$ as defined in the lemma. Since $b$ has degree at least two in $J_{i}^{\prime}$, at least two of the half-edges at $a_{s_{1}},\dots,a_{s_{\delta}}$ are in $J_{i}^{\prime}$. Assume without loss of generality that the half-edges at $a_{s_{1}}$ and $a_{s_{2}}$ are in $J_{i}^{\prime}$. Then $J_{i}$ contains the half-edge at either $v_{s_{1}}$ or $w_{s_{1}}$ and the half-edge at either $v_{s_{2}}$ or $w_{s_{2}}$. Since $v_{s_{1}}=w_{s_{1}}$ and $v_{s_{2}}=w_{s_{2}}$, $J_{i}$ contains both $v_{s_{1}}$ and $v_{s_{2}}$. But $J_{i}$ has at most one vertex where Condition 2 holds, so $v_{s_{1}}$ and $v_{s_{2}}$ must be the same vertex. Thus each of $J_{1},\dots,J_{\kappa}$ contains two of $v_{s_{1}},\dots,v_{s_{\delta}}$ that are equal, so there can be at most $\delta-\kappa$ that are distinct.

Lemma 10.

Suppose $0<\Delta\leq m^{1/2-\alpha}$ , where $\alpha>0$ , and suppose $0<C\leq\min(m^{2\alpha},m^{1/3})$ . Then $\Delta^{2}\leq m/C$ and $\Delta\leq m/C^{2}$ .

**Proof:** The first inequality follows from $\Delta^{2}C\leq m^{1-2\alpha}m^{2\alpha}\leq m$. For the second inequality, if $\alpha\geq 1/6$, then both $C$ and $\Delta$ are at most $m^{1/3}$, so $\Delta C^{2}\leq m$; if instead $\alpha\leq 1/6$, then $\Delta C^{2}\leq m^{1/2-\alpha}(m^{2\alpha})^{2}=m^{1/2+3\alpha}\leq m$.
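A quick numerical check of Lemma 10 at the extreme values $\Delta=m^{1/2-\alpha}$ and $C=\min(m^{2\alpha},m^{1/3})$ may help. Both inequalities are tight at $\alpha=1/6$, and the first is tight for every $\alpha\leq 1/6$, so the sketch uses a small relative tolerance to absorb floating-point rounding. This illustrates the lemma on sample values; it is not a proof.

```python
def lemma10_holds(m: float, alpha: float, rel_tol: float = 1e-9) -> bool:
    """Check Delta^2 <= m/C and Delta <= m/C^2 at the largest allowed Delta and C."""
    delta = m ** (0.5 - alpha)
    c = min(m ** (2 * alpha), m ** (1 / 3))
    bound = m * (1 + rel_tol)  # slack for floating-point rounding at equality
    # Delta^2 <= m/C  <=>  Delta^2 * C <= m;  Delta <= m/C^2  <=>  Delta * C^2 <= m.
    return delta ** 2 * c <= bound and delta * c ** 2 <= bound
```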

Theorem 4.

Suppose that the maximum degree $\Delta$ of any vertex in $G$ is at most $m^{1/2-\alpha}$ , where $\alpha>0$ , and assume $C\leq\min(m^{2\alpha},m^{1/3})$ . Then the expected number of distinctly color-compatible $2k$ -tuples of edges of $G$ that satisfy either Condition 1, 2, or 3 at every vertex of $H$ , and satisfy Condition 2 at some vertex of $H$ is $O(m^{k}/C^{2k-t})$ .

**Proof:** Let $\vec{T}$ and $K$ be as defined above. There are $O(1)$ possibilities for the isomorphism class of $K$ (i.e., which of the vertices $v_{1},\dots,v_{2k},w_{1},\dots,w_{2k}$ are the same), so it suffices to prove the theorem for an arbitrary isomorphism class. Consider then any one such class. We may assume that it is distinctly colorable, since otherwise it contributes nothing.

The expected number of possibilities for $\vec{T}$ can be computed in two steps: first count the number of ways to select the vertices of $\vec{T}$ where colors are ignored, and then find the probability that when colors are assigned, $\vec{T}$ becomes distinctly color-compatible. (When we say “select the vertices of $\vec{T}$ ,” we mean, choose a vertex of $G$ for each $v_{i}$ and $w_{i}$ so that the resulting $\vec{T}$ has the assumed isomorphism class.) To count the number of ways to select the vertices for $\vec{T}$ , we consider one connected component of $K$ at a time. Let $J$ be some connected component of $K$ . We can arbitrarily designate any one edge of $J$ to be the “first edge.” Once we designate the first edge, there are at most $m$ ways to select its two endpoints (since $\vec{G}$ has $m$ edges). There are then at most $\Delta$ ways to select each subsequent vertex of $J$ , for a total of $m\Delta^{|{\cal{V}}(J)|-2}$ . Equivalently, we could have arbitrarily designated any two (not necessarily adjacent) vertices of $J$ to be the “first two vertices;” we could have then pretended that there were at most $\sqrt{m}$ ways to select each of those two vertices and at most $\Delta$ ways to select each other vertex of $J$ .

For a component $J$, we use the following method to choose its first two vertices. Let $J^{\prime}$ be the subgraph of $H$ consisting of all edges $\overline{a_{2i-1}a_{2i}}$ for which either $\overline{v_{2i-1}v_{2i}}$ or $\overline{w_{2i-1}w_{2i}}$ is in $J$ (as in Lemma 8). By Lemma 7, $J$ has at least one vertex where Condition 2 is satisfied; we designate it as one of the first two vertices of $J$. If $J$ has a second such vertex, we designate it as the other. Otherwise, if some vertex of $J$ lies above a vertex of $H$ that has degree at least 3 and at which Condition 1 or 3 is satisfied, we designate that vertex as the other. Otherwise, we designate an arbitrary vertex as the other.

We now show that the result is at most $m^{k}/C^{2k-t}$ . We consider one vertex $b$ of $H$ at a time, and compute the factor that the vertices that lie above $b$ contribute to the result. Recall that if some vertex above $b$ was designated as one of the first two vertices in its component, then it contributes a factor of $\sqrt{m}$ , and otherwise contributes a factor of $\Delta$ . Furthermore, if Condition 2 holds at $b$ , and if there are $d$ vertices that lie above $b$ , then they must all receive the same color, which introduces a factor of $C^{1-d}$ . Note also that by Lemma 5, if $b_{i}$ and $b_{j}$ are two vertices of $H$ where Condition 2 holds, then all the vertices that lie above $b_{i}$ are distinct from all the vertices that lie above $b_{j}$ , and so these factors of $C^{1-d}$ are all independent. We consider four cases for $b$ . In the first case, Condition 2 holds at $b$ . In the other three cases, Condition 1 or 3 holds at $b$ , but we subdivide these cases based on whether some vertex that lies above $b$ was designated as a first vertex of its component, and whether $b$ has degree $>2$ .

First consider the case where Condition 2 holds at $b$ . Let $d$ denote the number of vertices of $K$ that lie above $b$ . There are at most $\sqrt{m}$ ways to select each of these $d$ vertices, and they must all receive the same color, so these vertices contribute at most a factor of $m^{d/2}/C^{d-1}$ to the count. By Lemma 9, $d$ is at most $\delta-\kappa$ , where $\delta$ is the degree of $b$ in $H$ , and $\kappa$ is the number of connected components $J$ of $K$ that satisfy: $J$ has at most one vertex where Condition 2 holds, and $b$ has degree at least two in $J^{\prime}$ . Thus the contribution of $b$ to the overall expected value is at most a factor of

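$$\frac{m^{d/2}}{C^{d-1}}\leq\frac{m^{(\delta-\kappa)/2}}{C^{\delta-\kappa-1}}=\frac{m^{\delta/2}}{C^{\delta-1}}\left(\frac{C}{\sqrt{m}}\right)^{\kappa},\tag{11}$$

where the inequality uses $d\leq\delta-\kappa$ and $C\leq m^{1/3}\leq\sqrt{m}$.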

For the next three cases, suppose that either Condition 1 or 3 holds at $b$ . Then at most two vertices of $K$ lie above $b$ , and by Lemma 7, they lie in the same component of $K$ . In the case where neither was designated as one of the first two vertices of that component, their contribution to the count is at most a factor of

$$\Delta^{2}\leq\frac{m}{C}\leq\frac{m^{\delta/2}}{C^{\delta-1}}.\tag{12}$$

(We used Lemma 10 in the first inequality.)

For the last two cases, suppose that one of the vertices that lie above $b$ was designated as one of the first two vertices of the component. First assume $\delta\geq 3$ . Then the contribution of the vertices that lie above $b$ to the overall expected value is at most a factor of

$$\sqrt{m}\,\Delta\leq\frac{m^{3/2}}{C^{2}}\leq\frac{m^{\delta/2}}{C^{\delta-1}}.\tag{13}$$

(We used Lemma 10 in the first inequality.)

Finally, suppose that one of the vertices that lie above $b$ was designated as one of the first two vertices of the component, and $\delta<3$ (which means that $\delta=2$ ). Then the contribution of the vertices that lie above $b$ to the overall expected value is at most a factor of

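$$\sqrt{m}\,\Delta=\frac{m}{C}\cdot\frac{C\Delta}{\sqrt{m}}\leq\frac{m}{C}\cdot\frac{\sqrt{m}}{C}=\frac{m^{\delta/2}}{C^{\delta-1}}\cdot\frac{\sqrt{m}}{C}.\tag{14}$$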

(We used Lemma 10 in the last inequality.)

Observe that in all four cases (Equations (11), (12), (13), and (14)), the vertex $b$ contributes a factor of at most $m^{\delta/2}/C^{\delta-1}$, except that in (11) and (14), there are additional factors of $\sqrt{m}/C$ or $C/\sqrt{m}$. We first show that there are at least as many factors of $C/\sqrt{m}$ as of $\sqrt{m}/C$. For the remainder of the proof, we write $\delta(b)$ rather than $\delta$ for the degree of $b$, since $b$ will no longer be clear from context. There is one factor of $\sqrt{m}/C$ in Equation (14) for each vertex $b$ and component $J$ such that:

• $\delta(b)<3$,

• $b$ satisfies Condition 1 or 3, and

• a vertex $x$ that lies above $b$ was designated as one of the first two vertices of $J$.

Observe that $J$ can have at most one vertex that satisfies Condition 1 or 3 and was designated as one of the first two vertices, so $J$ cannot contribute a factor of $\sqrt{m}/C$ for any vertex besides $b$. In other words, $J$ contributes at most one factor of $\sqrt{m}/C$ overall. Now we will show that $J$ also contributes a factor of $C/\sqrt{m}$ to (11). Since $x$ was designated as one of the first two vertices of $J$, we know that $J$ has only one vertex where Condition 2 holds (which, by Lemma 5, implies that $J^{\prime}$ has only one vertex where Condition 2 holds), and that $J$ cannot have a vertex that lies above a vertex of $H$ with degree at least 3 where Condition 1 or 3 holds. Thus, by Lemma 8, $J^{\prime}$ must have a vertex with degree at least 2 (in $J^{\prime}$) where Condition 2 holds, and for that vertex $J$ contributes a factor of $C/\sqrt{m}$ to (11). Hence there are at least as many factors of $C/\sqrt{m}$ in (11) as factors of $\sqrt{m}/C$ in (14). We can therefore pair each factor of $\sqrt{m}/C$ with a factor of $C/\sqrt{m}$ and drop both, and then drop the remaining factors of $C/\sqrt{m}$, each of which is at most 1; this can only increase the product. After these factors are removed, each vertex $b$ of $H$ contributes a factor of at most $m^{\delta(b)/2}/C^{\delta(b)-1}$. Taking the product over $b$ gives

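$$\prod_{b}\frac{m^{\delta(b)/2}}{C^{\delta(b)-1}}=\frac{m^{\sum_{b}\delta(b)/2}}{C^{\sum_{b}(\delta(b)-1)}}=\frac{m^{k}}{C^{2k-t}},$$

since the degrees of the $t$ vertices of $H$ sum to $2k$.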

Next, we consider what happens when no vertex of $\vec{T}$ satisfies Condition 2.

Lemma 11.

Suppose $\vec{T}$ satisfies Condition 1 or 3 at every vertex of $H$ . Then $K$ has at most two connected components.

**Proof:** $H$ is connected, so for any $i$, there is a walk from $a_{i}$ to $a_{1}$. Since Condition 1 or 3 holds at every vertex along the walk, we can apply Lemma 6 to deduce that there is a walk in $K$ from $v_{i}$ to either $v_{1}$ or $w_{1}$, and also a walk from $w_{i}$ to either $v_{1}$ or $w_{1}$. Thus every vertex of $K$ is in the same connected component as either $v_{1}$ or $w_{1}$.

Lemma 12.

Suppose $\vec{T}$ satisfies Condition 1 or 3 at every vertex of $H$ , but not always Condition 1. Then at least one of the following must hold:

• $H$ has more edges than vertices;

• there are at least two vertices of $H$ where $\vec{T}$ does not satisfy Condition 1;

• $K$ is connected (as an undirected graph).

**Proof:** We assumed that $H$ is connected and has no leaves, so every vertex of $H$ has degree at least 2 and hence $k\geq t$. Suppose the first condition above does not hold, i.e., $H$ has exactly as many edges as vertices. Then every vertex of $H$ has degree exactly 2, so $H$ must be a cycle. Now suppose that the second condition above also does not hold, i.e., there is exactly one vertex of $H$ where $\vec{T}$ does not satisfy Condition 1 (and therefore satisfies Condition 3). We may assume that the edges of $H$, taken in order around the cycle, are $\overrightarrow{a_{1}a_{2}},\overrightarrow{a_{3}a_{4}},\dots,\overrightarrow{a_{2t-1}a_{2t}}$; there is no loss of generality in this assumption, because edge directions are irrelevant to this lemma. We may also assume that the vertex $a_{1}=a_{2t}$ is the vertex where Condition 3, but not Condition 1, is satisfied. Then $v_{1}=w_{2t}$ and $w_{1}=v_{2t}$. Since Condition 1 holds at every other vertex, $v_{2i}=v_{2i+1}$ and $w_{2i}=w_{2i+1}$ for all $1\leq i<t$. Thus $\overline{v_{1}v_{2}},\overline{v_{3}v_{4}},\dots,\overline{v_{2t-1}v_{2t}},\overline{w_{1}w_{2}},\overline{w_{3}w_{4}},\dots,\overline{w_{2t-1}w_{2t}}$ is a walk that visits every vertex of $K$, so $K$ is connected.
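Lemmas 11 and 12 can be illustrated with a small union-find computation. The Python sketch below is a toy model, not part of the paper: $H$ is assumed to be a $t$-cycle with half-edges numbered as in the proof above, Condition 1 is assumed to hold at every vertex except those listed in `crossings` (where only Condition 3 holds), and the vertices $v_i$, $w_i$ are assumed to be otherwise distinct. The function name `count_components_of_K` is hypothetical; it returns the number of connected components of $K$.

```python
def count_components_of_K(t, crossings):
    """Number of connected components of K in a toy model where H is a t-cycle.

    Half-edges are numbered 1..2t, and edge i of H uses half-edges 2i-1 and 2i.
    Vertex j of H (1 <= j <= t) joins half-edges 2j and 2j+1, where 2t+1 wraps
    to 1, so vertex t is a_1 = a_{2t}. `crossings` lists the vertices where only
    Condition 3 holds; Condition 1 holds everywhere else. All other v_i, w_i are
    assumed distinct.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    for i in range(1, t + 1):  # the two copies of edge i in K
        union(("v", 2 * i - 1), ("v", 2 * i))
        union(("w", 2 * i - 1), ("w", 2 * i))
    for j in range(1, t + 1):  # identifications forced at vertex j of H
        p, q = 2 * j, (2 * j + 1 if j < t else 1)
        if j in crossings:  # Condition 3 only: v_p = w_q and w_p = v_q
            union(("v", p), ("w", q))
            union(("w", p), ("v", q))
        else:  # Condition 1: v_p = v_q and w_p = w_q
            union(("v", p), ("v", q))
            union(("w", p), ("w", q))
    return len({find(x) for x in list(parent)})


print(count_components_of_K(4, set()), count_components_of_K(4, {4}), count_components_of_K(4, {2, 4}))
# prints: 2 1 2
```

With no crossings the $v$-copy and the $w$-copy of the cycle stay apart; one crossing joins them, as in Lemma 12; two crossings split them again, which is why Lemma 12 needs its second alternative. In this model the count never exceeds two, in line with Lemma 11.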

Theorem 5.

Suppose that the maximum degree $\Delta$ of any vertex in $G$ is at most $m^{1/2-\alpha}$ , where $\alpha>0$ , and assume $C\leq\min(m^{2\alpha},m^{1/3})$ . Then the expected number of distinctly color-compatible $2k$ -tuples of edges of $G$ that satisfy either Condition 1 or 3 at every vertex of $H$ , but do not satisfy Condition 1 at every vertex of $H$ , is $O(m^{k}/C^{2k-t})$ .

**Proof:** There are again $O(1)$ possibilities for the isomorphism class of $K$ (i.e., which of the vertices $v_{1},\dots,v_{2k},w_{1},\dots,w_{2k}$ are the same), so it suffices to prove the theorem for an arbitrary isomorphism class. Assume, then, that we are given the isomorphism class of $K$. We can assume that it is distinctly colorable.

We first count the number of ways to select the vertices of $K$. By Lemma 11, $K$ has at most two components. As in the proof of Theorem 4, in each component we can arbitrarily designate any one edge to be the “first edge.” There are at most $m$ ways to select its two endpoints (since $\vec{G}$ has $m$ edges), and there are at most $\Delta$ ways to select each subsequent vertex in the component. Equivalently, we can arbitrarily designate any two (not necessarily adjacent) vertices of the component to be the “first two vertices”; we can then pretend that there are at most $\sqrt{m}$ ways to select each of these two vertices and at most $\Delta$ ways to select each subsequent vertex. Since $K$ has at most $2t$ vertices (because at most two vertices lie above each vertex of $H$), there are at most $m\Delta^{2t-2}$ possibilities for the vertices of $K$ if $K$ has one component, and at most $m^{2}\Delta^{2t-4}$ if $K$ has two. The bound is larger in the two-component case, since $\Delta^{2}\leq m$.

Once the vertices of $K$ are chosen, colors must be assigned in such a way that the $2k$ -tuple of edges is distinctly color-compatible. Consider any vertex $b$ of $H$ where Condition 3 (but not Condition 1) holds. That means that exactly two vertices $x$ and $y$ lie above $b$ in $K$ , and they must be distinct (or else Condition 1 would hold). Color-compatibility requires that $x$ and $y$ be assigned the same color, which happens with probability $1/C$ . Furthermore, if there are two vertices $b_{i}$ and $b_{j}$ where Condition 3 (but not Condition 1) holds, and if $x_{i}$ , $y_{i}$ , $x_{j}$ , and $y_{j}$ are the corresponding vertices that lie above $b_{i}$ and $b_{j}$ , then by Lemma 5, $x_{i}$ , $y_{i}$ , $x_{j}$ , and $y_{j}$ are all distinct. Thus every vertex where Condition 3 (but not Condition 1) holds contributes an independent factor of $1/C$ to the count.

Suppose now that $H$ has more edges than vertices (i.e., $k-t\geq 1$ ). There are at most $m^{2}\Delta^{2t-4}$ ways to select the vertices of $K$ and a probability of at most $1/C$ that the result is distinctly color-compatible, so (using Lemma 10) the expected number of $2k$ -tuples of edges is at most

[TABLE]

Next suppose instead that $H$ has at least two vertices where Condition 3 (but not Condition 1) holds. Then there are at most $m^{2}\Delta^{2t-4}$ ways to select the vertices of $K$ and a probability of at most $1/C^{2}$ that the result is distinctly color-compatible, so the expected number of $2k$ -tuples of edges is at most

[TABLE]

The only remaining case is where $H$ does not have more edges than vertices and $H$ has only one vertex where Condition 3 (but not Condition 1) holds. By Lemma 12, $K$ is connected, i.e., has only one component. Then there are at most $m\Delta^{2t-2}$ ways to select the vertices of $K$ and a probability of at most $1/C$ that the result is distinctly color-compatible, so the expected number of $2k$ -tuples of edges is at most

[TABLE]
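The three bounds in this proof can be sanity-checked by comparing exponents of $m$. The Python sketch below is only such a check, not a reconstruction of the paper's own calculations (which rely on Lemma 10, not restated here). It assumes $C=m^{\gamma}$ with $0\leq\gamma\leq 1/3$ and uses $\Delta\leq\sqrt{m/C}$, which follows from $\Delta\leq m^{1/2-\alpha}$ and $C\leq m^{2\alpha}$. Since $\Delta$ appears only with nonnegative exponents when $t\geq 2$, taking $\Delta$ at this maximum gives the largest bound. The function names are hypothetical.

```python
from fractions import Fraction


def theorem5_exponents(k, t, gamma):
    """Exponents of m in the bounds of the three cases, with C = m**gamma.

    Uses Delta = sqrt(m / C) (the largest degree allowed under the assumption
    Delta <= sqrt(m / C)), i.e. log_m Delta = (1 - gamma) / 2.
    """
    beta = (1 - gamma) / 2
    target = k - (2 * k - t) * gamma                     # m^k / C^(2k-t)
    more_edges = 2 + (2 * t - 4) * beta - gamma          # m^2 Delta^(2t-4) / C
    two_crossings = 2 + (2 * t - 4) * beta - 2 * gamma   # m^2 Delta^(2t-4) / C^2
    connected = 1 + (2 * t - 2) * beta - gamma           # m Delta^(2t-2) / C
    return target, more_edges, two_crossings, connected


def all_cases_hold(max_t=8, max_extra_edges=4, steps=30):
    """Check each case on a grid of exact rationals 0 <= gamma <= 1/3."""
    for t in range(3, max_t + 1):
        for k in range(t, t + max_extra_edges + 1):
            for s in range(steps + 1):
                gamma = Fraction(s, 3 * steps)
                target, more, two, conn = theorem5_exponents(k, t, gamma)
                if k > t and more > target:        # H has more edges than vertices
                    return False
                if two > target:                   # two vertices fail Condition 1
                    return False
                if k == t and conn > target:       # K connected, by Lemma 12
                    return False
    return True


print(all_cases_hold())  # True
```

Raising $\gamma$ to $1/2$ breaks the first case (for example, $k=5$, $t=4$ gives exponent $5/2$ against a target of $2$), which is consistent with the role of the hypothesis $C\leq m^{1/3}$.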

Lemma 13.

If the $2k$ -tuple of edges $\vec{T}=(\vec{T_{1}},\vec{T_{2}})$ is distinctly colorable and satisfies Condition 1 at every vertex of $H$ , then $\vec{T_{1}}$ and $\vec{T_{2}}$ are each isomorphic to $\vec{H}$ .

**Proof:** Suppose $b$ and $c$ are (not necessarily distinct) vertices of $H$, and suppose $i\in\Gamma(b)$ and $j\in\Gamma(c)$. Since Condition 1 holds everywhere, if $b=c$, then $v_{i}=v_{j}$. Since $\vec{T}$ is distinctly colorable, if $b\neq c$, then $v_{i}\neq v_{j}$. In other words, $v_{i}=v_{j}$ if and only if $b=c$. Then the edge map that sends each $\overrightarrow{v_{2i-1}v_{2i}}$ to $\overrightarrow{a_{2i-1}a_{2i}}$ induces an isomorphism between $\vec{T_{1}}$ and $\vec{H}$. The proof for $\vec{T_{2}}$ is analogous.
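The key step in this proof, that $v_i=v_j$ exactly when half-edges $i$ and $j$ lie at the same vertex of $H$, can be checked mechanically. The Python sketch below uses hypothetical inputs: `a[i]` is the vertex of $H$ at half-edge $i$ and `v[i]` is the vertex of $G$ assigned to it (0-indexed here, so edge $j$ of $H$ uses half-edges $2j$ and $2j+1$). It returns the induced vertex map when that map is well defined (Condition 1) and injective (as distinct colorability forces), and `None` otherwise.

```python
def induced_vertex_map(a, v):
    """Map each vertex of H to the vertex of G above it, or None if impossible.

    a[i]: vertex of H at half-edge i; v[i]: vertex of G at half-edge i.
    Edge j of H is the directed edge a[2j] -> a[2j+1], and likewise in T_1.
    """
    phi = {}
    for a_i, v_i in zip(a, v):
        # Condition 1: half-edges at the same vertex of H share one vertex of G.
        if phi.setdefault(a_i, v_i) != v_i:
            return None
    # Distinct colorability: different vertices of H get different vertices of G.
    if len(set(phi.values())) != len(phi):
        return None
    return phi


triangle = [0, 1, 1, 2, 2, 0]  # edges 0->1, 1->2, 2->0
print(induced_vertex_map(triangle, ["x", "y", "y", "z", "z", "x"]))
# prints: {0: 'x', 1: 'y', 2: 'z'}
```

When a map is returned, it sends each directed edge `a[2j] -> a[2j+1]` to `v[2j] -> v[2j+1]` by construction, which is the isomorphism described in the proof.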

Theorem 6.

Suppose ${\cal{G}}$ is either the group of $r^{\rm th}$ roots of unity (in which case $d=1$ ) or the group $\{\pm I,\pm M,\pm M^{2},\dots,\pm M^{d-1}\}$ . Suppose that the maximum degree $\Delta$ of any vertex in $G$ is at most $m^{1/2-\alpha}$ , where $\alpha>0$ , and assume $C\leq\min(m^{2\alpha},m^{1/3})$ . Then the estimate for $\#H$ given by Theorem 1 has variance that is $O((\#H)^{2}+m^{k}/(dC^{2k-t}))$ .

**Proof:** The variance is given by

[TABLE]

As discussed earlier, ${\rm tr}({\cal{S}}){\rm tr}(\overline{{\cal{S}}})$ is a sum of terms of the form ${\rm tr}({\cal{Q}}(\vec{T_{1}})){\rm tr}(\overline{{\cal{Q}}(\vec{T_{2}})})$, where $\vec{T_{1}}=\overrightarrow{v_{1}v_{2}},\dots,\overrightarrow{v_{2k-1}v_{2k}}$ and $\vec{T_{2}}=\overrightarrow{w_{1}w_{2}},\dots,\overrightarrow{w_{2k-1}w_{2k}}$ are each distinctly color-compatible. By Theorems 2 and 3, such a term contributes to $E({\rm tr}({\cal{S}}){\rm tr}(\overline{{\cal{S}}}))$ only if $(\vec{T_{1}},\vec{T_{2}})$ satisfies Condition 1, 2, or 3 at every vertex of $H$.

First consider the $\vec{T}=(\vec{T_{1}},\vec{T_{2}})$ that satisfy Condition 1 at every vertex of $H$ . By Lemma 13, if $\vec{T}$ satisfies Condition 1 at every vertex of $H$ and is distinctly colorable, then $\vec{T_{1}}$ and $\vec{T_{2}}$ are each isomorphic to $\vec{H}$ . The number of such $\vec{T}$ is then $O((\#H)^{2})$ . By Theorems 2 and 3, each such $\vec{T}$ contributes $d^{2}$ to $E({\rm tr}({\cal{S}}){\rm tr}(\overline{{\cal{S}}}))$ and therefore contributes at most

[TABLE]

to the variance. This term is $O(1)$, so the total contribution of these $\vec{T}$ to the variance is $O((\#H)^{2})$.

Next consider the $\vec{T}$ that satisfy Condition 1, 2, or 3 at every vertex of $H$ , but not always Condition 1. By Theorems 4 and 5, the number of such $\vec{T}$ that are distinctly color-compatible is $O(m^{k}/C^{2k-t})$ . By Theorems 2 and 3, each such $\vec{T}$ contributes at most $d$ to $E({\rm tr}({\cal{S}}){\rm tr}(\overline{{\cal{S}}}))$ . Thus these terms contribute $O(m^{k}/(dC^{2k-t}))$ to the variance.

4 Discussion of Algorithm

In this section, we discuss how our version of the algorithm compares to the original in terms of storage, update time per edge, and a one-time calculation.

First consider the case where ${\cal{G}}$ is the group of $r^{\rm th}$ roots of unity. As we showed in Theorem 6, the variance of a single instance of our algorithm is $O((\#H)^{2}+m^{k}/C^{2k-t})$ , so the number of instances needed to attain a variance of $O((\#H)^{2})$ is

$$O\left(1+\frac{m^{k}}{(\#H)^{2}C^{2k-t}}\right).\tag{15}$$

Each instance of our algorithm requires $O(C^{2})$ storage, so the storage needed is $O(C^{2}+m^{k}/((\#H)^{2}C^{2k-t-2}))$ . Assuming our goal is to minimize storage, if the first term in this expression is larger than the second, then we want to choose a smaller value of $C$ to ensure that

$$C^{2}\leq\frac{m^{k}}{(\#H)^{2}C^{2k-t-2}},$$

i.e.,

$$C\leq\left(\frac{m^{k}}{(\#H)^{2}}\right)^{1/(2k-t)}.$$

Thus, although we proved Theorem 6 for any $C\leq\min(m^{2\alpha},m^{1/3})$, the best choice of $C$ is $\min(m^{2\alpha},m^{1/3},(m^{k}/(\#H)^{2})^{1/(2k-t)})$. With this choice, and when the second term in (15) is the larger one, the number of instances of our algorithm that we need to perform is $O(m^{k}/((\#H)^{2}C^{2k-t}))$, so the update time per edge is also $O(m^{k}/((\#H)^{2}C^{2k-t}))$, and the storage is $O(m^{k}/((\#H)^{2}C^{2k-t-2}))$. We thus save a factor of roughly $C^{2k-t-2}$ in storage and $C^{2k-t}$ in update time over the original algorithm.
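The parameter choice described here amounts to a few lines of arithmetic. The Python sketch below drops all constant factors and treats $\#H$ (passed as `num_h`, a hypothetical parameter name) as a known input. It returns the chosen $C$, the number of instances from (15), the resulting storage, and the $m^{k}/(\#H)^{2}$ cost of the original [KMSS]-algorithm for comparison. The input values in the usage line are purely illustrative.

```python
def plan_roots_of_unity(m, k, t, num_h, alpha):
    """Choose C and estimate costs for the roots-of-unity variant (constants dropped)."""
    c = min(m ** (2 * alpha), m ** (1 / 3), (m ** k / num_h ** 2) ** (1 / (2 * k - t)))
    instances = 1 + m ** k / (num_h ** 2 * c ** (2 * k - t))  # Equation (15)
    storage = c ** 2 * instances                              # O(C^2) per instance
    original = m ** k / num_h ** 2                            # [KMSS]: storage and update time
    return c, instances, storage, original


# Hypothetical inputs: m = 10^6 edges, H a 4-cycle (k = t = 4), #H = 10^5, alpha = 0.2.
c, instances, storage, original = plan_roots_of_unity(10**6, 4, 4, 10**5, 0.2)
print(round(c), round(instances), f"{storage:.3g}", f"{original:.3g}")
# prints: 100 1000001 1e+10 1e+14
```

Here the $m^{1/3}$ cap binds, so $C\approx 100$. Storage drops from about $10^{14}$ to about $10^{10}$, a factor of $C^{2k-t-2}=10^{4}$, and the number of instances (hence update time) drops by $C^{2k-t}=10^{8}$.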

Of the two terms in (15), 1 and $m^{k}/((\#H)^{2}C^{2k-t})$ , if the first is larger, then we are doing $O(1)$ instances of the algorithm, so the update time per edge is $O(1)$ . If the second is larger, then we can reduce the update time per edge by instead letting ${\cal{G}}$ be the group $\{\pm I,\pm M,\dots,\pm M^{d-1}\}$ and setting $d=m^{k}/((\#H)^{2}C^{2k-t})$ , but performing $1/d$ times as many instances of the algorithm. By Theorem 6, the variance remains $O((\#H)^{2})$ . The storage requirement also does not change, since we do $1/d$ times as many instances of the algorithm, but each instance requires $d$ times the storage. However, now we are performing $O(1)$ instances of the algorithm, so the update time is $O(1)$ .
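The trade-off in this paragraph is also simple arithmetic. In the Python sketch below (constants dropped; `compare_variants` is a hypothetical name; $d$ is kept real-valued, although in practice it would presumably be rounded to an integer), the matrix variant performs $1/d$ times as many instances, each needing $d$ times the storage. Total storage is therefore unchanged, while the number of instances, and hence the update time per edge, drops to $O(1)$.

```python
def compare_variants(m, k, t, num_h, c):
    """Roots of unity (d = 1) versus the matrix group {±I, ±M, ..., ±M^(d-1)}, constants dropped."""
    excess = m ** k / (num_h ** 2 * c ** (2 * k - t))  # second term of (15)
    instances = 1 + excess                             # roots-of-unity variant
    d = max(1.0, excess)                               # matrix variant: d = excess if it exceeds 1
    return {
        "d": d,
        "scalar_instances": instances,
        "scalar_storage": c ** 2 * instances,
        "matrix_instances": instances / d,                 # 1/d times as many instances ...
        "matrix_storage": (instances / d) * (d * c ** 2),  # ... each needing d times the storage
    }


plan = compare_variants(10**6, 4, 4, 10**5, 100)
print(plan["d"], round(plan["matrix_instances"], 6))
# prints: 1000000.0 1.000001
```

When the second term of (15) is below 1, the sketch keeps $d=1$, and the two variants coincide.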

There is one drawback of our version of the algorithm: when the stream ends, a potentially large calculation is required. In particular, we must compute

[TABLE]

This could potentially involve $C^{t}$ work, although for most $H$ , we can use inclusion-exclusion to perform the calculation more efficiently. For instance, if $H$ is a 4-cycle with vertices 1,2,3,4 and edges $\overrightarrow{12},\overrightarrow{23},\overrightarrow{34},\overrightarrow{41}$ , then we can loop through colors $c_{1}$ and $c_{3}$ for vertices 1 and 3. For each such pair of colors, we can loop through colors $c_{2}\notin\{c_{1},c_{3}\}$ for vertex 2, computing

[TABLE]

Separately, we can loop through colors $c_{4}\notin\{c_{1},c_{3}\}$ for vertex 4, computing

[TABLE]

We can multiply those two sums and then subtract the terms where $c_{2}=c_{4}$ :

[TABLE]

We thus do the computation with $C^{3}$ work rather than $C^{4}$ work. In fact, it is possible to do slightly better: each of the three sums above can be computed for all $c_{1}$ and $c_{3}$ at once by performing a $C\times C$ matrix multiplication, and fast matrix-multiplication algorithms use asymptotically fewer than $C^{3}$ operations. It would be unusual for this computation to be a significant issue, but if it is, then we might want to choose a smaller value of $C$, in which case we would not realize the full reduction in storage.
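The 4-cycle computation can be sketched in Python as follows. This rests on assumptions not spelled out in this section: we take the scalar (roots-of-unity) case, write `Z[i][c][cc]` for $Z_{i}^{c,c'}$, the sketch entry for edge $i$ of $H$ whose tail has color $c$ and head has color $c'$, and take the target quantity to be $\sum Z_{1}^{c_1,c_2}Z_{2}^{c_2,c_3}Z_{3}^{c_3,c_4}Z_{4}^{c_4,c_1}$ over pairwise-distinct colors. This is our reading of the displayed sums and may omit normalizing factors. The tables are random stand-ins, and the function names are hypothetical.

```python
import itertools
import random


def brute_force(Z, C):
    """O(C^4) reference: sum over colorings of vertices 1..4 with pairwise-distinct colors."""
    Z1, Z2, Z3, Z4 = Z
    return sum(
        Z1[c1][c2] * Z2[c2][c3] * Z3[c3][c4] * Z4[c4][c1]
        for c1, c2, c3, c4 in itertools.permutations(range(C), 4)
    )


def inclusion_exclusion(Z, C):
    """O(C^3): loop over (c1, c3), factor the sums over c2 and c4, subtract c2 == c4."""
    Z1, Z2, Z3, Z4 = Z
    total = 0j
    for c1 in range(C):
        for c3 in range(C):
            if c1 == c3:
                continue
            rest = [c for c in range(C) if c not in (c1, c3)]
            left = sum(Z1[c1][c2] * Z2[c2][c3] for c2 in rest)   # vertex 2
            right = sum(Z3[c3][c4] * Z4[c4][c1] for c4 in rest)  # vertex 4
            # Terms of left * right with c2 == c4 violate distinctness; remove them.
            overlap = sum(Z1[c1][c] * Z2[c][c3] * Z3[c3][c] * Z4[c][c1] for c in rest)
            total += left * right - overlap
    return total


def random_tables(C, seed=0):
    """Hypothetical complex-valued stand-ins for Z_1..Z_4, indexed [edge][tail color][head color]."""
    rng = random.Random(seed)
    return [[[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(C)]
             for _ in range(C)] for _ in range(4)]


Z = random_tables(6, seed=1)
print(abs(brute_force(Z, 6) - inclusion_exclusion(Z, 6)) < 1e-8)  # True
```

The outer loop visits $O(C^{2})$ pairs and does $O(C)$ work for each, giving the $C^{3}$ bound. Each of `left`, `right`, and `overlap` is, up to a few boundary terms, an entry of a product of two $C\times C$ matrices, which is where fast matrix multiplication could be applied.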

5 Conclusion

We have described three modifications to the [KMSS]-algorithm: we define one hash function ${\cal{X}}_{i}$ for each half-edge of $H$ rather than one for each vertex of $H$ ; we assign colors to the vertices of $G$ and restrict to distinctly color-compatible $\vec{T}$ ; and we allow matrix-valued hash functions as an alternative to complex-valued hash functions. The first two modifications reduce the variance in each instance of the algorithm, and therefore reduce the number of instances needed. This in turn reduces the required storage and update time per edge. The third modification reduces only the update time per edge.

Suppose that the maximum degree $\Delta$ of any vertex in $G$ is at most $m^{1/2-\alpha}$ , where $\alpha>0$ , and suppose $C\leq\min(m^{2\alpha},m^{1/3})$ . For the original [KMSS]-algorithm, both the storage and update time per edge are $O(m^{k}/(\#H)^{2})$ . For our algorithm, we have shown that the update time per edge is $O(1)$ , and the storage is $O(C^{2}+m^{k}/(C^{2k-t-2}(\#H)^{2}))$ , i.e., the storage has been reduced approximately by a factor of $C^{2k-t-2}$ .

Bibliography

[1] N. Ahmed, N. Duffield, J. Neville, and R. Kompella. Graph sample and hold: a framework for big-graph analytics. KDD 2014, pp. 1446-1455, 2014.
[2] N. Ahmed, N. Duffield, T. Willke, and R. Rossi. On sampling from massive graph streams. In Proc. VLDB, pp. 1430-1441, 2017.
[3] N. Ahmed and R. Rossi. The Network Data Repository with Interactive Graph Analytics and Visualization. http://networkrepository.com, 2015.
[4] K. Ahn, S. Guha, and A. McGregor. Graph sketches: sparsification, spanners, and subgraphs. In Proceedings of the Symposium on Principles of Database Systems (PODS), pp. 5-14, 2012.
[5] U. Alon, D. Chklovskii, S. Itzkovitz, N. Kashtan, R. Milo, and S. Shen-Orr. Network motifs: simple building blocks of complex networks. Science 298, no. 5594, pp. 824-827, 2002.
[6] S. Assadi, M. Kapralov, and S. Khanna. A simple sublinear-time algorithm for counting arbitrary subgraphs via edge sampling. In ITCS, volume 124 of LIPIcs, pp. 6:1-6:20. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2019.
[7] Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Reductions in streaming algorithms with an application to counting triangles in graphs. SODA, pp. 623-632, 2002.
[8] S. Bera and A. Chakrabarti. Towards tighter space bounds for counting triangles and other substructures in graph streams. In 34th Symposium on Theoretical Aspects of Computer Science (STACS 2017), pp. 11:1-11:14, 2017.
