New $(\alpha,\beta)$ Spanners and Hopsets

Uri Ben-Levy; Merav Parter

arXiv:1907.11402·cs.DS·January 22, 2020


TL;DR

This paper introduces new constructions of $(\alpha,\beta)$ spanners and hopsets with nearly optimal stretch and size, improving bounds and extending applicability for large distances in unweighted graphs.

Contribution

It presents novel $(\alpha,\beta)$ spanner and hopset constructions with improved stretch, size, and hop-bound guarantees, advancing the state of the art in graph sparsification.

Findings

01

Achieves nearly optimal stretch of $O(\frac{k}{d})$ for various distance ranges.

02

Constructs $(\alpha,\beta)$ spanners with $\alpha=O(k^{\epsilon})$ and size $O(n^{1+1/k})$.

03

Develops $(\alpha,\beta)$ hopsets with improved hop-bound $O(k^{\log(3+9/\epsilon)})$.

Abstract

An $f(d)$ -spanner of an unweighted $n$ -vertex graph $G=(V,E)$ is a subgraph $H$ satisfying that $\mbox{\rm dist}_{H}(u,v)$ is at most $f(\mbox{\rm dist}_{G}(u,v))$ for every $u,v\in V$ . We present new spanner constructions that achieve a nearly optimal stretch of $O(\lceil k/d\rceil)$ for any distance value $d\in[1,k^{1-o(1)}]$ and $d\geq k^{1+o(1)}$ . We show the following: 1. There exists an $f(d)$ -spanner $H\subseteq G$ with $f(d)\leq 7k$ for any $d\in[1,\sqrt{k}/2]$ with expected size $O_{k}(n^{1+1/k})$ . This in particular gives $(\alpha,\beta)$ spanners with $\alpha=O(\sqrt{k})$ and $\beta=O(k)$ . 2. For any $\epsilon\in(0,1/2]$ , there exists an $(\alpha,\beta)$ -spanner with $\alpha=O(k^{\epsilon})$ , $\beta=O_{\epsilon}(k)$ and of expected size $O_{k}(n^{1+1/k})$ . This implies a stretch of $O(\lceil k/d\rceil)$ for any $d\in[\sqrt{k}/2,k^{1-\epsilon}]$ , and for every $d\geq…



Taxonomy

Topics: Advanced Graph Theory Research · Limits and Structures in Graph Theory · Graph Labeling and Dimension Problems

Full text


Uri Ben-Levy, The Weizmann Institute of Science, Israel.

Merav Parter, The Weizmann Institute of Science, Israel. Supported in part by an ISF grant (no. 2084/18).

An $f(d)$ -spanner of an unweighted $n$ -vertex graph $G=(V,E)$ is a subgraph $H$ satisfying that $\mbox{\rm dist}_{H}(u,v)$ is at most $f(\mbox{\rm dist}_{G}(u,v))$ for every $u,v\in V$ . A simple girth argument implies that any $f(d)$ -spanner with $O(n^{1+1/k})$ edges must satisfy that $f(d)/d=\Omega(k/d+1)$ . A matching upper bound (even up to constants) for super-constant values of $d$ is currently known only for $d=\Omega((\log k)^{\log k})$ as given by the well-known $(1+\epsilon,\beta)$ spanners of Elkin and Peleg, and their recent improvements by [Elkin-Neiman, SODA’17], and [Abboud-Bodwin-Pettie, SODA’18].

We present new spanner constructions that achieve a nearly optimal stretch of $O(k/d+1)$ for any distance value $d\in[1,k^{1-o(1)}]$ and $d\geq k^{1+o(1)}$ . We also show more optimized spanner constructions with a nearly linear number of edges. Specifically, for every $\epsilon\in(0,1)$ and integer $k\geq 1$ , we show the construction of $(3+\epsilon,\beta)$ spanners for $\beta=O_{\epsilon}(k^{\log(3+8/\epsilon)})$ and $\widetilde{O}_{\epsilon}(n^{1+1/k})$ edges.

In addition, we consider the related graph concept of hopsets introduced by [Cohen, J. ACM ’00]. Informally, a hopset $H$ is a weighted edge set that, when added to the graph $G$ , allows one to get a path from each node $u$ to a node $v$ with at most $\beta$ hops (i.e., edges) and length at most $\alpha\cdot\mbox{\rm dist}_{G}(u,v)$ . We present a new family of $(\alpha,\beta)$ hopsets with $\widetilde{O}(k\cdot n^{1+1/k})$ edges and $\alpha\cdot\beta=O(k)$ . Turning to nearly linear-size hopsets, we show a construction of $(3+\epsilon,\beta)$ hopsets with $\widetilde{O}_{\epsilon}(n^{1+1/k})$ edges and hop-bound of $\beta=O_{\epsilon}(k^{\log(3+9/\epsilon)})$ , improving upon the state-of-the-art hop-bound of $\beta=O(\log k/\epsilon)^{\log k}$ .

1 Introduction
1.1 Our Contribution.
1.2 Technical Overview.
1.2.1 New Spanners
1.2.2 New Hopsets
1.3 Preliminaries
1.4 Algorithmic Tools
2 Improved Spanners for Close Vertex Pairs
3 New $(k^{\epsilon},O_{\epsilon}(k))$ Spanners
4 New $(3+\epsilon,\beta)$ Spanner
5 A New Family of $(k^{\epsilon},k^{1-\epsilon})$ Hopsets
5.1 $(k^{\epsilon},k^{1-\epsilon})$ Hopsets for $\epsilon\in[1/2,1)$ .
5.2 $(O(k^{\epsilon}),O_{\epsilon}(k^{1-\epsilon}))$ Hopsets for $0<\epsilon<1/2$ .
5.3 New $(3+\epsilon,\beta)$ Hopset
6 Efficient Computation of Spanners, Hopsets, and Applications
6.1 Efficient Constructions of $(3+\epsilon,\beta)$ Spanners and Applications
6.1.1 The Centralized Setting
6.1.2 The Distributed Setting
6.1.3 The Multi-Pass Streaming Setting
6.2 Efficient Constructions of $(\alpha,\beta)$ Hopsets
A Complete Proofs of Theorems 3 and 6
B Improved $(3+\epsilon,\beta)$ Spanners and Hopsets
B.1 Spanners
B.2 Hopsets

1 Introduction

Compressing the distance metric of an undirected input graph $G=(V,E)$ up to a small approximation, or stretch, has been the subject of extensive research over the years. An $f(d)$ -spanner of a graph $G$ is a subgraph $H\subseteq G$ satisfying $\mbox{\rm dist}_{H}(u,v)\leq f(\mbox{\rm dist}_{G}(u,v))$ for every $u,v\in V$ . Letting $f(d)=k\cdot d$ for some fixed integer $k$ gives the standard multiplicative spanners [PU87, ADD+93a]. More generally, $f(d)=\alpha\cdot d+\beta$ corresponds to $(\alpha,\beta)$ spanners [PS89, BKMP05].

Althöfer et al. [ADD+93a] provided the first tight construction of $(2k-1)$ multiplicative spanners with $O(n^{1+1/k})$ edges. These spanners are believed to provide the optimal size-stretch tradeoff assuming the girth conjecture of Erdős [EM70]. It has been widely noted, however, that this optimality notion has some caveats, as the girth argument by itself provides a stretch lower bound only for adjacent vertex pairs. The first indication that one can provide improved stretch for distant vertex pairs was given by the notion of $(1+\epsilon,\beta)$ spanners of Elkin and Peleg [EP04]. In their seminal work, they showed that one can compute a $(1+\epsilon,\beta)$ spanner with $O_{\epsilon,k}(n^{1+1/k})$ edges and $\beta=O(\log k/\epsilon)^{\log k}$ , for every integer $k$ and $\epsilon\in(0,1)$ . This in particular implies that one can compute $f(d)$ -spanners with $O_{\epsilon,k}(n^{1+1/k})$ edges where $f(d)/d=O(1)$ for $d=\Omega(\log k)^{\log k}$ . Recently, Abboud, Bodwin and Pettie [ABP18a] showed that this tradeoff is nearly optimal at least for constant values of $k$ , ruling out the possibility of obtaining a $(1+\epsilon)$ stretch value for considerably closer vertex pairs while keeping the same bound on the number of edges.

Another approach for obtaining improved stretch for non-adjacent pairs was suggested by the hybrid spanners of Parter [Par14]. For every integer $k$ , these spanners have $O_{k}(n^{1+1/k})$ edges and provide non-adjacent pairs a stretch of $k$ rather than $2k-1$ . This stretch value is optimal for vertex pairs at distance $2$ , assuming the girth conjecture, but does not provide a significant improvement for pairs at distance $d=\omega(1)$ . For instance, for pairs at distance $d=\sqrt{k}$ , current spanner constructions still provide a stretch of $k$ , rather than the stretch of $O(\sqrt{k})$ that might be attainable unless proven otherwise.

To summarize, the existing $f(d)$ -spanner constructions currently provide a nearly optimal stretch in two extreme regimes: short distances $d=O(1)$ and large distances $d\geq(\log k)^{\log k}$ . Our paper zooms into the missing intermediate regime of distances, aiming to provide the ultimate $O(k/d+1)$ stretch for any value of $d$ . Since our stretch values are optimal up to constants, these constructions are useful when the stretch parameter $k$ is super-constant, i.e., $k=g(n)$ for some function $g$ of the number of nodes $n$ (e.g., $k=O(\log\log n)$ ). Also note that the lower bound of $\Omega(k/d+1)$ does not depend on the girth conjecture, and holds by a simple girth argument. While obtaining $(O(1),O(k))$ spanners would provide the holy grail stretch of $O(\lceil k/d\rceil)$ for the entire range of distances, our results are asymptotically very close to this goal. That is, our spanner constructions achieve the optimal stretch values (up to constants) for almost the entire range of distances. See Fig. 1 for a pictorial illustration of the current $f(d)$ -spanner constructions with $\widetilde{O}(n^{1+1/k})$ edges.

Hopsets.

Hopsets are fundamental graph structures introduced by Cohen [Coh00]. They have received considerable attention recently [EN16a, ABP18a, HP19], due to their applications to shortest path computation in many computational settings, e.g., parallel computing [KS97, Coh00, MPVX15, FL18, EN19b], dynamic graph algorithms [HKN18], and streaming and distributed algorithms [Nan14, HKN16, EN16b, Elk17].

For an $n$ -vertex undirected weighted graph $G=(V,E,w)$ , a subset of weighted edges $H\subset{V\choose 2}$ (not in $G$ ), where the weight of each edge $(u,v)\in H$ is $\mbox{\rm dist}_{G}(u,v)$ , is called an $(\alpha,\beta)$ hopset if for any $u,v\in V$ it holds that

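$$\mbox{\rm dist}_{G}(u,v)\leq\mbox{\rm dist}^{(\beta)}_{G^{\prime}}(u,v)\leq\alpha\cdot\mbox{\rm dist}_{G}(u,v),$$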

where $G^{\prime}=(V,E\cup H,w^{\prime})$ , and the weight function $w^{\prime}$ is defined as follows: for every $e\in E$ , $w^{\prime}(e)=w(e)$ and for every $e=(x,y)\in H$ , $w^{\prime}(e)=\mbox{\rm dist}_{G}(x,y)$ . The distance $\mbox{\rm dist}^{(\beta)}_{G^{\prime}}(u,v)$ is the length of the shortest path from $u$ to $v$ that uses at most $\beta$ edges in $G^{\prime}$ . The first $(1+\epsilon,\beta)$ hopset construction by Cohen [Coh00] had $\widetilde{O}(n^{1+1/k})$ edges and hop-bound of $\beta=O(\log n/\epsilon)^{\log k}$ . Elkin and Neiman [EN16a] presented an improved construction of $(1+\epsilon,\beta_{EN})$ hopsets with $\beta_{EN}=O\left(\log k/\epsilon\right)^{\log k}$ and $\widetilde{O}(n^{1+1/k})$ edges. The state-of-the-art result is by Huang and Pettie [HP19], who proved that the emulators by Thorup and Zwick are in fact also $(1+\epsilon,\beta_{EN})$ hopsets with $O(n^{1+1/k})$ edges. A similar construction with slightly worse bounds has been independently shown by Elkin and Neiman [EN17].
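The hop-limited distance $\mbox{\rm dist}^{(\beta)}_{G^{\prime}}$ in the definition above can be computed by running $\beta$ rounds of Bellman-Ford relaxation. The following is a minimal illustrative sketch, not part of the paper's constructions; the function name `hop_limited_dist` and the toy path graph are assumptions made for this example.

```python
import math

def hop_limited_dist(n, edges, source, beta):
    """Shortest-path lengths from `source` over paths with at most `beta` edges.

    `edges` is a list of undirected weighted edges (u, v, w) on vertices
    0..n-1. Each round of relaxation allows one more hop.
    """
    dist = [math.inf] * n
    dist[source] = 0.0
    for _ in range(beta):
        nxt = dist[:]  # relax from the previous round only, so hops are counted exactly
        for u, v, w in edges:
            if dist[u] + w < nxt[v]:
                nxt[v] = dist[u] + w
            if dist[v] + w < nxt[u]:
                nxt[u] = dist[v] + w
        dist = nxt
    return dist

# Toy instance (hypothetical): the unit-weight path 0-1-2-3-4.
path = [(i, i + 1, 1.0) for i in range(4)]
# One hopset edge whose weight equals dist_G(0, 4) = 4, as in the definition.
hopset = [(0, 4, 4.0)]

without_hopset = hop_limited_dist(5, path, 0, 2)
with_hopset = hop_limited_dist(5, path + hopset, 0, 2)
```

With $\beta=2$ and no hopset edge, vertex 4 is unreachable from vertex 0; after adding the single hopset edge it is reached at its exact distance, while vertex 3 is reached within two hops at length 5 instead of its true distance 3.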

Klein and Sairam [KS97] and Shi and Spencer [SS99] gave efficient PRAM algorithms for computing exact hopsets with hop-bound $\beta=O(\sqrt{n}\log n)$ and a linear number of edges. Abboud, Bodwin and Pettie [ABP18a] showed that any hopset with fewer than $n^{1+1/k-\delta}$ edges for any $\delta>0$ must have $\beta=\Omega\left(1/(k\epsilon)\right)^{k}$ . This implies that the $(1+\epsilon,\beta)$ hopsets of Elkin and Neiman [EN16a] and Huang and Pettie [HP19] are nearly optimal for $k=O(1)$ .

At the other extreme with respect to parameters, Huang and Pettie [HP19] observed that the distance oracle of Thorup and Zwick [TZ05] immediately implies $(\alpha,\beta)$ hopsets with stretch $\alpha=2k-1$ , hop-bound $\beta=2$ and $O_{k}(n^{1+1/k})$ edges. In their paper, Huang and Pettie raised the following question concerning the existence of additional hopsets and specifically asked:

Are there other tradeoffs available when $\beta$ is a fixed constant (say 3 or 4), independent of $k$ ?

We answer this question in the affirmative, by presenting a new family of $(\alpha,\beta)$ hopsets. For any $\epsilon\in[1/2,1)$ and integer $k$ , we give the construction of $(9k^{\epsilon},5\cdot k^{1-\epsilon})$ hopsets with $O_{k}(n^{1+1/k})$ edges, in expectation. Thus taking $\epsilon=1-4/\log k$ gives constant hop-bound of $\beta=80$ , and an improved stretch of $\alpha=0.56k$ . We also show a construction of $(k^{\epsilon},k^{1-\epsilon})$ hopsets for the complementary range of $\epsilon\in(0,1/2)$ . It is important to note that whereas for $(\alpha,\beta)$ spanners with $O(n^{1+1/k})$ edges it must hold that $\alpha+\beta=\Omega(k)$ by a girth argument, this lower bound does not hold for hopsets. For any constant value of $\epsilon$ , our new family of $(\alpha,\beta)$ hopsets in fact satisfies that $\alpha\cdot\beta=O(k)$ .
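The arithmetic behind the constants $\beta=80$ and $\alpha=0.56k$ can be checked directly. The sketch below assumes that $\log$ denotes the base-2 logarithm (this is what makes $k^{4/\log k}=16$) and that $k\geq 256$, so that $\epsilon=1-4/\log k$ lies in $[1/2,1)$; the helper name `hopset_parameters` is hypothetical.

```python
import math

def hopset_parameters(k: int) -> tuple[float, float, float]:
    """Return (epsilon, alpha, beta) for the (9 k^eps, 5 k^(1-eps)) hopsets
    with eps = 1 - 4 / log2(k).

    Assumption: logs are base 2, and k >= 256 so that eps is in [1/2, 1).
    """
    log_k = math.log2(k)
    if log_k < 8:
        raise ValueError("need k >= 256 so that epsilon lies in [1/2, 1)")
    eps = 1.0 - 4.0 / log_k
    alpha = 9.0 * k ** eps          # = 9k / 16, since k^(4/log2 k) = 2^4
    beta = 5.0 * k ** (1.0 - eps)   # = 5 * 16
    return eps, alpha, beta

eps, alpha, beta = hopset_parameters(2 ** 16)
```

For $k=2^{16}$ this gives $\epsilon=0.75$, $\beta=80$ and $\alpha=(9/16)\cdot k\approx 0.56k$.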

Application to Shortest Path Computation.

The efficient computation of $(\alpha,\beta)$ spanners and hopsets leads to some immediate applications for fast shortest path computation, see e.g., [EZ06] and [EN16a]. Interestingly, the $\beta$ parameter of these structures affects not only the quality (or approximation) of the solution, but might also determine the time complexity. For example, in the streaming model, the number of passes for computing a $(1+\epsilon,\beta)$ APSP approximation is linear in $\beta$ . This further motivates the study of $(\alpha,\beta)$ spanners with a considerably improved $\beta$ at the expense of having a constant approximation rather than a $(1+\epsilon)$ approximation. For example, our new $(3+\epsilon,\beta)$ spanners with $\beta=\mathsf{poly}(k)$ lead to a $3+\epsilon$ approximation for the APSP problem using only $O_{\epsilon}(\beta)$ passes. This should be compared against the current $(1+\epsilon)$ approximation, which requires $O(\log k/\epsilon)^{\log k}$ passes.

Turning to the distributed computing models, the parameter $\beta$ also determines the locality of the $(\alpha,\beta)$ spanner computation. Specifically, an immediate outcome of our constructions is an algorithm that computes an $(O(1),k^{1+o(1)})$ spanner with $k^{1+o(1)}$ ${\mathsf{LOCAL}}$ rounds, almost matching the (tight) round complexity of the standard multiplicative $(2k-1)$ spanners (in the latter, all pairs suffer from a multiplicative stretch of $2k-1$ ).

1.1 Our Contribution.

In this paper we provide improved $f(d)$ -spanner constructions with nearly optimal stretch value, up to constants, for almost the entire range of distances. Our key result shows:

Theorem 1 (Almost Optimal $f(d)$ -Spanners).

For any integer $k\geq 1$ , and an unweighted $n$ -vertex graph $G=(V,E)$ , one can compute an $f(d)$ -spanner $H\subseteq G$ with $\widetilde{O}(n^{1+1/k})$ edges such that $f(d)/d=O(k/d+1)$ for any $d\in[1,k^{1-o(1)}]\cup[k^{1+o(1)},n]$ .

This $f(d)$ -spanner is almost optimal in the following sense. The stretch of $O(\lceil k/d\rceil)$ is the best possible up to a constant factor based on the girth argument, the size of the spanner is optimal up to logarithmic terms, and the bounded stretch is provided almost for the entire distance range, i.e., excluding $d\in[k^{1-o(1)},k^{1+o(1)}]$ . We note that for this “problematic range” of $[k^{1-o(1)},k^{1+o(1)}]$ our spanners still provide a considerably improved stretch over previous constructions. The spanner of Theorem 1 is obtained by two separate constructions. The first construction, which is also simpler, considers the range of distances $d\in[1,\sqrt{k}/2]$ . For this distance range we show an $f(d)$ -spanner with $f(d)\leq 7k$ , i.e., a stretch of $7k/d$ . Using the terminology of $(\alpha,\beta)$ spanners, we can say that our spanner is an $(\alpha,\beta)$ spanner with $\alpha=O(\sqrt{k})$ and $\beta=O(k)$ .

Theorem 2 (Spanners for Pairs at Dist. $O(\sqrt{k})$ ).

For any $n$ -vertex unweighted graph $G=(V,E)$ and integers $k\geq 1$ and $d\in[1,\sqrt{k}/2]$ , there is a subgraph $H\subseteq G$ of expected size $|E(H)|=O(\sqrt{k}\cdot n^{1+1/k})$ such that for every pair of vertices $u$ and $v$ at distance $d$ in $G$ , it holds that $\mbox{\rm dist}_{H}(u,v)\leq 7\cdot k$ . This provides a stretch of $7k/d$ .

The algorithm of Theorem 2 already provides a stretch of $O(\sqrt{k})$ for all remaining distance values $d\geq\sqrt{k}/2$ . Obtaining an improved stretch of $o(\sqrt{k})$ for $d=\omega(\sqrt{k})$ is considerably more challenging, and requires additional ideas and techniques. This has led to the construction of new $(\alpha,\beta)$ spanners that provide the desired stretch of $O(\lceil k/d\rceil)$ for almost the entire range of distances.

New $(\alpha,\beta)$ Spanners.

Our key contribution is a new $(\alpha,\beta)$ spanner that provides a constant stretch already for vertices at distance at least $k^{1+o(1)}$ . Up to the extra factor of $o(1)$ in the exponent, this is the best that one can hope for based on a girth argument. In addition, these spanners also achieve the desired stretch of $O(k/d)$ for any $d<k^{1-o(1)}$ .

Theorem 3.

For any $n$ -vertex unweighted graph $G=(V,E)$ , any $0<\epsilon<1/2$ , and $k\geq 16^{1/\epsilon}$ (the statement can be made to work for any $k$ at the cost of larger constants), there is an $(8\cdot k^{\epsilon},64^{1/\epsilon}\cdot k)$ -spanner of $G$ of expected size $O(64^{1/\epsilon}\cdot k\cdot n^{1+1/k}+64^{2/\epsilon}\cdot k^{2}\cdot n)$ .

By setting $\epsilon=\Theta(1/\log k)$ in the above, we get an $(\alpha,\beta)$ -spanner with $\alpha=O(1)$ and $\beta=k^{1+o(1)}$ , hence providing a constant stretch for every distance $d\geq k^{1+o(1)}$ . On the other hand, for every constant value of $\epsilon$ , we can get $\alpha=O(k^{\epsilon})$ and $\beta=O(k)$ , thus providing a stretch of $O(k/d)$ for $d=k^{1-\epsilon}$ . Prior to this construction, the known $(\alpha,\beta=O(k))$ spanners were given by the $(k,k-1)$ spanners of Baswana-Kavitha-Mehlhorn-Pettie [BKMP05]. Note that the $(1+\epsilon,\beta)$ spanners of Elkin and Neiman [EN19a] also provide (implicitly in their analysis) a stretch of $O(\log k)$ for vertices at distance $d\geq k^{\log 10}$ . Our construction provides a constant stretch for this distance range, while keeping almost the same bound on the number of edges. (In this paper we did not optimize for secondary order factors in the size bound. All our solutions have $O(k^{2}\cdot\log n\cdot n^{1+1/k})=\widetilde{O}(n^{1+1/k})$ edges w.h.p.)

While the spanners of Theorem 3 provide a constant stretch for $d\geq k^{1+o(1)}$ , this constant might be large. For that purpose, we also consider spanners that provide as small a stretch $\alpha$ as possible, while keeping $\beta$ at most polynomial in $k$ . For instance, a simplification of the algorithm from Theorem 3 can also give $(4,\beta)$ -spanners with $\beta=k^{\log 11}$ .

Lemma 1 (New $(3+\epsilon,\beta)$ Spanners).

For any $n$ -vertex unweighted graph $G=(V,E)$ , integer $k$ and constant $\epsilon>0$ , one can compute a $(3+\epsilon,\beta)$ spanner $H$ of $G$ for $\beta=O((3+8/\epsilon)\cdot k^{\log(3+8/\epsilon)})$ and expected size $O(n^{1+1/k}+(3+8/\epsilon)\cdot k^{\log(3+8/\epsilon)}\cdot n)$ .

Pettie [Pet09] showed a construction of a nearly linear-size spanner that provides a constant stretch of $17$ for vertices at distance $O(\log^{4}n)$ . The spanners of Lemma 1 provide a stretch of $4$ for this range of distances, and also work for any $k=O(\log n)$ .

New $(\alpha,\beta)$ Hopsets.

Hopsets, the cousins of $(\alpha,\beta)$ spanners and emulators, have received quite a lot of attention recently from both the graph-theoretic and the algorithmic perspective. Currently, hopset constructions are known only for a narrow regime of $\alpha,\beta$ values. A particular setting that attracted a lot of attention is where $\alpha=(1+\epsilon)$ . Since all existing $(1+\epsilon,\beta)$ constructions provide a fairly large hop-bound, in this paper we resort to the constant stretch of $\alpha=O(1)$ . We show that this relaxation can significantly reduce the hop-bound. Specifically, we discover a new family of hopsets that is technically related to the spanner constructions described above.

Theorem 4 (New $(k^{\epsilon},k^{1-\epsilon})$ Hopsets).

For any $n$ -vertex weighted graph $G=(V,E,w)$ , integer $k\geq 1$ and $\epsilon\in(0,1)$ such that $k^{\epsilon}\geq 16$ , one can compute an $(\alpha,\beta)$ hopset $H$ for $\alpha=O(k^{\epsilon})$ and $\beta=(c^{1/\epsilon}\cdot k^{1-\epsilon})$ for some constant $c>1$ . The number of edges is bounded by $|E(H)|=O((k^{\epsilon}\cdot n^{1+1/k}+k^{\epsilon}/\epsilon\cdot n)\log\Lambda)$ in expectation, where $\Lambda$ is the aspect ratio of the graph (the ratio between the largest and smallest distances between vertex pairs in $G$ ).

For example, by setting $\epsilon=\Theta(1/\log k)$ , we get an $(\alpha,\beta)$ hopset with constant stretch of $\alpha=O(1)$ , and an almost linear hop-bound $O(k^{1+o(1)})$ . This brings us very close to the ultimate holy-grail construction of $(O(1),k)$ hopsets.

We also consider the other direction of minimizing the stretch $\alpha$ as much as possible, while keeping the hop-bound $\beta$ to be polynomial in $k$ . As with spanners, this setting is considerably simpler (compared to that of Thm. 4), and we have the following:

Lemma 2 (New $(3+\epsilon,\beta)$ Hopsets).

For any $n$ -vertex weighted graph $G=(V,E,w)$ , integer $k$ and $\epsilon>0$ , one can compute a $(3+\epsilon,\beta)$ hopset $H$ where $\beta=16\cdot(3+9/\epsilon)\cdot k^{\log(3+9/\epsilon)}$ with expected size $|E(H)|=O((n^{1+1/k}+\log{k}\cdot n)\cdot\log{\Lambda})$ .

An interesting reference point is $k=O(\log n)$ , i.e., where the hopset has a nearly linear size of $\widetilde{O}(n)$ edges. In this setting, Lemma 2 gives for example stretch $\alpha=4$ and $\beta=O((\log n)^{3.6})$ . Theorem 4 gives a constant stretch but with hop-bound $\beta=(\log n)^{1+o(1)}$ . This should be compared with the $(1+\epsilon,\beta)$ hopsets of [HP19] and [EN16b] that provide a hop-bound of $O_{\epsilon}(\log\log n)^{\log\log n}$ . See Figure 2 for comparison with existing work.
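The exponents in Lemmas 1 and 2 are easy to evaluate numerically. The sketch below assumes base-2 logarithms (consistent with the quoted exponent $3.6$ for stretch $4$, i.e., $\epsilon=1$); the helper name `hop_exponent` and its parameter `c` are introduced only for this illustration, with `c = 8` for the spanners of Lemma 1 and `c = 9` for the hopsets of Lemma 2.

```python
import math

def hop_exponent(eps: float, c: float) -> float:
    """Exponent of k in beta = O_eps(k^{log(3 + c/eps)}), with log taken base 2.

    c = 8 corresponds to the (3+eps, beta) spanners of Lemma 1,
    c = 9 to the (3+eps, beta) hopsets of Lemma 2.
    """
    return math.log2(3.0 + c / eps)

spanner_exp = hop_exponent(1.0, 8)  # log2(11): stretch 4 spanners
hopset_exp = hop_exponent(1.0, 9)   # log2(12): stretch 4 hopsets
```

For $\epsilon=1$ the hopset exponent is $\log 12\approx 3.585$, which rounds to the $3.6$ quoted above, and the spanner exponent is $\log 11\approx 3.46$. Smaller $\epsilon$ brings the stretch closer to $3$ at the price of a larger exponent.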

Remark. We note that Gitlitz, Elkin and Neiman [GEN19] independently provided different constructions for $(3+\epsilon,\beta)$ spanners and hopsets with slightly larger $\beta$ values than those obtained in Lemmas 1 and 2.

Applications to Shortest Paths.

We also show the efficient computation of our simplified $(O(1),\beta)$ spanners and hopsets in various computational settings. This has direct implications for APSP computation. Elkin and Neiman [EN19a, EN16a] specified the implementation details of their hopsets and spanners, along with some immediate applications to shortest paths computation in several computational settings. In the non-centralized settings (e.g., distributed, streaming, etc.), the value of $\beta$ affects not only the approximation quality of the solution, but also determines the locality of the problem. Our simplified $(3+\epsilon,\mathsf{poly}(k))$ spanners and hopsets are similar, implementation-wise (i.e., the steps that determine the computational cost are quite similar), to the $(1+\epsilon,O(\log k)^{\log k})$ spanners and hopsets of [EN19a] and [EN16a]. Therefore we can get these applications almost for free, while enjoying an improved running time due to our improved $\beta$ , at the cost of a slightly larger stretch of $(3+\epsilon)$ in the centralized regime and $(4+\epsilon)$ in the distributed and streaming regimes rather than $(1+\epsilon)$ . For example, we can have the following:

Lemma 3 (Approx. APSP).

For every $n$ -vertex unweighted graph and any parameters $\epsilon>0,\rho\in(0,1)$ , there exists a streaming algorithm that computes a $(4+\epsilon,\beta)$ approximation for the APSP in the multi-pass streaming model for $\beta=O((5+16/\epsilon)^{\log{\rho}+2/\rho}\cdot k^{\log(5+16/\epsilon)})$ in either (1) $O(n^{1+\rho}\cdot\log{n})$ space with high probability and $O(\beta)$ passes, or (2) with $O(n^{1+1/k}+(\beta+\log{n})\cdot n)$ space in expectation and $O(n^{\rho}/\rho\cdot\log{n}\cdot\beta)$ passes with high probability.

This is the analogue of Cor. 21 in [EN19a] only that they have $(1+\epsilon)$ approximation with $\beta_{EN}=O(\log k)^{\log k}$ passes, rather than $(4+\epsilon)$ approximation with $O_{\epsilon}(k^{\log(5+16/\epsilon)})$ passes.

1.2 Technical Overview.

Throughout, we consider a fixed stretch parameter of $k$ , and restrict the number of edges in the output spanners and hopsets to $O_{k}(n^{1+1/k})$ edges in expectation.

1.2.1 New Spanners

The starting point for our algorithms is the observation that existing multiplicative spanner constructions (e.g., Baswana-Sen [BS07], Thorup-Zwick [TZ05]) provide a considerably improved multiplicative stretch, i.e., of $o(k)$ , for edges incident on sparse vertices. By sparse, we mean vertices whose $o(k)$ -ball contains a small number of vertices. To be more concrete, we start by describing some useful properties of the Baswana-Sen Algorithm.

Useful Properties of the Baswana-Sen Algorithm [BS07].

The Baswana-Sen algorithm consists of $k$ steps of clustering. A clustering $\mathcal{C}=\{C_{1},\ldots,C_{\ell}\}$ is a collection of vertex disjoint sets which we call clusters. Every cluster has some special vertex which we call the cluster center. The set of clustered vertices is $V(\mathcal{C})=\bigcup_{i=1}^{\ell}C_{i}$ . At a high level, the Baswana-Sen algorithm computes $k$ levels of clustering $\mathcal{C}_{0},\ldots,\mathcal{C}_{k-1}$ . In each clustering step $i$ , given the clustering $\mathcal{C}_{i-1}$ , the algorithm computes a clustering $\mathcal{C}_{i}$ , along with a subset of edges $H_{i}$ that “takes care” of the newly unclustered vertices (i.e., those that belong to a cluster in $\mathcal{C}_{i-1}$ , but do not belong to the clusters of $\mathcal{C}_{i}$ ). The clustering $\mathcal{C}_{i}$ and the output subgraph $H_{i}$ have the following useful properties:

• (1) $|\mathcal{C}_{i}|=O(n^{1-i/k})$ and $|E(H_{i})|=O(n^{1+1/k})$ in expectation.

• (2) The radius of each cluster $C\in\mathcal{C}_{i}$ is at most $i$ . Specifically, for each cluster $C\in\mathcal{C}_{i}$ , the subgraph $H_{i}$ contains a tree $T_{C}\subseteq G[C]$ rooted at the cluster center of $C$ , spanning all the vertices in $C$ and has depth at most $i$ .

• (3) For every unclustered vertex $u\notin V(\mathcal{C})$ , $\mbox{\rm dist}_{H_{i}}(u,v)\leq 2i-1$ for every $v\in N(u)$ .
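Properties (2) and (3) can be observed on a simplified, unweighted variant of the clustering procedure. The sketch below is an illustration under stated assumptions and not the exact procedure of [BS07]: clusters are sampled with probability $n^{-1/k}$, a vertex adjacent to a sampled cluster joins it through one edge, and every other vertex adds one edge to each adjacent cluster and becomes unclustered. All function names and the grid test graph are hypothetical.

```python
import random
from collections import deque

def truncated_clustering(adj, k, steps, seed=0):
    """Run `steps` clustering rounds on a simple unweighted graph.

    adj: dict mapping each vertex (int) to the set of its neighbours.
    Returns (center, spanner): the cluster center of every vertex that is
    still clustered, and the accumulated spanner edges as frozensets.
    """
    rng = random.Random(seed)
    p = len(adj) ** (-1.0 / k)          # cluster sampling probability
    center = {v: v for v in adj}        # level 0: singleton clusters
    spanner = set()
    for _ in range(steps):
        sampled = {c for c in sorted(set(center.values())) if rng.random() < p}
        new_center = {}
        for v, c in center.items():
            if c in sampled:            # v's cluster survives this round
                new_center[v] = c
                continue
            rep = {}                    # adjacent cluster -> one neighbour in it
            for w in adj[v]:
                if w in center:
                    rep.setdefault(center[w], w)
            hit = sorted(c2 for c2 in rep if c2 in sampled)
            if hit:                     # join a sampled neighbouring cluster
                spanner.add(frozenset((v, rep[hit[0]])))
                new_center[v] = hit[0]
            else:                       # v becomes unclustered
                for w in rep.values():
                    spanner.add(frozenset((v, w)))
        center = new_center
    return center, spanner

def bfs_dist(edges, source):
    """Hop distances from `source` in the graph formed by `edges`."""
    nbrs = {}
    for e in edges:
        a, b = tuple(e)
        nbrs.setdefault(a, []).append(b)
        nbrs.setdefault(b, []).append(a)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in nbrs.get(x, []):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def grid_graph(side):
    """Adjacency sets of a side x side grid, vertices numbered row-major."""
    adj = {r * side + c: set() for r in range(side) for c in range(side)}
    for r in range(side):
        for c in range(side):
            v = r * side + c
            if c + 1 < side:
                adj[v].add(v + 1); adj[v + 1].add(v)
            if r + 1 < side:
                adj[v].add(v + side); adj[v + side].add(v)
    return adj
```

In this variant, after `steps` rounds every still-clustered vertex is within `steps` spanner hops of its center, and every edge incident to an unclustered vertex has spanner stretch at most `2 * steps - 1`, regardless of the random choices.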

Warming Up: $(\alpha,\beta)$ Spanners with $\alpha=O(\sqrt{k})$ and $\beta=k$ .

To illustrate the essence of our constructions, we start by showing an algorithm for computing an $(\alpha,\beta)$ spanner with $\alpha=O(\sqrt{k})$ and $\beta=k$ . As we will see, with this approach, a stretch of $\alpha=O(\sqrt{k})$ is the best that one can get for $\beta=k$ . Obtaining the ultimate stretch of $O(\lceil k/d\rceil)$ for every $d=\omega(\sqrt{k})$ will require additional ideas, and a considerably more delicate analysis.

The first phase of the algorithm applies a truncated variant of the Baswana-Sen algorithm in which only the first $\sqrt{k}$ clustering steps (out of the $k$ many steps) are applied. As a result, we get a cluster collection $\mathcal{C}=\mathcal{C}_{\sqrt{k}}$ containing $O(n^{1-1/\sqrt{k}})$ clusters (in expectation), and a subgraph $H^{\prime}=\bigcup_{i=1}^{\sqrt{k}}H_{i}$ that takes care of all edges incident to the non-clustered vertices (i.e., vertices not in $\mathcal{C}$ ) by Property (3).

In the second phase, the algorithm computes a cluster-graph $\widehat{G}$ whose nodes correspond to the clusters of $\mathcal{C}$ . Every two clusters $C,C^{\prime}\in\mathcal{C}$ are connected by an edge in $\widehat{G}$ iff $\mbox{\rm dist}_{G}(r(C),r(C^{\prime}))\leq 3\sqrt{k}$ where $r(C),r(C^{\prime})$ are the centers of the clusters $C,C^{\prime}$ respectively. The algorithm then computes a $(2\sqrt{k}-1)$ multiplicative spanner $\widehat{H}$ on this cluster graph. The edges of this spanner are translated to $G$ -edges as follows. For each edge $(C,C^{\prime})$ in the spanner $\widehat{H}$ , the shortest path in $G$ between $r(C)$ and $r(C^{\prime})$ is added to the spanner $H$ . By property (1), $\widehat{G}$ contains $N=O(n^{1-1/\sqrt{k}})$ nodes, and thus $\widehat{H}$ contains $O(N^{1+1/\sqrt{k}})=O(n)$ edges in expectation. Overall, this step adds $O(\sqrt{k}n)$ edges to the spanner.
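The edge count of the second phase follows from a short calculation, spelled out here as a sketch with $N$ denoting the number of clusters as above (constants are suppressed):

```latex
\begin{align*}
|E(\widehat{H})| &= O\big(N^{1+1/\sqrt{k}}\big)
  = O\big(n^{(1-1/\sqrt{k})(1+1/\sqrt{k})}\big)
  = O\big(n^{1-1/k}\big) = O(n), \\
\#\{\text{$G$-edges added}\} &\leq 3\sqrt{k}\cdot|E(\widehat{H})| = O(\sqrt{k}\cdot n).
\end{align*}
```

The second line uses that every edge of $\widehat{H}$ is replaced by a shortest path between two centers at distance at most $3\sqrt{k}$ in $G$ .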

The stretch argument: For the sake of this intuitive explanation, we will show that the spanner provides a stretch of $O(\sqrt{k})$ for vertex pairs at distance $\sqrt{k}$ . Fix such pair $u,v$ and let $P$ be their shortest path in $G$ . If $P$ has at most one clustered vertex (i.e., vertex appearing in the clusters of $\mathcal{C}$ ), then all the edges in $P$ are incident to non-clustered vertices, and thus by property (3), for each edge $(x,y)\in P$ , $\mbox{\rm dist}_{H}(x,y)\leq 2\sqrt{k}-1$ .

Otherwise, let $u^{\prime}$ and $v^{\prime}$ be the far-most clustered vertices on the path $P$ , where $u^{\prime}$ (respectively, $v^{\prime}$ ) is the closest clustered vertex to $u$ (respectively, $v$ ). By property (3) again, all edges on the path segments $P[u,u^{\prime}]$ and $P[v^{\prime},v]$ enjoy a stretch of at most $(2\sqrt{k}-1)$ in the spanner. It remains to consider the segment $P[u^{\prime},v^{\prime}]$ . Let $C_{u^{\prime}},C_{v^{\prime}}$ be the clusters of $u^{\prime},v^{\prime}$ respectively in $\mathcal{C}$ . Since $\mbox{\rm dist}_{G}(u^{\prime},v^{\prime})\leq\sqrt{k}$ and the radius of these clusters is at most $\sqrt{k}$ (by property (2)), we have that $\mbox{\rm dist}_{G}(r(C_{u^{\prime}}),r(C_{v^{\prime}}))\leq 3\sqrt{k}$ and thus $C_{u^{\prime}}$ and $C_{v^{\prime}}$ are neighbors in $\widehat{G}$ . Since $\widehat{H}$ is a $(2\sqrt{k}-1)$ spanner of $\widehat{G}$ , we have $\mbox{\rm dist}_{\widehat{H}}(C_{u^{\prime}},C_{v^{\prime}})\leq 2\sqrt{k}-1$ . Finally, as each edge $(C,C^{\prime})$ in $\widehat{H}$ is translated into a path of length $\leq 3\sqrt{k}$ in $G$ , we have that $u^{\prime}$ and $v^{\prime}$ are connected in $H$ by a path of length $O(k)$ , concluding that $\mbox{\rm dist}_{H}(u,v)=O(k)$ as desired. The complete algorithm appears in Sec. 2.

The challenge in obtaining a multiplicative stretch $\alpha=o(\sqrt{k})$ .

We note that this algorithm, as is, cannot be extended to provide an improved stretch of $o(\sqrt{k})$ for distances $d\geq\sqrt{k}$ for the following reason. Let $i$ be a parameter that determines the number of the Baswana-Sen steps applied in the first phase of the algorithm. Then, after applying $i$ steps of the Baswana-Sen algorithm, we end with $n^{1-i/k}$ clusters in $\mathcal{C}_{i}$ . By property (3), the stretch obtained for all unclustered vertices (i.e., not in $\mathcal{C}_{i}$ ) is bounded by $2i-1$ . In the second phase, a cluster graph $\widehat{G}$ with $|\mathcal{C}_{i}|$ nodes is defined. Each node of $\widehat{G}$ corresponds to a cluster in $\mathcal{C}_{i}$ , and two clusters $C,C^{\prime}\in\mathcal{C}_{i}$ are connected in $\widehat{G}$ if their distance in $G$ is at most $3i$ . To keep the number of edges in the final spanner small (any standard $(2k-1)$ -spanner on $N$ vertices contains $O(N^{1+1/k})$ edges), the algorithm can only afford the computation of a $(2(k/i)-1)$ -multiplicative spanner $\widehat{H}\subseteq\widehat{G}$ . Each edge in this spanner $\widehat{H}$ is translated to a path of length $\leq 3i$ in the final spanner. Thus the algorithm adds at most $O(i\cdot n)$ edges in this phase. Overall, we get a stretch of $2i-1$ on all the edges incident to the non-clustered vertices (those that do not appear in $\mathcal{C}_{i}$ ), and a stretch of $\Theta(k/i)$ between every pair of clustered vertices at distance at most $i$ in $G$ . The optimal stretch is therefore achieved for $i=\Theta(\sqrt{k})$ .
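The balancing behind the choice $i=\Theta(\sqrt{k})$ can be written out explicitly. The following is only a sketch: the constant $c>0$ hidden in the $\Theta(k/i)$ bound is an assumption introduced here for illustration and is not specified in the text.

```latex
\max\Bigl\{\,2i-1,\ \frac{c\,k}{i}\Bigr\}
\quad\text{is minimized (up to constant factors) where}\quad
2i \approx \frac{c\,k}{i}
\;\Longleftrightarrow\;
i \approx \sqrt{c\,k/2} \;=\; \Theta(\sqrt{k}),
\qquad\text{which yields a stretch of } \Theta(\sqrt{k}).
```

The first term increases with $i$ and the second decreases with $i$, so the maximum of the two is smallest near the point where they cross.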

In the next paragraph, we explain how to bypass this obstacle by adding a crucial intermediate phase to the algorithm.

A New Three Stage Approach for $(\alpha,\beta)$ Spanners.

We next explain the high level ideas to obtain an $(\alpha,\beta)$ spanner with a multiplicative stretch $\alpha=O(k^{\epsilon})$ and an additive stretch $\beta=c^{1/\epsilon}\cdot k$ for some constant $c>1$ and any $\epsilon\in(0,1/2)$ . By taking $\epsilon=\Theta(1/\log k)$ , it provides a constant multiplicative stretch for all pairs at distance $d\geq k^{1+o(1)}$ . By setting $\epsilon=o(1)$ , we get a multiplicative stretch of $O(k/d)$ for all pairs at distance $d<k^{1-o(1)}$ . In the following, we zoom into a fixed distance value $d\in[1,k^{1-\epsilon}]$ and describe the high-level construction of an $f(d)$ -spanner $H$ with $f(d)=O(k/d)$ . The same procedure will be repeated for every $d\in[1,k^{1-\epsilon}]$ (in fact, it will be sufficient to repeat it for every class of distances $[d,2d]$ ). The algorithm has three phases. The initial clustering stage applies a truncated Baswana-Sen algorithm, running only its first $\lceil k^{\epsilon}\rceil$ steps. This results in a clustering $\mathcal{C}_{0}$ containing $n^{1-1/k^{1-\epsilon}}$ clusters in expectation of radius $\lceil k^{\epsilon}\rceil$ , and a subset of edges $H_{0}$ . Property (3) of the Baswana-Sen algorithm guarantees a stretch of $O(k^{\epsilon})$ on all edges incident to the unclustered vertices (i.e., vertices not in $\mathcal{C}_{0}$ ). Note that at this point the number of clusters is too large to be able to compute a $k^{\epsilon}$ -spanner on the cluster graph, and terminate. The purpose of the next stage is to rapidly reduce the number of clusters to the ultimate number of $n^{1-1/k^{\epsilon}}$ clusters while keeping the radius of this clustering bounded by $O_{\epsilon}(k^{1-\epsilon})$ .
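The claim that the cluster count after the first stage is still too large, while the target count $n^{1-1/k^{\epsilon}}$ is small enough, can be checked with a short calculation. This is a sketch that ignores the ceiling in $\lceil k^{\epsilon}\rceil$ and uses only the standard $O(N^{1+1/t})$ size bound for a $(2t-1)$ -spanner on $N$ nodes, here with $t=k^{\epsilon}$ .

```latex
\begin{aligned}
N_{0}=n^{\,1-k^{\epsilon-1}}:&\qquad
\bigl(1-k^{\epsilon-1}\bigr)\bigl(1+k^{-\epsilon}\bigr)
 \;=\; 1+k^{-\epsilon}-k^{\epsilon-1}-k^{-1},\\[4pt]
N_{T}=n^{\,1-k^{-\epsilon}}:&\qquad
\bigl(1-k^{-\epsilon}\bigr)\bigl(1+k^{-\epsilon}\bigr)
 \;=\; 1-k^{-2\epsilon} \;<\; 1 .
\end{aligned}
```

Each line gives the exponent of $n$ in the number of cluster-graph spanner edges. For instance, with $k=16$ and $\epsilon=1/4$ the first exponent is $1+0.5-0.125-0.0625=1.3125$ , well above the budget $1+1/k=1.0625$ , whereas the second exponent is below $1$ , so the cluster-graph spanner has $O(n)$ edges.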

The intermediate superclustering stage is the most delicate part of the algorithm. It consists of $T=O(1/\epsilon)$ phases of superclustering. This step is similar in flavor to the $(1+\epsilon,\beta)$ spanner construction by Elkin-Neiman [EN19a], with several key differences that we explicitly state.

A supercluster $SC=\{C_{1},\ldots,C_{\ell}\}$ is a collection of vertex-disjoint clusters. Every supercluster $SC$ has a special vertex $r(SC)$ , that is denoted as the supercluster’s center. We denote by $V(SC)$ the set of vertices in the supercluster, that is $V(SC)=\bigcup_{C\in SC}V(C)$ . A superclustering $\mathcal{SC}=\{SC_{1},\ldots,SC_{\ell^{\prime}}\}$ is a collection of vertex-disjoint superclusters. The radius of a supercluster $SC_{\ell}$ is defined by $\max_{u\in V(SC_{\ell})}\mbox{\rm dist}_{G}(u,r(SC_{\ell}))$ . Each phase $i\in\{1,\ldots,T\}$ of the superclustering procedure starts with a clustering $\mathcal{C}_{i-1}$ with $N_{i-1}=O(n^{1-k^{\epsilon\cdot(i-1)}/k^{1-\epsilon}})$ clusters of radius $r_{i-1}=O((2k^{\epsilon})^{i})$ . The output of the phase is a clustering $\mathcal{C}_{i}$ with $N_{i}$ clusters and radius $r_{i}$ , as well as a collection of edges $H_{i}$ added to the spanner that takes care of the vertices that stopped being clustered at that phase (i.e., appearing in $\mathcal{C}_{i-1}$ but not in $\mathcal{C}_{i}$ ). We now describe the high level structure of this $i^{th}$ phase.

The phase has $t=O(k^{\epsilon})$ steps of superclustering. Starting with the $0^{th}$ superclustering $\mathcal{SC}_{i-1,0}=\{C~{}\mid~{}C\in\mathcal{C}_{i-1}\}$ , each step $j\in\{1,\ldots,t\}$ gets as input a superclustering $\mathcal{SC}_{i-1,j-1}$ , where the radius of each supercluster is bounded by $r_{i-1,j-1}=O(k^{\epsilon}\cdot e^{4j/k^{\epsilon}}\cdot r_{i-1,0})$ . Initially, $r_{i-1,0}$ is simply the radius of the clusters in $\mathcal{C}_{0}$ . It then outputs a new superclustering $\mathcal{SC}_{i-1,j}$ by applying the following sequence of operations:

• Augmentation: We set an augmentation parameter $\alpha_{i-1,j}=O(r_{i-1,0}\cdot e^{4j/k^{\epsilon}})$ . Each vertex at distance at most $r_{i-1,j-1}+\alpha_{i-1,j}=O(e^{4j/k^{\epsilon}}\cdot k^{\epsilon}\cdot r_{i-1,0})$ from a center of some supercluster in the current superclustering adds its shortest path to its closest center to the spanner $H$ (without being added to that supercluster).

• Sub-sampling: Each supercluster $SC\in\mathcal{SC}_{i-1,j-1}$ gets sampled into $\mathcal{SC}^{\prime}$ with probability of $N_{i-1}/n$ where $N_{i-1}$ is the number of clusters in $\mathcal{C}_{i-1}$ .

• New Superclustering: Each cluster (belonging to any supercluster $SC\in\mathcal{SC}_{i-1,j-1}$ ) at (center) distance at most $r_{i-1,0}+2\alpha_{i-1,j}+r_{i-1,j-1}$ from a center of a sampled supercluster joins its closest sampled supercluster and adds the shortest path between their centers to the spanner. The center of the sampled supercluster maintains its role.

• Handling Lost Clusters: Each other cluster $C$ (that is too far from the sampled superclusters) adds to the spanner $H$ a shortest path from its center to the center of every supercluster $SC\in\mathcal{SC}_{i-1,j-1}$ , provided that their distance is at most $r_{i-1,0}+2\alpha_{i-1,j}+r_{i-1,j-1}$ . This cluster will no longer appear in the superclustering.
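The four operations of a single superclustering step can be sketched in code as follows. This is a minimal illustration of the high-level description only, not the paper's detailed algorithm: the function names `bfs_ball` and `supercluster_step`, the representation of a supercluster as a map from its center to the list of its member cluster centers, and the parameters `r0`, `r_prev`, `alpha`, `p` (standing for $r_{i-1,0}$ , $r_{i-1,j-1}$ , $\alpha_{i-1,j}$ and the sampling probability) are all assumptions made for this sketch. Shortest paths to be added to $H$ are recorded only as endpoint pairs, and ties are broken by the smaller vertex label.

```python
import random
from collections import deque

def bfs_ball(adj, source, radius):
    """Distances from `source` to every vertex within `radius` hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if dist[v] < radius:
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
    return dist

def supercluster_step(adj, superclusters, r0, r_prev, alpha, p, rng):
    """superclusters: dict (supercluster center -> list of member cluster centers).
    Returns (new_superclusters, paths); `paths` holds endpoint pairs whose
    shortest path in G would be added to the spanner H."""
    reach = r0 + 2 * alpha + r_prev
    ball = {s: bfs_ball(adj, s, reach) for s in superclusters}
    paths = set()
    # Augmentation: nearby vertices connect to their closest supercluster center.
    for v in adj:
        near = [(ball[s][v], s) for s in superclusters
                if v in ball[s] and ball[s][v] <= r_prev + alpha]
        if near and min(near)[1] != v:
            paths.add((v, min(near)[1]))
    # Sub-sampling: each supercluster survives independently with probability p.
    sampled = {s for s in superclusters if rng.random() < p}
    new_superclusters = {s: [] for s in sampled}
    for members in superclusters.values():
        for c in members:
            close = [(ball[t][c], t) for t in superclusters if c in ball[t]]
            hit = [x for x in close if x[1] in sampled]
            if hit:
                # New superclustering: join the closest sampled supercluster.
                t = min(hit)[1]
                new_superclusters[t].append(c)
                if t != c:
                    paths.add((c, t))
            else:
                # Lost cluster: connect to every supercluster center within reach.
                paths.update((c, t) for _, t in close if t != c)
    return new_superclusters, paths

# Toy input (hypothetical): a 12-vertex path with three singleton superclusters.
path = {v: {u for u in (v - 1, v + 1) if 0 <= u <= 11} for v in range(12)}
start = {0: [0], 4: [4], 10: [10]}
all_kept, _ = supercluster_step(path, start, 1, 1, 1, 1.0, random.Random(0))
none_kept, lost_paths = supercluster_step(path, start, 1, 1, 1, 0.0, random.Random(0))
```

With `p = 1.0` every supercluster is sampled and nothing is lost; with `p = 0.0` every cluster is lost and only connects to the supercluster centers within the reach threshold.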

The augmentation parameters $\alpha_{i,j}$ , as well as the precise number of external phases $T$ and internal superclustering steps $t$ , are all set in a very delicate manner. We note that this additional augmentation step is not applied in the Elkin-Neiman algorithm [EN16a], and we find it to be quite useful in our stretch analysis. The output clustering $\mathcal{C}_{i}$ of the $i^{th}$ phase consists of a cluster $C_{\ell}=V(SC_{\ell})$ for every supercluster $SC_{\ell}\in\mathcal{SC}_{i-1,t}$ where $\mathcal{SC}_{i-1,t}$ is the last superclustering of that phase.

At the end of all these $T$ phases, we are left with a clustering $\mathcal{C}_{T}$ with $N_{T}=O(n^{1-1/k^{\epsilon}})$ clusters. The augmentation parameters $\alpha_{i,j}$ are set in a way that guarantees that the radius of this clustering is bounded by $r_{T}=O_{\epsilon}(k^{1-\epsilon})$ . Before starting the final phase, every unclustered vertex at distance at most $r_{T}+d$ from some cluster center adds to the spanner its shortest path to its closest center.

The final clustering-graph phase computes a cluster graph $\widehat{G}$ where each cluster in $\mathcal{C}_{T}$ corresponds to a node in that graph. Two clusters $C,C^{\prime}\in\mathcal{C}_{T}$ are connected in $\widehat{G}$ if their center-distance is at most $2r_{T}+2d$ . The algorithm then computes a $(2k^{\epsilon}-1)$ spanner $\widehat{H}\subseteq\widehat{G}$ containing at most $O(N_{T}^{1+1/k^{\epsilon}})=O(n)$ edges. Finally, these spanner edges are translated into edges in $G$ , by adding to the final spanner $H$ the shortest path between the centers $r(C)$ and $r(C^{\prime})$ for each edge $(C,C^{\prime})\in\widehat{H}$ . It is easy to see that this step adds $O_{k}(n)$ edges to the spanner.

The analysis is based on providing distinct stretch guarantees depending on the precise step $(i,j)$ in which the vertex stopped being clustered (the $(i,j)$ step is the $j$ ’th step of the $i$ ’th phase). Formally, a vertex is said to be $(i-1,j)$ -unclustered if it belongs to the superclusters of $\mathcal{SC}_{i-1,j-1}$ but does not belong to the superclusters of $\mathcal{SC}_{i-1,j}$ . The analysis shows that the later a vertex $u$ stopped being clustered, the stronger its stretch guarantee is, in the following sense. For every $(i-1,j)$ -unclustered vertex $u$ , the edges added to the spanner in phase $i$ guarantee that $\mbox{\rm dist}_{H}(u,v)\leq k^{\epsilon}\cdot\mbox{\rm dist}_{G}(u,v)$ for every vertex $v$ at distance of $\alpha_{i-1,j}$ from $u$ , where $\alpha_{i-1,j}$ grows with both $i$ and $j$ . The key point is that this stretch bound holds even if $v$ is $(i-1,j-1)$ unclustered. We complete the argument by considering any $u$ - $v$ shortest path $P$ (of any length), and dividing it into consecutive disjoint segments $P[x,y]$ of possibly varying lengths. The length of each segment depends on the step $(i,j)$ in which certain vertices along the path $P$ stopped being clustered. We then show that for each segment (except perhaps the last one), the spanner provides a multiplicative stretch of $k^{\epsilon}$ between the endpoints of this segment. A detailed algorithm description appears in Sec. 3.

1.2.2 New Hopsets

Our hopset constructions bear similarities to the spanner algorithms, but include several modifications. First, our hopset is defined for weighted graphs whereas in the $(\alpha,\beta)$ spanners the graph is required to be unweighted. The key difference will be in the way that we handle the sparse vertices. In the spanners above, we applied a truncated variant of the Baswana-Sen algorithm. Here, we will use the classic $(2k-1,2)$ hopsets that follow from the distance oracle of Thorup and Zwick [TZ05]. To explain the ideas in their cleanest and most simplified form, in the high-level description below we restrict attention to unweighted graphs and mainly explain the first non-trivial construction of $(\sqrt{k},\sqrt{k})$ hopsets. We start by overviewing the construction of $(2k-1,2)$ hopsets, and highlight their key properties.

A Short Exposition of $(2k-1,2)$ Hopsets.

The construction of the distance oracle by Thorup and Zwick [TZ05] is based on a hierarchical collection of centers $A_{k-1}\subset...\subset A_{0}=V$ . Each $A_{i}$ is obtained by sampling each $v\in A_{i-1}$ independently with probability of $n^{-1/k}$ . The $i^{th}$ pivot of every vertex $v$ denoted by $p_{i}(v)$ is the closest vertex in $A_{i}$ to $v$ . For every $v$ , its $i^{th}$ bunch $B_{i}(v)$ contains all vertices in $A_{i}$ that are closer to $v$ than $p_{i+1}(v)$ . Thorup and Zwick showed that for every $v$ , $|B(v)|=O(k\cdot n^{1/k})$ in expectation. As observed by Huang and Pettie [HP19], the collection of bunches translates into a $(2k-1,2)$ hopset $H$ as follows: for every $v$ and $u\in B(v)$ , add to $H$ an edge $(u,v)$ of weight $\mbox{\rm dist}_{G}(u,v)$ . Since each bunch $B(v)$ is of size $O(kn^{1/k})$ , the output hopset has $O(kn^{1+1/k})$ edges. The key property that will be used by our algorithms is as follows. Fix a pair of vertices $u,v$ and define $i^{*}$ as the minimal index such that $p_{i^{*}}(v)$ is in the bunch of $u$ or vice-versa. By construction, $i^{*}\leq k-1$ . By the stretch argument of the distance oracle of Thorup and Zwick [TZ05], one can show that the hopset $H$ contains a two-hop path between $u$ and $v$ that goes through the common $(i^{*})$ th pivot, and the path has length at most $(2i^{*}+1)\cdot\mbox{\rm dist}(u,v)$ . We next explain the construction of the $(\sqrt{k},\sqrt{k})$ hopset.

Warming Up: $(\alpha,\beta)$ Hopset with $\alpha,\beta=O(\sqrt{k})$ .

As common in hopset constructions, we fix a distance range $[d,2d]$ and describe a construction that provides the desired stretch and hop-bound for all pairs $u,v$ at distance $d^{\prime}\in[d,2d]$ . The same procedure is then applied for each of the $\log{\Lambda}$ distance ranges, where $\Lambda$ is the aspect ratio of the graph.

The hopset algorithm has two phases. First it computes the $(2k-1,2)$ hopsets based on the Thorup-Zwick distance oracle. In fact, for our purposes it will be sufficient to apply only the first $\sqrt{k}$ steps of the construction, i.e., computing the center sets $A_{0}=V,\ldots,A_{\sqrt{k}}$ , and adding hops to the hopset $H^{\prime}$ based on the bunches $B_{i}(v)$ for every $v$ and every $i\in\{1,\ldots,\sqrt{k}\}$ .

In the second phase, the algorithm computes a clustering $\mathcal{C}_{0}$ centered at the vertices of $A_{\sqrt{k}}$ as follows. Each vertex $v\in G$ at distance at most $2d$ from $A_{\sqrt{k}}$ joins the cluster of its closest center in $A_{\sqrt{k}}$ . This defines a collection $\mathcal{C}_{0}$ of $|A_{\sqrt{k}}|=n^{1-1/\sqrt{k}}$ (in expectation) vertex-disjoint clusters of radius at most $2d$ . In the hopset, we add a hop $(v,c_{v})$ between each clustered vertex $v$ and its cluster center $c_{v}\in A_{\sqrt{k}}$ .

Now, the algorithm computes the cluster-graph $\widehat{G}$ in which each cluster in $\mathcal{C}_{0}$ corresponds to a node, and two clusters $C,C^{\prime}\in\mathcal{C}_{0}$ are adjacent iff the distance between their centers is at most $6d$ . We note that the graph $\widehat{G}$ will be unweighted even when the graph $G$ is weighted, as each edge in $\widehat{G}$ corresponds to a path of length at most $6d$ in $G$ . Letting $\widehat{H}$ be a $(2\sqrt{k}-1)$ multiplicative spanner of $\widehat{G}$ , for every edge $(C,C^{\prime})\in\widehat{H}$ , the algorithm adds the hop $(r(C),r(C^{\prime}))$ to the hopset, where $r(C),r(C^{\prime})$ are the centers of $C,C^{\prime}$ respectively.
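The clustering around $A_{\sqrt{k}}$ and the adjacency rule of the cluster-graph can be sketched for a weighted graph as follows. This is an illustrative sketch only: the helper names `truncated_dijkstra` and `hopset_second_phase_inputs`, the adjacency-list format, and the toy input are assumptions made here, and the center set is passed in directly rather than sampled. The final spanner computation on $\widehat{G}$ is left out.

```python
import heapq

def truncated_dijkstra(adj, sources, radius):
    """Multi-source Dijkstra cut off at `radius`.
    adj: dict v -> list of (neighbour, weight).
    Returns (dist, origin): distance to, and identity of, the closest source."""
    dist, origin = {}, {}
    heap = [(0.0, s, s) for s in sorted(sources)]  # a sorted list is a valid heap
    while heap:
        dv, v, s = heapq.heappop(heap)
        if v in dist or dv > radius:
            continue
        dist[v], origin[v] = dv, s
        for u, w in adj[v]:
            if u not in dist:
                heapq.heappush(heap, (dv + w, u, s))
    return dist, origin

def hopset_second_phase_inputs(adj, centers, d):
    """Cluster every vertex within 2d of a center, emit one hop per clustered
    vertex, and list the cluster-graph edges (center distance at most 6d)."""
    dist, center_of = truncated_dijkstra(adj, centers, 2 * d)
    member_hops = {(v, c): dist[v] for v, c in center_of.items() if v != c}
    cluster_edges = set()
    for c in centers:
        near, _ = truncated_dijkstra(adj, [c], 6 * d)
        cluster_edges.update((c, c2) for c2 in centers if c < c2 and c2 in near)
    return center_of, member_hops, cluster_edges

# Toy input (hypothetical): a unit-weight path on 10 vertices, centers 0 and 5.
line = {v: [(u, 1.0) for u in (v - 1, v + 1) if 0 <= u <= 9] for v in range(10)}
center_of, member_hops, cluster_edges = hopset_second_phase_inputs(line, [0, 5], d=1)
```

In this toy run vertices 8 and 9 are farther than $2d$ from both centers and therefore remain unclustered, and the two centers are adjacent in the cluster-graph since their distance is $5\leq 6d$ .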

Why it works? First we bound the size of the hopset by $O_{k}(n^{1+1/k})$ . The first phase adds a TZ-hopset with $O(k\cdot n^{1+1/k})$ edges. In the second phase, we add one hop for every edge $(C,C^{\prime})$ in the spanner $\widehat{H}$ . Since this spanner has $O(|A_{\sqrt{k}}|^{1+1/\sqrt{k}})=O(n)$ edges, it adds $O(n)$ hops overall. In addition, each clustered vertex is connected with a hop to its cluster center, and as the clusters are vertex disjoint, it adds $O(n)$ hops.

Next, we consider the stretch and hop-bound argument for a fixed pair $u$ and $v$ at distance $d^{\prime}\in[d,2d]$ in $G$ . Let $P$ be their shortest path. A vertex $w$ is called clustered if it belongs to the clusters of $\mathcal{C}_{0}$ . By definition, a vertex $w$ is clustered iff $\mbox{\rm dist}_{G}(w,A_{\sqrt{k}})\leq 2d$ . First, consider the case where at most one vertex on $P$ is clustered. The argument goes by partitioning $P$ into $O(\sqrt{k})$ disjoint consecutive segments of length $2d/\sqrt{k}$ . For each such segment $P[x,y]$ we show that in the TZ-hopset $H^{\prime}$ there is a two-hop path between $x$ and $y$ of length at most $O(d)$ . To see this, consider an unclustered vertex $x$ and let $i$ be the minimum index satisfying that $p_{i}(x)\in B_{i}(y)$ . By the properties of the TZ-hopsets, $\mbox{\rm dist}(x,p_{i}(x))\leq i\cdot\mbox{\rm dist}(x,y)$ . Since $\mbox{\rm dist}_{G}(x,A_{\sqrt{k}})>2d$ , and $\mbox{\rm dist}(x,y)=2d/\sqrt{k}$ , we get that $i\leq\sqrt{k}$ . We therefore have $O(\sqrt{k})$ segments on $P$ , and for each of them the TZ-hopset provides a two-hop path of length $O(d)$ . In total, we get a $u$ - $v$ path of $2\sqrt{k}$ hops, and of total length $O(\sqrt{k}\cdot d)$ as required. This establishes the stretch and hop-bound argument for a path containing at most one clustered vertex.

The other case follows by the second phase of the algorithm. We consider the far-most clustered vertex pair $u^{\prime}$ and $v^{\prime}$ on the $u$ - $v$ shortest path. The stretch and hop-bound argument for the subpaths $P[u,u^{\prime}]$ and $P[v^{\prime},v]$ follows by the argument above, and thus it remains to consider the path $P[u^{\prime},v^{\prime}]$ . Let $C_{u^{\prime}}$ and $C_{v^{\prime}}$ be the clusters of $u^{\prime}$ and $v^{\prime}$ respectively. Since $\mbox{\rm dist}_{G}(u^{\prime},v^{\prime})\leq 2d$ and each cluster has radius at most $2d$ , the distance between the centers of $C_{u^{\prime}}$ and $C_{v^{\prime}}$ is at most $6d$ . We get that the clusters $C_{u^{\prime}}$ and $C_{v^{\prime}}$ are adjacent in $\widehat{G}$ , and therefore the hopset contains a path of at most $(2\sqrt{k}-1)$ hops between the centers of $C_{u^{\prime}}$ and $C_{v^{\prime}}$ . As each such hop has weight of $O(d)$ , overall the hopset contains a path with $O(\sqrt{k})$ hops connecting $u^{\prime}$ and $v^{\prime}$ , of total length $O(\sqrt{k}\cdot d)$ as required. This completes the high-level idea of the construction, see Sec. 5.1 for more details.

Three Stage Approach for $(k^{\epsilon},k^{1-\epsilon})$ Hopsets.

The computation of the $(k^{\epsilon},k^{1-\epsilon})$ hopsets for every $\epsilon\in[1/2,1)$ is very similar to the high level description mentioned above. The complementary range of $\epsilon\in(0,1/2)$ is considerably more involved. It also has a three stage structure in a very similar manner to the $(k^{\epsilon},k)$ spanners. The first stage applies a truncated TZ hopset construction restricting to the first $k^{\epsilon}$ levels of clustering. Letting $A_{k^{\epsilon}}$ be the centers in the $k^{\epsilon}$ -level of the Thorup-Zwick algorithm, the second phase computes an initial clustering $\mathcal{C}_{0}$ with centers of $A_{k^{\epsilon}}$ . The vertices that do not belong to the clusters of $\mathcal{C}_{0}$ are called unclustered. For those vertices, the correctness will follow by the TZ-hopsets. The remaining clustered vertices are handled in two stages: a key stage of superclustering, which rapidly reduces the number of clusters to $n^{1-1/k^{\epsilon}}$ , and a final stage in which the number of clusters is small enough to allow the computation of a $k^{\epsilon}$ -spanner on that cluster graph. A more detailed description appears in Sec. 5.2.

Open Problems.

The most important open problem left by this work concerns the existence of $(\alpha,\beta)$ spanners with $\widetilde{O}(n^{1+1/k})$ edges for $\alpha=O(1)$ and $\beta=O(k)$ . This would provide a nearly optimal stretch, up to constants, for the entire range of distances. With our current constructions one can only get $(\alpha,O(k))$ spanners with $\alpha=2^{O(\sqrt{\log k})}$ . Alternatively, for a multiplicative stretch of $\alpha=O(1)$ , we currently get $\beta=k^{1+o(1)}$ . Note that the lower-bound constructions of Abboud, Bodwin and Pettie [ABP18a] are only tight for constant values of $k$ , and hence it might still be possible to even obtain $(1+\epsilon,O(k))$ spanners with $O_{k,\epsilon}(n^{1+1/k})$ edges. Another interesting open problem concerns the tightness of our hopsets constructions. We present a new family of $(\alpha=k^{\epsilon},\beta=k^{1-\epsilon})$ hopsets with $\widetilde{O}(n^{1+1/k})$ edges for any constant $\epsilon\in(0,1)$ . The most critical question is whether any $(\alpha,\beta)$ hopset with $\widetilde{O}(n^{1+1/k})$ edges must satisfy that $\alpha\cdot\beta=\Omega(k)$ .

1.3 Preliminaries

Graph Notations and Definitions.

We consider an undirected $n$ -vertex graph $G=(V,E)$ , where $V$ is the set of vertices and $E$ is the edge-set. Let $N_{G}(u)$ be the set of neighbors of $u$ in $G$ . When $G$ is clear from the context, we may simply write $N(u)$ . Unless specified otherwise we assume $G$ to be unweighted. For $u,v\in V$ we denote by $\mbox{\rm dist}_{G}(u,v)$ the distance from $u$ to $v$ in $G$ . Similarly, for any subgraph $H\subseteq{G}$ we denote by $\mbox{\rm dist}_{H}(u,v)$ the distance between the vertices in $H$ . For any vertex $u\in V$ and integer $d$ , we denote by $\mathsf{\textbf{B}}_{G}(u,d)$ the set of vertices at distance at most $d$ from $u$ in $G$ , that is $\mathsf{\textbf{B}}_{G}(u,d)=\{v\in V~{}|~{}\mbox{\rm dist}_{G}(u,v)\leq d\}$ . When the context is clear we might simply write $\mathsf{\textbf{B}}(u,d)$ . By $\partial\mathsf{\textbf{B}}_{G}(u,d)$ we denote the set of vertices at distance exactly $d$ from $u$ , that is $\partial\mathsf{\textbf{B}}_{G}(u,d)=\{v\in V~{}|~{}\mbox{\rm dist}_{G}(u,v)=d\}$ . For a weighted graph $G=(V,E,w)$ , the aspect ratio denoted by $\Lambda$ is the ratio between the largest and smallest vertex-pair distances in $G$ . Unless stated otherwise, in our constructions, shortest path ties are broken in a consistent manner.

$(\alpha,\beta)$ Hopsets.

Hopsets are fundamental graph structures introduced by Cohen [Coh00]. Let $G=(V,E,w)$ be an undirected weighted graph and $H\subset{V\choose 2}$ be a set of edges called the hopset. In the graph $G^{\prime}=(V,E\cup H,w^{\prime})$ the weight function $w^{\prime}$ is defined by letting $w^{\prime}(e)=w(e)$ for every $e\in E$ and $w^{\prime}(e)=\mbox{\rm dist}_{G}(x,y)$ for every $e=(x,y)\in H$ . Define the $\beta$ -limited distance in $G^{\prime}$ , denoted $\mbox{\rm dist}^{(\beta)}_{G^{\prime}}(u,v)$ , to be the length of the shortest path from $u$ to $v$ that uses at most $\beta$ edges in $G^{\prime}$ . We call $H$ a $(\beta,\epsilon)$ -hopset, where $\beta\geq 1,\epsilon>0$ , if, for any $u,v\in V$ , we have $\mbox{\rm dist}_{G^{\prime}}^{(\beta)}(u,v)\leq(1+\epsilon)\mbox{\rm dist}_{G}(u,v)$ .
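The $\beta$ -limited distance can be computed by running $\beta$ rounds of Bellman-Ford relaxation over the edges of $G^{\prime}$ . The following is a small sketch of that standard technique; the function name `hop_limited_distance` and the edge-list format are assumptions made for illustration.

```python
import math

def hop_limited_distance(n, weighted_edges, source, beta):
    """Return a list whose entry v is the length of the shortest path from
    `source` to v that uses at most `beta` edges (math.inf if none exists).
    weighted_edges: iterable of undirected edges (u, v, w) on vertices 0..n-1."""
    dist = [math.inf] * n
    dist[source] = 0.0
    for _ in range(beta):
        # Relax from the previous round only, so each round adds at most one edge.
        new_dist = dist[:]
        for u, v, w in weighted_edges:
            if dist[u] + w < new_dist[v]:
                new_dist[v] = dist[u] + w
            if dist[v] + w < new_dist[u]:
                new_dist[u] = dist[v] + w
        dist = new_dist
    return dist

# Toy input (hypothetical): the path 0-1-2-3 with unit weights, and a single hop
# (0, 3) whose weight is dist_G(0, 3) = 3.
graph_edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
hopset = [(0, 3, 3.0)]
without_hops = hop_limited_distance(4, graph_edges, 0, 1)
with_hops = hop_limited_distance(4, graph_edges + hopset, 0, 1)
```

With a hop budget of $\beta=1$ , vertex 3 is unreachable from vertex 0 in $G$ alone, while in $G^{\prime}$ the single hop realizes the exact distance.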

Clusters and Superclusters.

A cluster $C\subseteq V$ is a subset of vertices with a small weak diameter in $G$ . Every cluster $C$ has a special vertex $r(C)$ that is denoted as the cluster center. A clustering $\mathcal{C}=\{C_{1},\ldots,C_{\ell}\}$ is a collection of disjoint clusters, where the vertices of the clustering are denoted by $V(\mathcal{C})=\bigcup_{i}C_{i}$ . Note that $V(\mathcal{C})$ is not necessarily $V(G)$ .

In our algorithms we measure the distance between clusters $C,C^{\prime}$ by the distance between their centers $r(C),r(C^{\prime})$ respectively. Formally, define $\mbox{\rm c-dist}_{G}(C,C^{\prime})=\mbox{\rm dist}_{G}(r(C),r(C^{\prime}))$ . In the same manner, for a vertex $v$ and a cluster $C$ , define $\mbox{\rm c-dist}_{G}(v,C)=\mbox{\rm dist}_{G}(v,r(C))$ . For a collection of clusters $\mathcal{C}=\{C_{1},\ldots,C_{\ell}\}$ and a cluster $C^{\prime}$ , let $\mbox{\rm c-dist}_{G}(C^{\prime},\mathcal{C})=\min_{C_{j}\in\mathcal{C}}\mbox{\rm c-dist}_{G}(C^{\prime},C_{j})$ . In the same manner, for a vertex $v$ and a cluster collection $\mathcal{C}$ , let $\mbox{\rm c-dist}_{G}(v,\mathcal{C})=\min_{C_{j}\in\mathcal{C}}\mbox{\rm c-dist}_{G}(v,C_{j})$ . Throughout, we break shortest-path ties based on IDs. For instance, the closest cluster in $\mathcal{C}$ to a given vertex $v$ is the minimum-ID cluster $C^{\prime}$ in $\mathcal{C}$ that satisfies that $\mbox{\rm c-dist}(v,\mathcal{C})=\mbox{\rm dist}_{G}(v,r(C^{\prime}))$ . The radius of a cluster $C$ is defined by $rad(C):=\max_{u\in V(C)}\mbox{\rm dist}_{G}(u,r(C))$ .

A supercluster $SC_{i}=\{C_{1},\ldots,C_{j}\}$ is a set of disjoint clusters, with one special vertex, namely, the center of the supercluster, denoted by $r(SC_{i})$ . Let $V(SC_{i})=\bigcup_{C_{j}\in SC_{i}}V(C_{j})$ be the vertices of the supercluster. A superclustering $\mathcal{SC}=\{SC_{1},\ldots,SC_{\ell}\}$ is a collection of vertex disjoint superclusters. Let $V(\mathcal{SC})=\bigcup_{SC_{j}\in\mathcal{SC}}V(SC_{j})$ denote the vertices of the superclustering $\mathcal{SC}$ . Similarly to clusters, for a pair of superclusters $SC_{i},SC_{j}$ define $\mbox{\rm c-dist}_{G}(SC_{i},SC_{j})=\mbox{\rm dist}_{G}(r(SC_{i}),r(SC_{j}))$ . For a cluster $C$ and a supercluster $SC$ , define $\mbox{\rm c-dist}(C,SC)=\mbox{\rm dist}_{G}(r(C),r(SC))$ . For a cluster $C$ and a superclustering $\mathcal{SC}$ , define $\mbox{\rm c-dist}(C,\mathcal{SC})=\min_{SC^{\prime}\in\mathcal{SC}}\mbox{\rm c-dist}(C,SC^{\prime})$ . Similarly, for a vertex $v\in V$ and a supercluster $SC$ let $\mbox{\rm c-dist}(v,SC)=\mbox{\rm dist}_{G}(v,r(SC))$ , and for a superclustering $\mathcal{SC}$ , define $\mbox{\rm c-dist}(v,\mathcal{SC})=\min_{SC^{\prime}\in\mathcal{SC}}\mbox{\rm c-dist}(v,SC^{\prime})$ . The radius of a supercluster $SC_{i}$ is defined by $rad(SC_{i}):=\max_{u\in V(SC_{i})}\mbox{\rm dist}_{G}(u,r(SC_{i}))$ .

1.4 Algorithmic Tools

Multiplicative Spanners of Baswana and Sen [BS07].

Our algorithms are based on a truncated variant of the Baswana-Sen algorithm, namely Procedure $\mathsf{TruncatedBS}$ . The procedure gets as an input a graph $G$ , a stretch parameter $k$ and integer $t$ that determines the number of clustering steps. The algorithm begins with singleton clusters $\mathcal{C}_{0}=\{\{v\}~{}|~{}v\in V\}$ . Then, at every step $i\in\{1,\ldots,t\}$ , a clustering $\mathcal{C}_{i}$ is defined based on the given clustering $\mathcal{C}_{i-1}$ . Every cluster in $\mathcal{C}_{i-1}$ is sampled with probability $n^{-1/k}$ to $\mathcal{C}_{i}$ . Vertices that are not adjacent to sampled clusters are called unclustered, and they do not appear in the following clusters (we say that a vertex $v$ is adjacent to a cluster $C$ if $C$ contains at least one neighbor of $v$ ). For each unclustered vertex the procedure adds to the spanner $H$ one edge to each of its adjacent clusters in $\mathcal{C}_{i-1}$ . Any other vertex joins its closest sampled cluster. Finally, the algorithm also adds to the output subgraph $H$ the spanning trees of each cluster $C\in\mathcal{C}_{i}$ (rooted at the cluster center) for every $i\in\{1,\ldots,t\}$ .
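The clustering steps described above can be sketched for an unweighted graph as follows. This is a minimal illustration under stated assumptions, not the procedure as specified in [BS07]: the function name `truncated_bs`, the dictionary representation of clusters (vertex mapped to its cluster center), and tie-breaking by the smallest center label are choices made here. The edge through which a vertex joins a sampled cluster doubles as its spanning-tree edge.

```python
import random

def truncated_bs(adj, k, t, rng=None):
    """adj: dict vertex -> set of neighbours (unweighted, undirected).
    Runs t clustering steps with sampling probability n^(-1/k).
    Returns (H, cluster_of): H is a set of frozenset edges, cluster_of maps
    every still-clustered vertex to the center of its cluster."""
    rng = random.Random(0) if rng is None else rng
    p = len(adj) ** (-1.0 / k)
    cluster_of = {v: v for v in adj}          # C_0: singleton clusters
    H = set()
    for _ in range(t):
        centers = sorted(set(cluster_of.values()))
        sampled = {c for c in centers if rng.random() < p}
        next_cluster_of = {}
        for v in cluster_of:
            if cluster_of[v] in sampled:
                next_cluster_of[v] = cluster_of[v]   # v's own cluster survives
                continue
            # One representative neighbour per adjacent cluster of C_{i-1}.
            rep = {}
            for u in sorted(adj[v]):
                if u in cluster_of:
                    rep.setdefault(cluster_of[u], u)
            hit = [c for c in rep if c in sampled]
            if hit:
                c = min(hit)                          # join an adjacent sampled cluster
                H.add(frozenset((v, rep[c])))         # tree edge of the grown cluster
                next_cluster_of[v] = c
            else:
                # v becomes unclustered: one edge to each adjacent old cluster.
                H.update(frozenset((v, u)) for u in rep.values())
        cluster_of = next_cluster_of
    return H, cluster_of

# Toy input (hypothetical): a cycle on 8 vertices.
ring = {v: {(v - 1) % 8, (v + 1) % 8} for v in range(8)}
H, cluster_of = truncated_bs(ring, k=3, t=2, rng=random.Random(1))
H0, clusters0 = truncated_bs(ring, k=3, t=0)
```

All updates within a step read the clustering of the previous step, which mirrors the simultaneous definition of $\mathcal{C}_{i}$ from $\mathcal{C}_{i-1}$ .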

Fact 1.

*[Theorems 4.1, 4.2 and Lemma 4.1 in [BS07]]

(1) For every $1\leq i\leq t$ , $\mathbb{E}[|\mathcal{C}_{i}|]=n^{1-\frac{i}{k}}$ , (2) the radius of each cluster $C\in\mathcal{C}_{i}$ is at most $i$ , (3) the total number of edges, in expectation, added to $H$ is $O(t\cdot n^{1+1/k})$ , and (4) for every edge $(u,v)\in G$ satisfying that at least one of the endpoints is not clustered in $\mathcal{C}_{i}$ , it holds that $\mbox{\rm dist}_{H}(u,v)\leq 2i-1$ .*

Distance Oracles and Hopsets of Thorup and Zwick [TZ05].

Our hopset constructions build on the $(2k-1,2)$ hopsets derived from the distance oracles of Thorup and Zwick [TZ05]. The construction of the hopset is based on defining a hierarchical collection of centers $A_{k-1}\subset...\subset A_{0}=V$ , where each $A_{i}$ is obtained by sampling each $v\in A_{i-1}$ independently with probability of $n^{-1/k}$ , and we let $A_{k}=\emptyset$ . The $i^{th}$ pivot of every vertex $v$ denoted by $p_{i}(v)$ is the closest vertex in $A_{i}$ to $v$ . For every $v$ , its $i^{th}$ bunch $B_{i}(v)$ contains all vertices in $A_{i}$ that are strictly closer to $v$ than $p_{i+1}(v)$ . That is, $B_{i}(v)=\{u\in A_{i}-A_{i+1}~{}\mid~{}\mbox{\rm dist}_{G}(v,u)<\mbox{\rm dist}(v,p_{i+1}(v))\}$ , and let $B(v)=\bigcup_{i=1}^{k-1}B_{i}(v)$ . Note that we add a hop from each vertex to all vertices in $A_{k-1}$ . Thorup and Zwick showed that for every $v$ , $|B(v)|=O(k\cdot n^{1/k})$ in expectation. The collection of bunches translates into a $(2k-1,2)$ hopset $H$ as follows: for every $v$ and $u\in B(v)$ , add to $H$ an edge $(u,v)$ of weight $\mbox{\rm dist}_{G}(u,v)$ . In our applications, we sometimes use a truncated version of the Thorup and Zwick construction for a given input parameter $t\leq k$ . Algorithm $\mathsf{TZ}(G,k,t)$ gets as input a graph $G$ , stretch parameter $k$ and an integer $t$ . The output of the algorithm is the TZ-hopset $H_{TZ}$ along with the subset $A_{t}$ , that is, the level- $t$ centers, containing $n^{1-t/k}$ vertices in expectation.
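The translation from bunches to hops can be sketched as follows. This is an illustrative and deliberately naive implementation (one Dijkstra per vertex): the names `dijkstra` and `tz_hopset` are introduced here, the sketch takes the union of bunches over all levels $i=0,\ldots,k-1$ as in the standard Thorup-Zwick oracle (an assumption of this sketch), it does not implement the truncation parameter $t$ , and it does not resample when a level happens to be empty.

```python
import heapq
import random

def dijkstra(adj, sources):
    """Multi-source Dijkstra. adj: dict v -> list of (neighbour, weight).
    Returns a dict with the distance from every reachable vertex to `sources`."""
    dist = {}
    heap = [(0.0, s) for s in sorted(sources)]
    while heap:
        d, v = heapq.heappop(heap)
        if v in dist:
            continue
        dist[v] = d
        for u, w in adj[v]:
            if u not in dist:
                heapq.heappush(heap, (d + w, u))
    return dist

def tz_hopset(adj, k, rng):
    """Returns (hops, levels): hops maps frozenset({u, v}) to dist_G(u, v) for
    every u in the bunch of v; levels[i] is the center set A_i."""
    inf = float("inf")
    p = len(adj) ** (-1.0 / k)
    levels = [set(adj)]                                   # A_0 = V
    for _ in range(1, k):
        levels.append({v for v in sorted(levels[-1]) if rng.random() < p})
    levels.append(set())                                  # A_k is empty
    # dist_to_level[i][v] = dist(v, A_i) = dist(v, p_i(v)); missing means infinity.
    dist_to_level = [dijkstra(adj, A) for A in levels]
    hops = {}
    for v in adj:
        dist_v = dijkstra(adj, [v])
        for i in range(k):
            bound = dist_to_level[i + 1].get(v, inf)
            for u in levels[i] - levels[i + 1]:
                # u is in B_i(v) iff it is strictly closer to v than p_{i+1}(v).
                if u != v and dist_v.get(u, inf) < bound:
                    hops[frozenset((u, v))] = dist_v[u]
    return hops, levels

# Toy input (hypothetical): a unit-weight cycle on 8 vertices, with k = 2.
ring = {v: [((v - 1) % 8, 1.0), ((v + 1) % 8, 1.0)] for v in range(8)}
hops, levels = tz_hopset(ring, 2, random.Random(0))
```

Since $A_{k}$ is empty, the bound at the top level is infinite, so every vertex receives a hop to all vertices of the top level, matching the remark above about $A_{k-1}$ .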

Fact 2.

Fix a vertex pair $u$ and $v$ . Let $i_{u}$ be the minimal index $i$ such that $p_{i}(u)\in B_{i}(v)$ , and similarly let $i_{v}$ be the minimal index $i^{\prime}$ such that $p_{i^{\prime}}(v)\in B_{i^{\prime}}(u)$ . Let $i^{*}=\min\{i_{u},i_{v}\}$ . (i) For every $j\leq i^{*}$ , it holds that $\mbox{\rm dist}_{G}(u,p_{j}(u))\leq j\cdot\mbox{\rm dist}_{G}(u,v)$ , and (ii) the hopset $H_{TZ}$ satisfies that $\mbox{\rm dist}_{G}(u,v)\leq\mbox{\rm dist}^{(2)}_{H_{TZ}\cup G}(u,v)\leq(2i^{*}+1)\mbox{\rm dist}_{G}(u,v)$ . Furthermore (iii), it holds that $|B_{i}(v)|\leq n^{1/k}$ in expectation.

Proof.

(i) By induction on $j$ . For $j=0$ this is clear since $p_{j}(u)=u$ . Assume the claim is true for $j-1$ , thus $\mbox{\rm dist}_{G}(v,p_{j-1}(v))\leq(j-1)\cdot\mbox{\rm dist}_{G}(u,v)$ and $\mbox{\rm dist}_{G}(u,p_{j-1}(u))\leq(j-1)\cdot\mbox{\rm dist}_{G}(u,v)$ . For $j\leq i^{*}$ , since $p_{j-1}(v)\notin B(u)$ , it follows that

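$$\mbox{\rm dist}_{G}(u,p_{j}(u))\leq\mbox{\rm dist}_{G}(u,p_{j-1}(v))\leq\mbox{\rm dist}_{G}(u,v)+\mbox{\rm dist}_{G}(v,p_{j-1}(v))\leq\mbox{\rm dist}_{G}(u,v)+(j-1)\cdot\mbox{\rm dist}_{G}(u,v)=j\cdot\mbox{\rm dist}_{G}(u,v),$$

and the symmetric bound for $\mbox{\rm dist}_{G}(v,p_{j}(v))$ follows in the same way, using $p_{j-1}(u)\notin B(v)$ .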

(ii) W.l.o.g., let $i^{*}=i_{u}$ . Then, $\mbox{\rm dist}_{H_{TZ}}(u,v)\leq\mbox{\rm dist}_{G}(u,p_{i^{*}}(u))+\mbox{\rm dist}_{G}(p_{i^{*}}(u),v)$ . Then by (i) it follows that $\mbox{\rm dist}_{H_{TZ}}(u,v)\leq i^{*}\cdot\mbox{\rm dist}_{G}(u,v)+(i^{*}+1)\mbox{\rm dist}_{G}(u,v)\leq(2\cdot i^{*}+1)\cdot\mbox{\rm dist}_{G}(u,v)$ . ∎

Throughout, in all our hopset constructions, whenever an edge $(u,v)\in V\times V$ is added to the hopset it is given a weight of $\mbox{\rm dist}_{G}(u,v)$ . To avoid confusion, the edges not in $G$ are referred to as hops.

Roadmap.

In Sec. 2, we present the first spanner construction that provides a stretch of $7k/d$ for all vertices at distance at most $\sqrt{k}$ . Then in Sec. 3, we present the key construction of $(O(k^{\epsilon}),O_{\epsilon}(k))$ spanners. Sec. 4 presents a simplified construction for $(3+\epsilon,\beta)$ spanners. Sec. 5 describes the new hopset constructions. Finally, in Appendix 6, we describe the implementation details and applications, and in Appendix B, we show improved constructions of $(3+\epsilon,\beta)$ spanners and hopsets.

2 Improved Spanners for Close Vertex Pairs

This section is devoted to showing Theorem 2. We consider unweighted graphs, and describe the construction of a spanner that provides a stretch of $7k/d$ for every pair of vertices $u$ and $v$ at distance at most $d$ in $G$ , provided that $d\leq\sqrt{k}/2$ . In the language of $(\alpha,\beta)$ -spanner, this spanner can be viewed as an $(O(\sqrt{k}),k)$ spanner. For simplicity, we fix a distance value $d\in[1,\sqrt{k}/2]$ and explain how to provide a stretch of $7k/d$ using a subgraph of expected size $O(k/d\cdot n^{1+1/k})$ . The same procedure will be repeated for every $d\leq\sqrt{k}/2$ .

Description of Algorithm $\mathsf{SpannerShortDist}$ .

The algorithm consists of two key steps. The first step applies a truncated version of the Baswana-Sen algorithm, applying only the first $\left\lfloor k/d\right\rfloor$ clustering steps. This results in a clustering $\mathcal{C}$ of expected size $n^{1-\frac{\left\lfloor k/d\right\rfloor}{k}}=O(n^{1+1/k-1/d})$ , as well as a subset of edges added to the spanner $H$ (i.e., edges that take care of the vertices that are not in the clusters of $\mathcal{C}$ ). In the second step, the algorithm computes a cluster-graph $\widehat{G}(\mathcal{C},\mathcal{E})$ as follows. The vertices of the cluster-graph, denoted as super-nodes, are the clusters of $\mathcal{C}$ . Every two clusters $C_{i},C_{j}\in\mathcal{C}$ are connected in $\widehat{G}$ iff the distance between their centers in $G$ is at most $d+2k/d$ . That is, $\mathcal{E}=\{(C_{i},C_{j})~{}\mid~{}\mbox{\rm c-dist}_{G}(C_{i},C_{j})\leq d+2\cdot k/d\}$ . The algorithm then computes a $(2d-1)$ spanner $\widehat{H}$ on this cluster graph, by using any standard multiplicative spanner procedure (e.g., the greedy spanner). Finally, the edges of this spanner are translated into $G$ -edges as follows: for every $(C_{i},C_{j})\in\widehat{H}$ , add to $H$ the shortest path in $G$ between the centers of $C_{i}$ and $C_{j}$ . This completes the description of the algorithm. See Alg. 1 for pseudocode.
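The second step can be sketched as follows, assuming the cluster centers have already been produced by the truncated clustering of the first step (which is not implemented here). The function names, the adjacency-dictionary representation, and the toy path graph are illustrative assumptions; the greedy spanner is one possible choice of "standard multiplicative spanner procedure".

```python
from collections import deque

def bfs(adj, src):
    """Return (dist, parent) maps for a BFS from src in an unweighted graph."""
    dist, parent = {src: 0}, {src: None}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y], parent[y] = dist[x] + 1, x
                queue.append(y)
    return dist, parent

def greedy_spanner(nodes, edges, stretch):
    """Greedy spanner: keep an edge only if its endpoints are farther than `stretch` apart."""
    h_adj = {x: set() for x in nodes}
    kept = []
    for a, b in edges:
        dist, _ = bfs(h_adj, a)
        if dist.get(b, float("inf")) > stretch:
            h_adj[a].add(b)
            h_adj[b].add(a)
            kept.append((a, b))
    return kept

def cluster_graph_step(adj, centers, k, d):
    """Build the cluster graph on `centers`, take a (2d-1)-spanner, translate it to G-paths."""
    threshold = d + 2 * k / d
    trees = {c: bfs(adj, c) for c in centers}
    cluster_edges = [(a, b) for i, a in enumerate(centers) for b in centers[i + 1:]
                     if trees[a][0].get(b, float("inf")) <= threshold]
    h_hat = greedy_spanner(centers, cluster_edges, 2 * d - 1)
    spanner_edges = set()
    for a, b in h_hat:
        _, parent = trees[a]          # BFS tree rooted at center a
        x = b
        while parent[x] is not None:  # walk the shortest path from b back to a
            spanner_edges.add(frozenset((x, parent[x])))
            x = parent[x]
    return h_hat, spanner_edges

# Hypothetical example: a path on 10 vertices with four assumed centers.
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 9] for i in range(10)}
h_hat, edges = cluster_graph_step(adj, [0, 3, 6, 9], k=16, d=2)
```

In this toy instance the greedy spanner keeps only the cluster edges incident to center 0, and translating them into shortest paths recovers the whole path graph.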

We next analyze Algorithm $\mathsf{SpannerShortDist}$ and prove Thm. 2.

Proof.

Stretch: Fix a pair of vertices $u$ and $v$ at distance $d$ in $G$ for $d\in[1,\sqrt{k}/2]$ , and let $P$ be their shortest path in $G$ . We consider two cases. First, assume that no edge $e=(u^{\prime},v^{\prime})$ on $P$ has both its endpoints in the clusters of $\mathcal{C}$ . Then, by Fact 1(4), we have that $\mbox{\rm dist}_{H}(u^{\prime},v^{\prime})\leq 2(k/d)-1$ for every edge $(u^{\prime},v^{\prime})$ in $P$ . This gives a $u$ - $v$ path in $H$ of total length $d\cdot(2(k/d)-1)=2k-d$ . Next consider the complementary case where $P$ contains at least one edge with both its endpoints clustered. Let $u^{\prime},v^{\prime}$ be the leftmost and rightmost clustered vertices on $P$ . Let us define the following subpaths $P_{1}:=P[u,u^{\prime}]$ , $P_{2}:=P[u^{\prime},v^{\prime}]$ and $P_{3}:=P[v^{\prime},v]$ , such that $P=P_{1}\circ P_{2}\circ P_{3}$ . For every vertex $z$ , let $C_{z}$ be its closest cluster in $\mathcal{C}$ with respect to the distance to the centers. Since $\mbox{\rm c-dist}_{G}(C_{u^{\prime}},C_{v^{\prime}})\leq 2k/d+\mbox{\rm dist}_{G}(u,v)=2k/d+d$ , it holds that $(C_{u^{\prime}},C_{v^{\prime}})\in E(\widehat{G})$ , thus by the properties of the $(2d-1)$ -spanner $\widehat{H}$ , it holds that $\mbox{\rm dist}_{\widehat{H}}(C_{u^{\prime}},C_{v^{\prime}})\leq 2d-1$ . As each edge in $\widehat{H}$ corresponds to a shortest path of length at most $2k/d+d$ in $H$ , we have:

[TABLE]

Finally, again by Fact 1(4), we have that $d_{H}(u,u^{\prime})\leq(2(k/d)-1)\cdot d_{G}(u,u^{\prime})$ and $d_{H}(v^{\prime},v)\leq(2(k/d)-1)\cdot d_{G}(v^{\prime},v)$ . We therefore conclude that

[TABLE]

Size: We show that for each fixed distance value $d$ , the algorithm adds at most $O(k/d\cdot n^{1+1/k})$ edges to the spanner. By Fact 1(3), the first $k/d$ steps of the Baswana-Sen clustering add $O(k/d\cdot n^{1+1/k})$ edges in expectation. By Fact 1(1), in expectation, $\mathcal{C}$ contains $N=n^{1+1/k-1/d}$ clusters. Thus, the cluster-graph $\widehat{G}$ has $N$ super-nodes, and the size of its $(2d-1)$ -spanner $\widehat{H}$ is $N^{1+1/d}=O(n^{1+1/(k\cdot d)})$ in expectation. Since each edge in $\widehat{H}$ corresponds to a path of length at most $(2k/d+d)$ in $G$ , we get that this step adds at most $(2k/d+d)\cdot|E(\widehat{H})|=O(k/d\cdot n^{1+1/(k\cdot d)})$ edges to the spanner. The size argument follows. ∎
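The exponent arithmetic behind the size bound can be spelled out as follows. This is a worked check under the section's assumption $1\leq d\leq\sqrt{k}/2$ , not a step taken from the original text.

```latex
% Size of the (2d-1)-spanner on N = n^{1+1/k-1/d} super-nodes:
N^{1+\frac{1}{d}}
  = n^{\left(1+\frac{1}{k}-\frac{1}{d}\right)\left(1+\frac{1}{d}\right)}
  = n^{1+\frac{1}{k}+\frac{1}{kd}-\frac{1}{d^{2}}}
  \leq n^{1+\frac{1}{kd}},
  \qquad\text{since } \tfrac{1}{k}\leq\tfrac{1}{d^{2}} \text{ for } d\leq\sqrt{k}.

% Length of each translated path, using d^2 <= k/4:
\frac{2k}{d}+d \leq \frac{2k}{d}+\frac{k}{4d} = \frac{9}{4}\cdot\frac{k}{d}.

% Combining, and using d >= 1 so that 1/(kd) <= 1/k:
\left(\frac{2k}{d}+d\right)\cdot N^{1+\frac{1}{d}}
  = O\!\left(\frac{k}{d}\cdot n^{1+\frac{1}{kd}}\right)
  = O\!\left(\frac{k}{d}\cdot n^{1+\frac{1}{k}}\right).
```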

3 New $(k^{\epsilon},O_{\epsilon}(k))$ Spanners

In this section, we consider Theorem 3 and show the construction of $(\alpha,\beta)$ spanners that provide a nearly optimal stretch for all pairs at distance $\sqrt{k}/2<d\leq k^{1-o(1)}$ and $d\geq k^{1+o(1)}$ . For the sake of simplicity, we consider a fixed distance value $d$ , and prove the following lemma:

Lemma 4.

For any $n$ -vertex unweighted graph $G=(V,E)$ , any $0<\epsilon<\frac{1}{2}$ and integers $k>16^{1/\epsilon},d\geq 1$ , there is a subgraph $H\subseteq{G}$ of expected size $O(k^{\epsilon}\cdot n^{1+1/k}+(64^{1/\epsilon}k^{1+\epsilon}+d)\cdot n)$ such that for every $u,v\in V$ , at distance $d$ in $G$ , it holds that $\mbox{\rm dist}_{H}(u,v)\leq 4\cdot k^{\epsilon}\cdot d+1/6\cdot 64^{1/\epsilon}\cdot k$ .

We later show how Theorem 3 follows by applying the construction of Lemma 4 for every $d\in[1,k^{1-\epsilon}]$ . We now turn to describe our three-stage procedure for computing the spanners of Lemma 4.

Algorithm $\mathsf{SpannerLongDist}$ .

The algorithm works in three stages. In the first stage it calls Procedure $\mathsf{TruncatedBS}$ for $\left\lceil k^{\epsilon}\right\rceil$ clustering steps. This results in a clustering $\mathcal{C}_{0}$ of $O(n^{1-\frac{\lceil k^{\epsilon}\rceil}{k}})$ clusters in expectation, each with radius at most $\left\lceil k^{\epsilon}\right\rceil$ .

In the second stage, the number of clusters is dramatically reduced to $O(n^{1-\frac{1}{k^{\epsilon}}})$ while keeping the radius of each cluster to $O_{\epsilon}(k^{1-\epsilon})$ . In the last stage, a cluster graph is computed on the collection of $O(n^{1-\frac{1}{\lceil k^{\epsilon}\rceil}})$ clusters, and a $(2\cdot\lceil k^{\epsilon}\rceil-3)$ spanner $\widehat{H}$ is computed on that graph. Each edge in $\widehat{H}$ will be translated into a $G$ -path that is added to the final spanner.

Preliminary Stage: Truncated Baswana-Sen Algorithm.

The algorithm starts by applying the first $\left\lceil k^{\epsilon}\right\rceil$ steps of Alg. $\mathsf{TruncatedBS}$ . This results in a clustering $\mathcal{C}_{0}$ and a subgraph $H_{0}$ . By the properties of the Baswana-Sen algorithm, $H_{0}$ has $O(k^{\epsilon}\cdot n^{1+1/k})$ edges in expectation and $\mathcal{C}_{0}$ consists of $n^{1-\frac{\left\lceil k^{\epsilon}\right\rceil}{k}}$ clusters in expectation with radius at most $\left\lceil k^{\epsilon}\right\rceil$ . By Fact 1(4), we have:

Claim 1.

For every unclustered vertex $u\in V$ and every vertex $v\in N_{G}(u)$ , $\mbox{\rm dist}_{H_{0}}(u,v)\leq 2\lceil k^{\epsilon}\rceil-1$ .

Middle Stage: Superclustering.

For clarity of presentation, throughout we assume that $\lceil k^{\epsilon}\rceil$ is divisible by $4$ ; up to a factor of $4$ in the final stretch, this assumption can be made without loss of generality. The middle step consists of $T:=\log_{\lceil k^{\epsilon}\rceil/4}(k^{1-2\epsilon})$ applications of Procedure $\mathsf{SuperClusterAugment}$ . We refer to each application of this procedure as a phase. For clarity of presentation, we also assume $T$ to be an integer, and in Sec. A we describe how to remove this assumption. In each phase $i$ , the input to Procedure $\mathsf{SuperClusterAugment}$ is a clustering $\mathcal{C}_{i-1}=\{C_{1},\ldots,C_{\ell}\}$ of radius $r_{i-1}=O((2k^{\epsilon})^{i})$ . The output of the phase is a clustering $\mathcal{C}_{i}$ of radius $r_{i}$ , and a subgraph $H_{i}$ that takes care of all vertices that became unclustered in that phase. This output clustering is obtained by applying $t:=\lceil k^{\epsilon}\rceil/4$ steps of supercluster growing. As we will see, the superclustering procedure is very similar to Proc. $\mathsf{TruncatedBS}$ , except that we now treat each cluster $C\in\mathcal{C}_{i-1}$ as a node.

Starting with the trivial superclustering $\mathcal{SC}_{i-1,0}=\{\{C_{j}\}~{}\mid~{}C_{j}\in\mathcal{C}_{i-1}\}$ whose radius is bounded by $r_{i-1,0}=r_{i-1}$ , in the $j^{th}$ step of phase $i$ for $j\geq 1$ , the algorithm is given a superclustering $\mathcal{SC}_{i-1,j-1}$ . The radius of these superclusters will be bounded by $r_{i-1,j-1}$ . The algorithm defines a superclustering $\mathcal{SC}_{i-1,j}$ along with a subgraph $H_{i,j}$ as follows.

1. Each unclustered vertex $v\in V\setminus{V(\mathcal{SC}_{i-1,j-1})}$ satisfying that

[TABLE]

adds to $H_{i,j}$ the shortest paths to the closest center in $\mathcal{SC}_{i-1,j-1}$ , where

[TABLE]

2. Let $\mathcal{SC}^{\prime}\subseteq\mathcal{SC}_{i-1,j-1}$ be the collection of superclusters obtained by sampling each supercluster $SC_{\ell}\in\mathcal{SC}_{i-1,j-1}$ independently with probability of $\frac{n_{0}}{n}$ , where $n_{0}:=|\mathcal{C}_{i-1}|$ .

3. Set $\delta_{i-1,j}=r_{i-1,0}+r_{i-1,j-1}+2\cdot\alpha_{i-1,j}$ . Each cluster $C\in SC_{\ell}$ at center-distance at most $\delta_{i-1,j}$ from $\mathcal{SC}^{\prime}$ joins the supercluster of its closest center in $\mathcal{SC}^{\prime}$ , by adding the shortest path between the centers to $H_{i,j}$ .

4. The superclustering $\mathcal{SC}_{i-1,j}$ consists of all sampled superclusters in $\mathcal{SC}^{\prime}$ augmented by their nearby clusters (i.e., at center-distance at most $\delta_{i-1,j}$ ). The center of each sampled supercluster in $\mathcal{SC}^{\prime}$ maintains its role in the augmented supercluster. The radius of this supercluster will be shown to be bounded by $r_{i-1,j}=r_{i-1,j-1}+2r_{i-1,0}+2\alpha_{i-1,j}$ .

5. Each cluster $C\in SC_{\ell}$ at center-distance larger than $\delta_{i-1,j}$ from $\mathcal{SC}^{\prime}$ , adds to $H_{i,j}$ the shortest path from its center to the center of any supercluster in $\mathcal{SC}_{i-1,j-1}$ at center-distance at most $\delta_{i-1,j}$ .

A vertex $u$ is said to be $(i-1,j)$ -unclustered if $u$ belongs to the superclusters of $\mathcal{SC}_{i-1,j-1}$ but does not belong to the superclusters of $\mathcal{SC}_{i-1,j}$ .

The parameter $\alpha_{i-1,j}$ is set in a way that guarantees that for every $(i-1,j)$ -unclustered vertex $v$ and every $u\in\mathsf{\textbf{B}}_{G}(v,\alpha_{i-1,j})$ , the subgraph $H_{i,j}$ contains a $u$ - $v$ path of length at most $\lceil k^{\epsilon}\rceil\cdot\alpha_{i-1,j}$ (hence providing a stretch of $\lceil k^{\epsilon}\rceil$ for every $v\in\partial\mathsf{\textbf{B}}_{G}(u,\alpha_{i-1,j})$ ).

Let $\mathcal{SC}_{i-1,t}$ be the output superclustering after the last step (step $t$ ) of the $i^{th}$ phase. Then the output clustering of the phase is given by $\mathcal{C}_{i}=\{\{V(SC_{j})\}~{}\mid~{}SC_{j}\in\mathcal{SC}_{i-1,t}\}$ . That is, all the clusters in a given supercluster in $\mathcal{SC}_{i-1,t}$ form a single merged cluster in $\mathcal{C}_{i}$ . Finally, let $H_{i}=\bigcup_{j}H_{i,j}$ . This completes the description of the $i^{th}$ phase. After $T$ phases, the output clustering $\mathcal{C}_{T}$ is shown to contain at most $n^{1-1/k^{\epsilon}}$ clusters in expectation. Let $H=\bigcup_{i=0}^{T}H_{i}$ be the current spanner.

Finalizing Stage: Spanner on the Cluster Graph.

Given the collection of $O(n^{1-1/k^{\epsilon}})$ clusters in $\mathcal{C}_{T}$ , the algorithm first adds a shortest path from each unclustered vertex $v\notin V(\mathcal{C}_{T})$ to its closest cluster in $\mathcal{C}_{T}$ up to center-distance $r_{T}+d$ , if such exists. Next, a cluster graph $\widehat{G}=(\mathcal{C}_{T},\mathcal{E})$ is defined by connecting two clusters $C,C^{\prime}\in\mathcal{C}_{T}$ if their center-distance in $G$ is at most $2r_{T}+2d$ . That is, $\mathcal{E}=\{(C,C^{\prime})~{}\mid~{}C,C^{\prime}\in\mathcal{C}_{T}\mbox{~{}and~{}}\mbox{\rm c-dist}_{G}(C,C^{\prime})\leq 2r_{T}+2d\}$ . Let $\widehat{H}$ be a $k^{\prime}$ -spanner of $\widehat{G}$ for $k^{\prime}=2\cdot\lceil k^{\epsilon}\rceil-3$ . For edge $(C,C^{\prime})\in\widehat{H}$ , the shortest path between the centers of $C$ and $C^{\prime}$ is added to the spanner $H$ . This completes the description of the algorithm.

Stretch Analysis.

For the rest of the analysis, let $r_{0}:=rad(\mathcal{C}_{0})$ , thus $r_{0}=\lceil k^{\epsilon}\rceil$ , and recall that $t=\lceil k^{\epsilon}\rceil/4$ and $T=\log_{\lceil k^{\epsilon}\rceil/4}(k^{1-2\epsilon})$ . For $1\leq i\leq T$ , let $r_{i,0}=rad(\mathcal{C}_{i})$ , and let $r_{T}:=rad(\mathcal{C}_{T})$ be the radius of the clusters in the last clustering $\mathcal{C}_{T}$ at the end of the middle stage. For the sake of the stretch and size analysis, we will need the following two claims, bounding $\alpha_{i,j}$ and $r_{i,0}$ , respectively. Both claims follow by simple inductive arguments.

Claim 2.

For every $i\in\{0,\ldots,T-1\},j\in\left[\lceil k^{\epsilon}\rceil/4\right]$ , $\alpha_{i,j}\leq(r_{i,0}+\frac{\lceil k^{\epsilon}\rceil-3}{4})\cdot\left((1+\frac{4}{\lceil k^{\epsilon}\rceil-3})^{j}-1\right)$ .

Proof.

For the definition of $\alpha_{i,j}$ see Eq. (8). We first show by induction on $j$ that

[TABLE]

The base case, $j=1$ is trivial since $\alpha_{i,1}=\left\lceil\frac{4\cdot r_{i,0}}{\lceil k^{\epsilon}\rceil-3}\right\rceil$ . Assuming that the claim holds for $j-1$ , letting $\chi=\sum_{p=0}^{j-2}\left(1+\frac{4}{\lceil k^{\epsilon}\rceil-3}\right)^{p}$ , we have:

[TABLE]

Therefore, we have:

[TABLE]

∎

We next turn to bound the radii of the superclusters in each phase $i$ of the algorithm. Note that in the $(i+1,j)$ step, since we add to each sampled supercluster, all clusters at center-distance at most $r_{i,0}+r_{i,j-1}+2\alpha_{i,j}$ , the radius $r_{i,j}$ of the new supercluster is increased by an additive term of at most $2r_{i,0}+2\alpha_{i,j}$ .

Claim 3.

If $k\geq 16^{1/\epsilon}$ then for each $0\leq i\leq T$ it holds that $r_{i,0}\leq(2\cdot\lceil k^{\epsilon}\rceil)^{i}\cdot r_{0}$ . In particular, for the final radius of the clustering $\mathcal{C}_{T}$ it holds that $r_{T}\leq 1/30\cdot 64^{\frac{1-\epsilon}{\epsilon}}\cdot k^{1-\epsilon}$ .

Proof.

We prove the claim by induction on $i$ . The base case, $i=0$ , is trivial. Assuming the claim is true for $i$ we show the correctness for $i+1$ . Phase $i+1$ begins with clusters in $\mathcal{C}_{i}$ with radii $r_{i,0}$ , and finishes after $t=\lceil k^{\epsilon}\rceil/4$ steps of Procedure $\mathsf{SuperClusterAugment}$ with clusters of radius at most $r_{i+1,0}$ . We therefore bound $r_{i+1,0}$ . At step $j$ of $\mathsf{SuperClusterAugment}$ we start with the superclustering $\mathcal{SC}_{i,j-1}$ of radius $r_{i,j-1}$ , and we add to the superclusters, clusters of radius $r_{i,0}$ at center-distance at most $r_{i,0}+r_{i,j-1}+2\cdot\alpha_{i,j}$ . Thus at the $j$ th step we increase the radii of the superclusters by an additive factor of at most $2\cdot r_{i,0}+2\cdot\alpha_{i,j}$ . Combining this with the bound on $\alpha_{i,j}$ of Claim 2:

[TABLE]

Thus by plugging $t=\lceil k^{\epsilon}\rceil/4$ , we get:

[TABLE]

The third inequality holds when $k\geq 11^{1/\epsilon}$ , the fourth inequality follows as $r_{i,0}\geq\lceil k^{\epsilon}\rceil$ , and the fifth inequality follows as $k\geq 16^{1/\epsilon}$ . Finally, by plugging $i=T$ we get that:

[TABLE]

where the fourth inequality holds for $k\geq 16^{1/\epsilon}$ . ∎

Definition 1 (Clustered and Unclustered Vertices).

A vertex $v\in V$ is $0$ -unclustered if $v\notin V(\mathcal{C}_{0})$ . A vertex $v$ is called $(i,j)$ -unclustered if $v\in V(\mathcal{SC}_{i,j-1})\setminus{V(\mathcal{SC}_{i,j})}$ . Finally, a vertex $v$ is called clustered if $v\in V(\mathcal{C}_{T})$ , i.e., it belongs to the last level of clustering.

In the following claims we show that the spanner provides a low-stretch path from $u$ to any vertex $v$ at some fixed distance from $u$ in $G$ . The case of $0$ -unclustered vertices follows by Claim 1, so it remains to consider $(i,j)$ -unclustered vertices for $i\geq 1$ .

Claim 4.

For $0\leq i\leq T-1,1\leq j\leq t$ , for each $(i,j)$ -unclustered vertex $u\in V$ , $\mbox{\rm dist}_{H}(u,v)\leq\lceil k^{\epsilon}\rceil\cdot\alpha_{i,j}$ for any $v\in\mathsf{\textbf{B}}_{G}(u,\alpha_{i,j})$ .

Proof.

Let $v\in\mathsf{\textbf{B}}_{G}(u,\alpha_{i,j})$ . In the beginning of the $(i+1,j)$ step we add to $H$ the shortest path from any vertex at center-distance at most $r_{i,j-1}+\alpha_{i,j}$ from $\mathcal{SC}_{i,j-1}$ to its closest center in the superclustering. In particular, since $u\in V(\mathcal{SC}_{i,j-1})$ and $\mbox{\rm c-dist}_{G}(v,\mathcal{SC}_{i,j-1})\leq\mbox{\rm dist}_{G}(v,u)+r_{i,j-1}\leq\alpha_{i,j}+r_{i,j-1}$ , we add the shortest path from $v$ to the center of some supercluster $SC_{v}\in\mathcal{SC}_{i,j-1}$ . Since $u$ gets unclustered in this step, we add to $H$ the shortest path from its cluster, $C_{u}$ , to any supercluster which is in $\mathcal{SC}_{i,j-1}$ and is at center-distance at most $r_{i,0}+r_{i,j-1}+2\alpha_{i,j}$ . Since $\mbox{\rm c-dist}_{G}(C_{u},SC_{v})\leq r_{i,0}+\mbox{\rm dist}_{G}(u,v)+\mbox{\rm c-dist}_{G}(v,SC_{v})\leq r_{i,0}+2\alpha_{i,j}+r_{i,j-1}$ , we add the shortest path from the center of $C_{u}$ to the center of $SC_{v}$ . Consequently, there is a path from $u$ to $v$ that goes through the centers of $C_{u},SC_{v}$ of length at most:

[TABLE]

where in the second inequality, the bound on $r_{i,j-1}$ follows by Eq. (3). We next show by induction on $j$ that

[TABLE]

The base case of $j=1$ holds vacuously. Assuming the correctness up to $j-1$ , by Eq. (8):

[TABLE]

Plugging Eq. (5) in Eq. (4) we get that: $\mbox{\rm dist}_{H}(u,v)\leq 3\alpha_{i,j}+(\lceil k^{\epsilon}\rceil-3)\cdot\alpha_{i,j}=\lceil k^{\epsilon}\rceil\cdot\alpha_{i,j}$ . ∎

Claim 5.

Let $u,v\in V$ be vertices at distance $d$ in $G$ , and let $P$ be a shortest path between them in $G$ . If there is some clustered vertex $w\in P$ , then $\mbox{\rm dist}_{H}(u,v)\leq 4k^{\epsilon}\cdot d+1/6\cdot 64^{\frac{1-\epsilon}{\epsilon}}\cdot k$ .

Proof.

Let $C_{w}$ be the cluster to which $w$ belongs. In the beginning of the third stage of the algorithm we add shortest paths from unclustered vertices at center-distance at most $r_{T}+d$ to their closest cluster center in $\mathcal{C}_{T}$ . Thus since $\mbox{\rm c-dist}_{G}(u,\mathcal{C}_{T})\leq r_{T}+\mbox{\rm dist}_{G}(u,w)\leq r_{T}+d$ and $\mbox{\rm c-dist}_{G}(v,\mathcal{C}_{T})\leq r_{T}+\mbox{\rm dist}_{G}(v,w)\leq r_{T}+d$ , it holds that both $u,v$ add their shortest paths to the centers of some clusters $C_{u},C_{v}\in\mathcal{C}_{T}$ to the spanner. Observe that,

[TABLE]

Therefore, it holds that $\mbox{\rm dist}_{\widehat{G}}(C_{u},C_{v})\leq 1$ , thus $\mbox{\rm dist}_{\widehat{H}}(C_{u},C_{v})\leq 2\cdot\lceil k^{\epsilon}\rceil-3$ . Since each edge $(C,C^{\prime})\in E(\widehat{H})$ translates into a path in $H$ of length at most $2r_{T}+2d$ , we have that:

[TABLE]

∎

We next complete the proof of Lemma 4.

Proof of Lemma 4.

Stretch. Let $u,v\in V$ be vertices at distance $d$ in $G$ , and let $P$ be some shortest path between them in $G$ . First observe that if there is some clustered vertex $w\in P$ the claim follows from Claim 5, so we assume there is no such vertex. Partition the path $P$ into $\ell^{\prime}\leq d$ consecutive segments in the following way: denote $v_{0}:=u$ and inductively define $v_{l+1}$ to be the vertex on $P$ at distance $\Delta_{l}$ on the segment $[v_{l},v]$ , where:

[TABLE]

Let $\ell^{\prime}$ be the index of the last segment, thus $v_{\ell^{\prime}}=v$ . For any $l\in\{0,\ldots,\ell^{\prime}-1\}$ , if $v_{l}$ is $0$ -unclustered then by Claim 1 it holds that $\mbox{\rm dist}_{H}(v_{l},v_{l+1})\leq 2\cdot\lceil k^{\epsilon}\rceil-1$ . If $v_{l}$ is $(i,j)$ -unclustered then since $\mbox{\rm dist}_{G}(v_{l},v_{l+1})\leq\alpha_{i,j}$ , by Claim 4 it holds that $\mbox{\rm dist}_{H}(v_{l},v_{l+1})\leq\lceil k^{\epsilon}\rceil\cdot\alpha_{i,j}$ . Thus except for at most the last segment $P[v_{\ell^{\prime}-1},v]$ , the spanner provides a multiplicative stretch of $\lceil k^{\epsilon}\rceil$ to each of the other segments. For the last segment $P[v_{\ell^{\prime}-1},v]$ , if $v_{\ell^{\prime}-1}$ is $0$ -unclustered then we are done by Claim 1. Otherwise, there exist $i,j\geq 1$ such that $v_{\ell^{\prime}-1}$ is $(i,j)$ -unclustered, and by Eq. (3) and Eq. (3):

[TABLE]

Therefore by summing over all these at most $\ell^{\prime}$ segments, and plugging the bound on $r_{T}$ from Claim 3 we get that:

[TABLE]

Size Analysis.

By Fact 1(3), $|E(H_{0})|=O(k^{\epsilon}\cdot n^{1+1/k})$ in expectation. Consider now the second stage. For any $1\leq i\leq T,1\leq j\leq t$ , step $(i,j)$ starts by adding shortest paths from unclustered vertices to their closest centers in the superclustering. Since each vertex adds its shortest path to its closest center, and since we break ties in a consistent manner, this step adds at most $O(n)$ edges. The number of shortest paths added between unclustered clusters in each step can be bounded as follows.

Claim 6.

Fix a phase $i$ . For any $1\leq j\leq t$ , the algorithm adds in step $(i,j)$ a collection of $O(n)$ shortest paths in expectation.

Each shortest path is of length at most $\delta_{i-1,j}=r_{i-1,0}+r_{i-1,j-1}+2\alpha_{i-1,j}$ . Therefore, by combining with Claim 6, $|E(H_{i})|=O(k^{\epsilon}\cdot r_{i,0}\cdot n)$ in expectation. By summing over all $T$ phases, we get a total of $O(\sum_{i=1}^{T}k^{\epsilon}\cdot r_{i,0}\cdot n)=O(64^{1/\epsilon}\cdot k^{1+\epsilon}\cdot n)$ edges.

For the size analysis of last stage we will need the following claim, which follows by a simple induction.

Claim 7.

For any $0\leq i\leq T,0\leq j\leq t$ , the expected number of superclusters in $\mathcal{SC}_{i,j}$ is $n^{1-\frac{\left\lceil k^{\epsilon}\right\rceil\cdot(t+1)^{i}\cdot(j+1)}{k}}$ , thus $|\mathcal{C}_{i}|=|\mathcal{SC}_{i,0}|=n^{1-\frac{\lceil k^{\epsilon}\rceil\cdot(t+1)^{i}}{k}}$ , in expectation.

From the claims above, we get that $|\mathcal{C}_{T}|=O(n^{1-(\left\lceil k^{\epsilon}\right\rceil/4)^{T}\cdot\frac{\left\lceil k^{\epsilon}\right\rceil}{k}})=O(n^{1-\frac{1}{k^{\epsilon}}})$ , in expectation. Consequently, $|E(\widehat{H})|=O(|\mathcal{C}_{T}|^{1+\frac{1}{\lceil k^{\epsilon}\rceil-1}})=O(n)$ (in expectation). Since each edge in $\widehat{H}$ translates into a path of length at most $2r_{T}+2d$ , this step contributes $O((r_{T}+d)\cdot n)$ edges to the spanner. Overall, $|E(H)|=O(k^{\epsilon}\cdot n^{1+1/k}+(64^{1/\epsilon}k^{1+\epsilon}+d)\cdot n)$ , in expectation. ∎ Theorem 3 now follows by noting that:

Observation 1.

Let $H\subseteq G$ be a subgraph satisfying that $\mbox{\rm dist}_{H}(u,v)\leq\alpha\cdot d+\beta$ for every $u,v\in V(G)$ at distance $d\leq\left\lceil\frac{\beta}{\alpha}\right\rceil$ in $G$ , then $H$ is a $(2\cdot\alpha,3\cdot\beta)$ -spanner of $G$ .
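One way to see why the observation holds is the following segment-splitting sketch. It is offered as a plausible argument under the stated hypothesis (with $\alpha,\beta>0$ ), not as the authors' proof. Write $L=\lceil\beta/\alpha\rceil$ . For a pair at distance $d\leq L$ the hypothesis applies directly. For $d>L$ , split a shortest $u$ - $v$ path into $m=\lceil d/L\rceil$ consecutive segments of lengths $d_{1},\ldots,d_{m}\leq L$ with $\sum_{i}d_{i}=d$ ; the endpoints of each segment are at distance exactly $d_{i}$ in $G$ .

```latex
\mathrm{dist}_{H}(u,v)
  \leq \sum_{i=1}^{m}\left(\alpha\cdot d_{i}+\beta\right)
  = \alpha\cdot d+\beta\cdot\left\lceil\frac{d}{L}\right\rceil
  \leq \alpha\cdot d+\beta\cdot\left(\frac{d}{L}+1\right)
  \leq \alpha\cdot d+\beta\cdot\left(\frac{\alpha\cdot d}{\beta}+1\right)
  = 2\alpha\cdot d+\beta
  \leq 2\alpha\cdot d+3\beta .
```

The third inequality uses $L\geq\beta/\alpha$ , so that $d/L\leq\alpha d/\beta$ .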

4 New $(3+\epsilon,\beta)$ Spanner

In this section, we show an optimized variant of our $(\alpha,\beta)$ spanner for the case where $\alpha=3+\epsilon$ . For the purpose of efficient implementation, we settle for a slightly worse value of $\beta$ . In Sec. B.1, we show an improved construction that achieves the bounds of Lemma 1. Our main result is:

Lemma 5.

For any $n$ -vertex unweighted graph $G=(V,E)$ , integer $k$ and $\epsilon>0$ , one can compute a $(3+\epsilon,\beta)$ spanner $H\subseteq G$ with $\beta=O((5+16/\epsilon)\cdot k^{\log(5+16/\epsilon)})$ , and expected size $|E(H)|=O(n^{1+1/k}+k^{\log(5+16/\epsilon)}\cdot n/\epsilon)$ .

We note that unlike the constructions in earlier sections, this construction works for all distances $d$ , and there is no need to consider each distance class separately.

Algorithm Description.

For simplicity, we assume throughout that $4/\epsilon$ is an integer. The algorithm contains $T=\lceil\log k+1\rceil$ clustering phases. Starting with the trivial clustering $\mathcal{C}_{0}=\{\{v\}~{}\mid~{}v\in V\}$ of radius $0$ , in each phase $i\geq 1$ , we are given a clustering $\mathcal{C}_{i-1}$ of expected size $n_{i-1}=n^{1-\frac{2^{i-2}}{k}}$ (except for $i=1$ where $n_{0}=n$ ) with radius at most $r_{i-1}=2\cdot(5+16/\epsilon)^{i-2}$ . The output of the $i^{th}$ phase is a clustering $\mathcal{C}_{i}$ of expected size $n_{i}=n^{1-\frac{2^{i-1}}{k}}$ and radius $r_{i}=2\cdot(5+16/\epsilon)^{i-1}$ , and a subgraph $H_{i}$ that takes care of the vertices in $V(\mathcal{C}_{i-1})\setminus V(\mathcal{C}_{i})$ that became unclustered in this phase.

We now zoom into the $i^{th}$ phase and explain the construction of the clustering $\mathcal{C}_{i}$ and the subgraph $H_{i}$ . The phase is governed by two key parameters: the sampling probability $p_{i}$ of each cluster $C\in\mathcal{C}_{i-1}$ to join the clustering $\mathcal{C}_{i}$ , and an augmentation radius $\alpha_{i}$ . Let $\alpha_{1}=1/2,p_{1}=n^{-1/k}$ and for every $i\geq 2$ , define

[TABLE]

The description of the $i^{th}$ phase for $i\in\{1,\ldots,T\}$ is as follows:

1. Each unclustered vertex $v\in V\setminus{V(\mathcal{C}_{i-1})}$ with $\mbox{\rm c-dist}_{G}(v,\mathcal{C}_{i-1})\leq r_{i-1}+\alpha_{i}$ , adds to $H_{i}$ the shortest path to its closest center in $\mathcal{C}_{i-1}$ .

2. Let $\mathcal{C}^{\prime}\subseteq\mathcal{C}_{i-1}$ be the collection of clusters obtained by sampling each cluster $C_{\ell}\in\mathcal{C}_{i-1}$ independently with probability of $p_{i}$ .

3. Each cluster $C\in\mathcal{C}_{i-1}$ such that $\mbox{\rm c-dist}_{G}(C,\mathcal{C}^{\prime})\leq 4\cdot r_{i-1}+4\cdot\alpha_{i}$ , joins the sampled cluster in $\mathcal{C}^{\prime}$ with minimal distance between their centers, and adds the shortest path between their centers into $H_{i}$ .

4. The clustering $\mathcal{C}_{i}$ consists of all the sampled clusters in $\mathcal{C}^{\prime}$ augmented by their nearby clusters in $\mathcal{C}_{i-1}$ (i.e., at distance at most $4\cdot r_{i-1}+4\cdot\alpha_{i}$ between their centers). That is, each cluster in $\mathcal{C}_{i}$ is made of a star of clusters, where the head of the star is a sampled cluster $C\in\mathcal{C}^{\prime}$ whose center $r(C)$ is connected to the centers of a subset of clusters in $\mathcal{C}_{i-1}$ .

5. Each cluster $C\in\mathcal{C}_{i-1}$ such that $\mbox{\rm c-dist}_{G}(C,\mathcal{C}^{\prime})>4\cdot r_{i-1}+4\cdot\alpha_{i}$ , adds to $H_{i}$ the shortest path between its center to any center of $C^{\prime}\in\mathcal{C}_{i-1}$ with $\mbox{\rm c-dist}_{G}(C,C^{\prime})\leq 2\cdot r_{i-1}+2\cdot\alpha_{i}$ .

This completes the description of the $i^{th}$ phase. Let $H=\bigcup_{i=1}^{T}H_{i}$ and let $\mathcal{C}_{T}$ be the output clustering of the last phase $T$ . In the analysis section we show that in expectation $\mathcal{C}_{T}$ consists of at most a single cluster $C$ . Finally, the algorithm adds to the output spanner $H$ a BFS tree rooted at the center of each cluster of $\mathcal{C}_{T}$ .
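The sampling and merging steps of a single phase (steps 2 to 4) can be sketched as below. This is a simplified illustration under several assumptions: clusters are represented only by their centers, the values of $r_{i-1}$ , $\alpha_{i}$ and $p_{i}$ are passed in as plain parameters, and the path additions to $H_{i}$ (steps 1, 3 and 5) are omitted. The names `cluster_phase` and `bfs_dist` are hypothetical.

```python
import random
from collections import deque

def bfs_dist(adj, src):
    """Unweighted single-source distances in G, given as an adjacency dict."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def cluster_phase(adj, centers, r_prev, alpha, p, rng):
    """Sample each center with probability p, then merge centers within the join radius."""
    sampled = [c for c in centers if rng.random() < p]
    join_radius = 4 * r_prev + 4 * alpha
    dist_from = {s: bfs_dist(adj, s) for s in sampled}
    assignment = {}  # center of an old cluster -> center of the sampled cluster it joins
    for c in centers:
        best = min(sampled,
                   key=lambda s: dist_from[s].get(c, float("inf")),
                   default=None)
        if best is not None and dist_from[best].get(c, float("inf")) <= join_radius:
            assignment[c] = best
    unclustered = [c for c in centers if c not in assignment]
    return sampled, assignment, unclustered

# Hypothetical first phase on a 20-vertex path: singletons, r_0 = 0, alpha_1 = 1/2.
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < 20] for i in range(20)}
sampled, assignment, unclustered = cluster_phase(
    adj, list(range(20)), r_prev=0, alpha=0.5, p=0.3, rng=random.Random(0))
```

A sampled center is always assigned to itself, since it is at distance 0 from itself. The centers left in `unclustered` are exactly those that step 5 would connect to their nearby clusters.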

Stretch Analysis.

We start by bounding the radius $r_{i}$ of the clustering $\mathcal{C}_{i}$ for every $i\in\{1,\ldots,T\}$ . We first make a simple observation:

Observation 2.

For every $i\in\{1,\ldots,T\}$ , $r_{i}\leq 2\cdot(5+16/\epsilon)^{i-1}$ . In particular, the radius of each cluster in the final clustering $\mathcal{C}_{T}$ is $r_{T}\leq(10+32/\epsilon)\cdot k^{\log(5+16/\epsilon)}$ .

Proof.

The claim is shown by induction on $i$ . For the base case, where $i=1$ , the $0$ -level clusters are simply singletons, and each node joins its closest sampled vertex at distance at most $4\cdot r_{0}+4\cdot\alpha_{1}=2$ . Therefore the clusters in $\mathcal{C}_{1}$ have radius at most $2$ . Assume that the claim holds up to $i-1\geq 1$ , and consider the $i^{th}$ clustering $\mathcal{C}_{i}$ which is defined in the $i^{th}$ phase based on the clustering $\mathcal{C}_{i-1}$ . Each cluster in $\mathcal{C}_{i}$ is formed by a star: the head of the star is the sampled cluster $C$ that is connected to all clusters $C^{\prime}\in\mathcal{C}_{i-1}$ with center-distance at most $4\cdot r_{i-1}+4\alpha_{i}$ from $C$ . The radius of this cluster in $\mathcal{C}_{i}$ is bounded by

[TABLE]

where the second inequality follows by plugging the bound on $r_{i-1}$ obtained from the induction assumption and the bound on $\alpha_{i}$ from Eq. (11). ∎

Observation 3.

For $1\leq i\leq T$ , $|\mathcal{C}_{i}|=n^{1-2^{i-1}/k}$ in expectation. Therefore, after $T$ phases, there is at most one cluster in $\mathcal{C}_{T}$ in expectation.

Proof.

By induction on $i$ . For the base case $i=1$ , since $p_{1}=n^{-1/k}$ , the number of sampled clusters is $n^{1-1/k}$ in expectation. Assume that the claim holds up to $i-1\geq 1$ , and consider the $i^{th}$ clustering. In the $i$ th phase, each cluster in $\mathcal{C}_{i-1}$ is sampled with probability of $p_{i}=|\mathcal{C}_{i-1}|/n$ , therefore, the number of sampled clusters in expectation is $|\mathcal{C}_{i-1}|^{2}/n$ . By induction assumption, $|\mathcal{C}_{i-1}|=n^{1-2^{i-2}/k}$ , and thus $|\mathcal{C}_{i}|=n^{1-2^{i-1}/k}$ . By plugging $T=\lceil\log k+1\rceil$ , we get that in expectation $|\mathcal{C}_{T}|\leq 1$ . ∎
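The recursion in this proof is easy to check numerically. The sketch below iterates the expected cluster counts using $p_{1}=n^{-1/k}$ and $p_{i}=|\mathcal{C}_{i-1}|/n$ for $i\geq 2$ ; it assumes that $\log$ denotes the base-2 logarithm, and the function name is illustrative.

```python
import math

def expected_cluster_counts(n, k):
    """Return [n_0, n_1, ..., n_T], the expected cluster counts over the T phases."""
    T = math.ceil(math.log2(k) + 1)      # assumption: log is base 2
    counts = [float(n)]
    counts.append(n * n ** (-1.0 / k))   # phase 1: sample with p_1 = n^{-1/k}
    for _ in range(2, T + 1):
        p_i = counts[-1] / n             # p_i = |C_{i-1}| / n
        counts.append(counts[-1] * p_i)  # expected size |C_{i-1}|^2 / n
    return counts

counts = expected_cluster_counts(10**6, 8)
```

For $n=10^{6}$ and $k=8$ this gives $T=4$ phases, and the exponents of $n$ in successive counts are $7/8$ , $3/4$ , $1/2$ and $0$ , matching $1-2^{i-1}/k$ .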

Our stretch argument is based on the following definition of clustered vertices.

Definition 2.

For every $1\leq i\leq T$ , a vertex $v\in V(\mathcal{C}_{i-1})\setminus{V(\mathcal{C}_{i})}$ is called $i$ -unclustered. In addition, every $v\in V(\mathcal{C}_{T})$ is called clustered (i.e., $v$ belongs to the final clustering).

For every $i$ -unclustered vertex we provide a stretch guarantee as a function of $i$ . Specifically, the earlier that the vertex $u$ stops being clustered, the smaller is the ball around $u$ for which the stretch guarantee is provided.

Claim 8 ( $1$ -unclustered).

For any $1$ -unclustered vertex $u\in V$ , it holds that $\mbox{\rm dist}_{H}(u,v)=1$ for all $v\in N_{G}(u)$ .

Proof.

In step $(5)$ of the first phase, every $1$ -unclustered vertex $u$ adds to the spanner its edges to every vertex at distance $2\alpha_{1}=1$ , and the claim follows. ∎

Claim 9 ( $i$ -unclustered).

For every $i\in\{2,\ldots,T\}$ , any $i$ -unclustered vertex $u$ , and every $v\in\mathsf{\textbf{B}}_{G}(u,\alpha_{i})$ , it holds that $\mbox{\rm dist}_{H}(u,v)\leq 4\cdot r_{i-1}+3\cdot\mbox{\rm dist}_{G}(u,v)$ .

Proof.

Consider an $i$ -unclustered vertex $u$ and $v\in\mathsf{\textbf{B}}_{G}(u,\alpha_{i})$ . By definition, $u\in V(\mathcal{C}_{i-1})$ , thus $\mbox{\rm c-dist}_{G}(v,\mathcal{C}_{i-1})\leq r_{i-1}+\mbox{\rm dist}_{G}(u,v)$ . Let $C_{v}$ be the closest cluster to $v$ with respect to the $\mbox{\rm c-dist}()$ measure, and let $C_{u}$ be the cluster of $u$ in $\mathcal{C}_{i-1}$ . Then, the algorithm adds to $H_{i}$ a shortest path from $v$ to the center of $C_{v}$ in step (1). We have:

[TABLE]

Since $u$ becomes unclustered in phase $i$ , in step $(5)$ the algorithm adds to $H_{i}$ the shortest path from the center of $C_{u}$ to the center of $C_{v}$ . We therefore have:

[TABLE]

∎

In particular, for $v\in\partial\mathsf{\textbf{B}}_{G}(u,\alpha_{i})$ , by Eq. (11) it holds that $\mbox{\rm dist}_{H}(u,v)\leq(3+\epsilon)\cdot\mbox{\rm dist}_{G}(u,v)$ . Since the algorithm adds to the spanner, a BFS tree w.r.t the centers of the clusters in $\mathcal{C}_{T}$ , we have that:

Claim 10.

Let $u,v\in V$ and let $P_{uv}$ be the shortest path between them in $G$ . If there is a clustered vertex $w\in P_{uv}$ then

[TABLE]

Proof.

Since $w$ is clustered, its distance from the cluster center $z$ is at most $r_{T}$ . As the algorithm adds the BFS tree of $z$ to the spanner, we have:

[TABLE]

the claim follows by plugging the bound on $r_{T}$ from Obs. 2. ∎

We are now ready to prove Lemma 5.

Proof of Lemma 5.

Let $u,v\in V$ be vertices at some distance $d$ in $G$ , and let $P:=\{u:=v_{0},\cdots,v_{d}:=v\}$ be the shortest path between them in $G$ . First observe that by Claim 10, if there is some clustered vertex in $P$ then we are done. Assume from now on that no vertex on the path is clustered. We define a sequence of vertices between $u$ and $v$ in an iterative manner: let $u_{0}:=u$ , and for every $j\in\{1,\ldots,d-1\}$ given that $u_{j}=v_{i}$ , define $u_{j+1}=v_{i+\Delta_{j}}$ where:

[TABLE]

Let $\ell\leq d$ be the minimal index such that $u_{\ell}=v$ , thus we have defined $\ell$ segments $P_{j}=[u_{j-1},u_{j}]$ for $j\in\{1,\ldots,\ell\}$ . By Claim 8, the first case in the definition of $\Delta_{j}$ causes no stretch. By Claim 9, the second case causes a multiplicative stretch of $3+\epsilon$ , unless it ends in $v$ . The latter case can happen only for the last segment, and in this case by Claim 9 we will get an extra additive stretch of $4\cdot r_{T-1}$ , which by Observation 2 is at most $8\cdot k^{\log(5+16/\epsilon)}$ . In other words, for every segment $P[u_{j-1},u_{j}]$ such that $u_{j}\neq v$ , we have that $\mbox{\rm dist}_{H}(u_{j-1},u_{j})\leq(3+\epsilon)\cdot|P[u_{j-1},u_{j}]|$ . For the last segment $P[u_{\ell-1},v]$ we have that $\mbox{\rm dist}_{H}(u_{\ell-1},v)\leq 3\cdot|P[u_{\ell-1},v]|+4\cdot r_{T-1}$ . Therefore,

[TABLE]

Size Analysis. Fix a phase $i$ . In step $(1)$ each vertex adds to $H_{i}$ a path of length at most $r_{i-1}+\alpha_{i}$ to its closest cluster center in $\mathcal{C}_{i-1}$ . This adds $O(n)$ edges in total by breaking shortest path ties in a consistent manner. In step $(3)$ we add to $H_{i}$ the paths that connect clusters in $\mathcal{C}_{i-1}$ to the sampled clusters at distance at most $4\cdot r_{i-1}+4\cdot\alpha_{i}$ . This also consumes at most $O(n)$ edges by breaking shortest path ties in a consistent manner. In step $(5)$ we add to $H_{i}$ paths from clusters that were not sampled and did not join other clusters, to nearby clusters at distance at most $2\cdot r_{i-1}+2\cdot\alpha_{i}$ . To bound the number of edges added to the spanner in this step, we will need the following:

Lemma 6.

The number of shortest paths added to $H_{i}$ in step $(5)$ is at most $1/p_{i}$ in expectation.

Proof.

Let $C\in\mathcal{C}_{i-1}$ and let $C_{1},\ldots,C_{\ell}$ be the clusters with center-distance at most $2\cdot r_{i-1}+2\cdot\alpha_{i-1}$ from $C$ . Note that the centers of each pair of these clusters are at distance at most $4\cdot r_{i-1}+4\cdot\alpha_{i-1}$ , thus if one of these clusters is sampled, then all the others will join some sampled cluster. That is, none will take part in the fifth step of the algorithm. Thus the number of shortest paths from a given cluster $C$ added in this step is $\ell\cdot(1-p_{i})^{\ell}\leq 1/p_{i}$ in expectation. ∎
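The inequality $\ell\cdot(1-p)^{\ell}\leq 1/p$ used at the end of the proof can be checked numerically. The following is a minimal sketch; the function names and the tested values of $p$ are illustrative assumptions and do not come from the paper.

```python
def unsampled_paths(ell: int, p: float) -> float:
    """The quantity ell * (1 - p)**ell that the proof bounds by 1/p."""
    return ell * (1.0 - p) ** ell


def worst_case(p: float, max_ell: int = 20_000) -> float:
    """Largest value of ell * (1 - p)**ell over ell = 1..max_ell."""
    return max(unsampled_paths(ell, p) for ell in range(1, max_ell + 1))


# Illustrative sampling probabilities (assumptions, not the paper's p_i).
for p in (0.5, 0.1, 0.01, 0.001):
    assert worst_case(p) <= 1.0 / p
```

The maximum is attained around $\ell\approx 1/p$ , where the value is roughly $1/(e\cdot p)$ , so the bound $1/p$ holds with room to spare.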

We now bound the total number of edges added in step $(5)$ of phase $i$ . In phase $i$ there are $|\mathcal{C}_{i-1}|$ clusters and each adds to the spanner a shortest path of length at most $2\cdot r_{i-1}+2\cdot\alpha_{i}$ . Thus by Lemma 6 this adds at most $(2\cdot r_{i-1}+2\cdot\alpha_{i})\cdot|\mathcal{C}_{i-1}|/p_{i}$ edges in expectation. By plugging the values of $p_{i}$ as defined in Eq. (11), it follows that for $i=1$ , we get a total of $n^{1+1/k}$ edges. For $i\geq 2$ , the total number of edges is bounded by $r_{i}\cdot n$ . By plugging the values of $r_{i-1}$ of Observation 2 we get that the expected number of edges in all the $H_{i}$ subgraphs is bounded by

[TABLE]

The last step adds a constant number of BFS trees, thus contributing $O(n)$ edges. Lemma 5 follows. ∎

5 A New Family of $(k^{\epsilon},k^{1-\epsilon})$ Hopsets

In this section we present a new construction of $(\alpha,\beta)$ hopsets with $O_{k}(n^{1+1/k})$ edges and $\alpha\cdot\beta=O(k)$ . The structure of the section is as follows. First, in Subsec. 5.1 we show the construction of $(O(k^{\epsilon}),O(k^{1-\epsilon}))$ hopsets in the simpler regime of $\epsilon\in[1/2,1)$ . Then, in Subsec. 5.2 we show the high level construction of $(O(k^{\epsilon}),O_{\epsilon}(k^{1-\epsilon}))$ hopsets for the complementary regime of $\epsilon\in(0,1/2)$ . Finally, in Subsec. 5.3 we show the construction of $(3+\epsilon,\beta)$ hopsets.

5.1 $(k^{\epsilon},k^{1-\epsilon})$ Hopsets for $\epsilon\in[1/2,1)$ .

This subsection is devoted to proving Theorem 4 for $\epsilon\in[1/2,1)$ . We show the following:

Theorem 5.

For any $n$ -vertex weighted graph $G=(V,E,w)$ , integer $k\geq 1$ and $\epsilon\in[1/2,1)$ such that $k\geq 10^{1/\epsilon}$ , one can compute an $(\alpha,\beta)$ hopset $H$ for $\alpha=18\cdot k^{\epsilon}$ and $\beta=8\cdot k^{1-\epsilon}+1$ with $|E(H)|=O(k^{\epsilon}\cdot n^{1+1/k}\cdot\log\Lambda)$ in expectation.
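To make the hopset guarantee concrete, the $\beta$ -hop-limited distance $\mbox{\rm dist}^{(\beta)}_{G\cup H}$ can be computed with $\beta$ rounds of Bellman-Ford relaxation. The sketch below is illustrative only: the function name, the edge-list representation and the toy graph are assumptions, not part of the construction.

```python
import math


def hop_limited_dist(n, edges, source, beta):
    """d[v] = minimum weight of a source-v path that uses at most beta edges.

    edges: list of (u, v, w) for an undirected weighted graph on 0..n-1.
    """
    dist = [math.inf] * n
    dist[source] = 0.0
    for _ in range(beta):
        nxt = dist[:]  # allow one more edge than in the previous round
        for u, v, w in edges:
            if dist[u] + w < nxt[v]:
                nxt[v] = dist[u] + w
            if dist[v] + w < nxt[u]:
                nxt[u] = dist[v] + w
        dist = nxt
    return dist


# Toy instance: unit-weight path 0-1-2-3-4 plus one hop (0, 4) whose weight
# is dist_G(0, 4) = 4, matching how hop weights are set in this section.
graph_edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0)]
hopset = [(0, 4, 4.0)]

without_hops = hop_limited_dist(5, graph_edges, 0, beta=2)
with_hops = hop_limited_dist(5, graph_edges + hopset, 0, beta=2)
```

With two hops and no hopset, vertices 3 and 4 are unreachable from vertex 0. After adding the hop, vertex 4 is reached exactly and vertex 3 is reached with stretch $5/3$ .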

For simplicity we focus on a fixed distance class by considering all vertex pairs at distance $[d/2,d]$ . The algorithm is then applied for each of the $\log\Lambda$ distance classes. Throughout, when adding an edge $(u,v)$ to the hopset, we set its weight to $\mbox{\rm dist}_{G}(u,v)$ . In addition, to distinguish between real $G$ -edges and $H$ -edges, we refer to the latter as hops. The algorithm has a similar structure to Alg. $\mathsf{SpannerShortDist}$ ; the key difference is that here we use the Thorup-Zwick hopsets to handle the sparse case, rather than applying the truncated Baswana-Sen algorithm. Specifically, the first step of the algorithm computes a $(2k-1,2)$ hopset $H_{TZ}$ by applying the algorithm of Thorup and Zwick. Let $A_{\left\lceil k^{\epsilon}\right\rceil}$ be the $\lceil k^{\epsilon}\rceil$ th level of centers, and define $\mathcal{C}$ as the clusters of weighted radius

$$r_{0}=d\cdot\frac{\lceil k^{\epsilon}\rceil+1}{k^{1-\epsilon}}$$

centered at the vertices of $A_{\left\lceil k^{\epsilon}\right\rceil}$ . Specifically, every vertex $u$ at distance at most $r_{0}$ from $A_{\left\lceil k^{\epsilon}\right\rceil}$ joins the cluster of its closest center in $A_{\left\lceil k^{\epsilon}\right\rceil}$ ; such a vertex is called clustered. In the hopset, we connect each clustered vertex $u$ to the center of its cluster.

In the second step, a cluster graph $\widehat{G}$ is defined on the clusters of $\mathcal{C}$ in the exact same manner as in Alg. $\mathsf{SpannerShortDist}$ . That is, any two clusters $C,C^{\prime}$ in $\mathcal{C}$ are neighbors in $\widehat{G}$ if their $G$ -center-distance is at most $2\cdot r_{0}+d$ . Note that as before, the cluster graph $\widehat{G}$ is unweighted. Letting $\widehat{H}$ be the $(2k^{1-\epsilon}-1)$ -spanner on $\widehat{G}$ , for every edge $(C,C^{\prime})\in\widehat{H}$ the algorithm adds to the hopset a hop $(r(C),r(C^{\prime}))$ between the centers $r(C),r(C^{\prime})$ of the clusters $C,C^{\prime}$ . This completes the description of the algorithm.
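The clustering and cluster-graph steps can be sketched on a toy instance. This is a simplified illustration under stated assumptions: the centers are passed in directly rather than taken from a Thorup-Zwick hierarchy, the spanner computation on $\widehat{G}$ is omitted, and all names are hypothetical.

```python
import heapq
import math


def dijkstra(adj, source):
    """Single-source shortest-path distances in a weighted graph."""
    dist = {v: math.inf for v in adj}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d_u, u = heapq.heappop(heap)
        if d_u > dist[u]:
            continue
        for v, w in adj[u]:
            if d_u + w < dist[v]:
                dist[v] = d_u + w
                heapq.heappush(heap, (dist[v], v))
    return dist


def cluster_and_build_graph(adj, centers, r0, d):
    """Assign vertices within distance r0 of a center to their closest center,
    then join two clusters when their center-distance is at most 2*r0 + d."""
    from_center = {z: dijkstra(adj, z) for z in centers}
    cluster_of = {}
    for v in adj:
        z = min(centers, key=lambda c: from_center[c][v])
        if from_center[z][v] <= r0:
            cluster_of[v] = z  # v is clustered; the hopset gets the hop (v, z)
    cluster_edges = {
        (a, b)
        for a in centers
        for b in centers
        if a < b and from_center[a][b] <= 2 * r0 + d
    }
    return cluster_of, cluster_edges


# Unit-weight path 0-1-...-6 with hypothetical centers 0, 3 and 6.
path = {v: [] for v in range(7)}
for v in range(6):
    path[v].append((v + 1, 1.0))
    path[v + 1].append((v, 1.0))

cluster_of, cluster_edges = cluster_and_build_graph(path, [0, 3, 6], r0=1.0, d=1.0)
```

Here every vertex is clustered, and $\widehat{G}$ contains the edges between centers 0 and 3 and between centers 3 and 6, but not between 0 and 6, whose distance 6 exceeds $2r_{0}+d=3$ .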

We are now ready to complete the proof of Thm. 5. Throughout, a vertex $v$ is clustered if $v\in V(\mathcal{C})$ , and unclustered otherwise.

Claim 11.

For every unclustered vertex $x$ and every vertex $y\in\mathsf{\textbf{B}}_{G}(x,r_{0}/(\lceil k^{\epsilon}\rceil+1))$ , it holds that:

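$$\mbox{\rm dist}^{(2)}_{H_{TZ}\cup G}(x,y)\leq(2\lceil k^{\epsilon}\rceil+1)\cdot\mbox{\rm dist}_{G}(x,y)~.$$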

Proof.

Since $x$ is unclustered, we have that $\mathsf{\textbf{B}}_{G}(x,r_{0})\cap A_{\left\lceil k^{\epsilon}\right\rceil}=\emptyset$ . That is, $\mbox{\rm dist}_{G}(x,A_{\left\lceil k^{\epsilon}\right\rceil})>r_{0}$ . Let $i_{x}$ be the minimal index $i$ such that $p_{i}(x)\in B_{i}(y)$ , define $i_{y}$ analogously and let $i^{*}=\min\{i_{x},i_{y}\}$ . Assume towards contradiction that $i^{*}\geq\lceil k^{\epsilon}\rceil$ . First assume that $i_{x}\leq i_{y}$ , then by Fact 2 (i) we have that

[TABLE]

thus leading to a contradiction as $x$ is unclustered. Otherwise, assume that $i_{y}\leq i_{x}$ , then again by Fact 2(i) we have that

[TABLE]

a contradiction again. Thus, we have that $i^{*}<\lceil k^{\epsilon}\rceil$ and by Fact 2 (ii), $\mbox{\rm dist}^{(2)}_{H_{TZ}\cup G}(x,y)\leq(2\lceil k^{\epsilon}\rceil+1)\cdot\mbox{\rm dist}_{G}(x,y)$ as desired.∎

We next complete the proof of Theorem 5.

Proof.

Stretch and Hop-Bound. Fix a pair $u,v$ at distance $d^{\prime}\in[d/2,d]$ in $G$ , and let $P$ be their shortest path in $G$ . First, assume that at most one vertex in $P$ is clustered. Partition the path $P$ into $\ell\leq k^{1-\epsilon}$ consecutive segments from $u$ to $v$ in the following way: denote $v_{0}:=u$ , and for every $i\geq 0$ inductively define $v^{\prime}_{i}$ to be the farthest vertex on $P[v_{i},v]$ at distance (shortest-path distance, not hop-distance) at most $d/k^{1-\epsilon}$ from $v_{i}$ . Observe that it might be the case that $v^{\prime}_{i}$ is simply $v_{i}$ , namely when the edge incident to $v_{i}$ on the segment $P[v_{i},v]$ has weight larger than $d/k^{1-\epsilon}$ . Also, let $v_{i+1}$ be the vertex following $v^{\prime}_{i}$ on the segment $P[v^{\prime}_{i},v]$ . In the case where $v^{\prime}_{i}=v$ , let $v_{i+1}=v^{\prime}_{i}=v$ .

Observe that for every $i$ , if $v^{\prime}_{i}\neq v_{i+1}$ (which happens when $v^{\prime}_{i}\neq v$ ) then it must hold that $\mbox{\rm dist}_{G}(v_{i},v_{i+1})\geq d/k^{1-\epsilon}$ . Therefore the path $P$ is partitioned into at most $k^{1-\epsilon}$ segments, where the $i^{th}$ segment is $P_{i}:=[v_{i-1},v_{i}]$ and $P=P_{1}\circ\ldots\circ P_{\ell}$ , where $\ell$ is the index of the last segment that reaches $v$ .
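The greedy partition described above can be sketched as follows. The function name and the representation of $P$ as a list of consecutive edge weights are assumptions made for illustration; `threshold` plays the role of $d/k^{1-\epsilon}$ .

```python
def partition_path(weights, threshold):
    """Split a path v_0, ..., v_m, where weights[i] = w(v_i, v_{i+1}).

    Returns the indices of the breakpoints v_0, v_1, ..., v_ell on the path.
    """
    m = len(weights)
    breakpoints = [0]
    i = 0
    while i < m:
        # v'_i: farthest vertex within distance `threshold` of v_i along P.
        j, travelled = i, 0.0
        while j < m and travelled + weights[j] <= threshold:
            travelled += weights[j]
            j += 1
        # v_{i+1}: the vertex right after v'_i, or v itself when v'_i = v.
        i = j + 1 if j < m else m
        breakpoints.append(i)
    return breakpoints


# Toy path of total weight d = 10 with threshold d / k^(1 - eps) = 2.
weights = [1, 1, 1, 5, 1, 1]
breaks = partition_path(weights, threshold=2)
segment_lengths = [sum(weights[a:b]) for a, b in zip(breaks, breaks[1:])]
```

On this input the breakpoints are at indices 0, 3, 4 and 6. Every segment except possibly the last has weight at least the threshold, which is what bounds the number of segments.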

For any $i\in\{0,\ldots,\ell-1\}$ , $\mbox{\rm dist}_{G}(v_{i},v^{\prime}_{i})\leq d/k^{1-\epsilon}$ and (except maybe for $i=\ell-1$ ) $\mbox{\rm dist}_{G}(v_{i},v_{i+1})\geq d/k^{1-\epsilon}$ . For $v_{i}\neq v^{\prime}_{i}$ , by the assumption that $P$ has at most one clustered vertex, it holds that at least one of the vertices $v_{i},v^{\prime}_{i}$ is unclustered. Thus by Claim 11, it holds that $\mbox{\rm dist}^{(2)}_{G\cup H_{TZ}}(v_{i},v^{\prime}_{i})\leq(2\cdot\lceil k^{\epsilon}\rceil+1)\cdot\mbox{\rm dist}_{G}(v_{i},v^{\prime}_{i})$ , and consequently using the edge $(v^{\prime}_{i},v_{i+1})\in P$ , $\mbox{\rm dist}^{(3)}_{G\cup H_{TZ}}(v_{i},v_{i+1})\leq(2\cdot\lceil k^{\epsilon}\rceil+1)\cdot\mbox{\rm dist}_{G}(v_{i},v_{i+1})$ . In the last segment we might have that $v_{\ell}=v^{\prime}_{\ell-1}=v$ , and thus since $\mbox{\rm dist}_{G}(v_{\ell-1},v_{\ell})\leq d/k^{1-\epsilon}$ , by Claim 11 again, we get that $\mbox{\rm dist}^{(2)}_{G\cup H_{TZ}}(v_{\ell-1},v)\leq(2\cdot\lceil k^{\epsilon}\rceil+1)\cdot\mbox{\rm dist}_{G}(v_{\ell-1},v)$ . Finally,

[TABLE]

Next, assume that $P$ contains at least two clustered vertices. Let $u^{\prime},v^{\prime}$ be the leftmost and rightmost clustered vertices on $P$ , and let $P_{1}:=[u,u^{\prime}],P_{2}:=[u^{\prime},v^{\prime}],P_{3}:=[v^{\prime},v]$ such that $P=P_{1}\circ P_{2}\circ P_{3}$ . Let $C_{u^{\prime}},C_{v^{\prime}}\in\mathcal{C}$ be the clusters of $u^{\prime},v^{\prime}$ respectively, and let $z_{u^{\prime}}$ and $z_{v^{\prime}}$ be the centers of these clusters. Thus the hopset $H$ contains a hop from $u^{\prime}$ to $z_{u^{\prime}}$ and a hop from $v^{\prime}$ to $z_{v^{\prime}}$ . Since $\mbox{\rm c-dist}_{G}(C_{u^{\prime}},C_{v^{\prime}})\leq 2r_{0}+\mbox{\rm dist}(u^{\prime},v^{\prime})\leq 2r_{0}+d~{},$ it holds that $(C_{u^{\prime}},C_{v^{\prime}})\in\widehat{G}$ . This in turn implies that $\mbox{\rm dist}_{\widehat{H}}(C_{u^{\prime}},C_{v^{\prime}})\leq(2\cdot k^{1-\epsilon}-1)$ . Thus in the hopset $H$ we have a path of at most $2+(2\cdot k^{1-\epsilon}-1)$ hops from $u^{\prime}$ to $v^{\prime}$ : one hop from $u^{\prime}$ to $z_{u^{\prime}}$ , then $(2\cdot k^{1-\epsilon}-1)$ hops from $z_{u^{\prime}}$ to $z_{v^{\prime}}$ , and the last hop from $z_{v^{\prime}}$ to $v^{\prime}$ . Since all the clusters in $\mathcal{C}$ have radius $r_{0}=d\cdot\frac{\lceil k^{\epsilon}\rceil+1}{k^{1-\epsilon}}$ , and since we connect clusters $C,C^{\prime}$ in the cluster-graph $\widehat{G}$ if $\mbox{\rm c-dist}_{G}(C,C^{\prime})\leq 2r_{0}+d$ , we have that the distance in $G$ between the centers of adjacent clusters in $\widehat{G}$ is at most $d+2r_{0}$ . Therefore, each edge in $\widehat{G}$ has a weight at most $d+2r_{0}$ . Overall since $\epsilon\geq 1/2$ , we have:

[TABLE]

Finally, since each of the segments $P[u,u^{\prime}]$ and $P[v^{\prime},v]$ contains at most one clustered vertex, by the argument of the first case, there is a path of at most $(3\cdot k^{1-\epsilon})$ hops from $u$ to $u^{\prime}$ , and from $v^{\prime}$ to $v$ , that provides a multiplicative stretch of $(2\cdot\lceil k^{\epsilon}\rceil+1)$ for each of the segments $P[u,u^{\prime}]$ and $P[v^{\prime},v]$ . Therefore, we have that

[TABLE]

where the last inequality holds as $k^{\epsilon}\geq 10$ . Since we assume $\mbox{\rm dist}_{G}(u,v)\in[d/2,d]$ , the final stretch is bounded by $\alpha\leq 18\cdot k^{\epsilon}$ . See Figure 5 for an illustration.

Size. We show that the total number of edges added to $H$ is bounded by $O(k^{\epsilon}\cdot n^{1+1/k})$ in expectation. Step (I) adds at most $O(k^{\epsilon}\cdot n^{1+1/k})$ edges by Fact 2(iii). Step (II) adds at most $n$ edges, one from each clustered vertex to the center of its cluster in $\mathcal{C}$ . Finally, in Step (III) we add $|E(\widehat{H})|$ edges to the hopset. Since $\widehat{G}$ contains $n^{1-\lceil k^{\epsilon}\rceil/k}$ clusters, the $(2k^{1-\epsilon}-1)$ spanner $\widehat{H}$ contains at most $O(n^{(1-1/k^{1-\epsilon})(1+1/k^{1-\epsilon})})=O(n)$ edges. ∎
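The exponent arithmetic in the last step can be spelled out as follows. This is a sketch under the same simplifications the text makes: the number of clusters is taken at its expected value, and $\kappa:=k^{1-\epsilon}$ is treated as an integer. The symbols $\kappa$ and $N$ are introduced here only for illustration.

```latex
\begin{align*}
N := |\mathcal{C}| &= n^{1-\lceil k^{\epsilon}\rceil/k}
  \;\leq\; n^{1-k^{\epsilon}/k} \;=\; n^{1-1/\kappa}, \\
|E(\widehat{H})| &= O\!\left(N^{1+1/\kappa}\right)
  \;=\; O\!\left(n^{(1-1/\kappa)(1+1/\kappa)}\right)
  \;=\; O\!\left(n^{1-1/\kappa^{2}}\right)
  \;=\; O(n).
\end{align*}
```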

5.2 $(O(k^{\epsilon}),O_{\epsilon}(k^{1-\epsilon}))$ Hopsets for $0<\epsilon<1/2$ .

Finally, we consider $(k^{\epsilon},k^{1-\epsilon})$ hopsets for the complementary regime of $\epsilon\in(0,1/2)$ , which will complete the proof of Theorem 4. This regime is considerably more involved and bears similarities with the spanner construction of Sec. 3. Specifically, we show:

Theorem 6.

For any $n$ -vertex weighted graph $G=(V,E,w)$ , integer $k\geq 16^{1/\epsilon}$ (the statement can work for any $k$ upon suffering from larger constants), and $0<\epsilon<\frac{1}{2}$ , one can compute an $(\alpha,\beta)$ hopset $H$ for $\alpha=9\cdot k^{\epsilon}$ and $\beta=36^{1/\epsilon}\cdot k^{1-\epsilon}$ , of expected size $O((k^{\epsilon}\cdot n^{1+1/k}+\frac{k^{\epsilon}}{\epsilon}\cdot n)\log\Lambda)$ .

To prove the theorem, we will show the following key lemma which restricts attention to a fixed distance class $[d/2,d]$ .

Lemma 7.

For any $n$ -vertex weighted graph $G=(V,E,w)$ , integers $k\geq 16^{1/\epsilon},d\geq 1$ and $0<\epsilon<\frac{1}{2}$ , there is a hopset $H$ of expected size $O(k^{\epsilon}\cdot n^{1+1/k}+\frac{k^{\epsilon}}{\epsilon}\cdot n)$ such that for every $u,v\in V$ , at distance $d^{\prime}\in[d/2,d]$ in $G$ , it holds that $\mbox{\rm dist}_{H}^{(36^{1/\epsilon}\cdot k^{1-\epsilon})}(u,v)\leq 4.5\cdot k^{\epsilon}\cdot d$ .

Algorithm $\mathsf{HopsetsSmallStretch}$ .

Fix a distance range $[d/2,d]$ . The same procedure will be repeated for every distance range. The algorithm works in three stages. In the first stage it calls Procedure $\mathsf{TruncatedTZ}$ for $\left\lceil k^{\epsilon}\right\rceil$ steps and radius parameter $r_{0}=d/R^{\prime}$ where $R^{\prime}=O_{\epsilon}(k^{1-2\cdot\epsilon})$ (see Algorithm 5). This results in a partial hopset $H_{0}$ , and a clustering $\mathcal{C}_{0}$ of $O(n^{1-\frac{1}{k^{1-\epsilon}}})$ clusters in expectation, each with radius $r_{0}=d/R^{\prime}$ . We say that a vertex is $0$-unclustered if it is not in $V(\mathcal{C}_{0})$ . By the end of this stage, we will have the guarantee that for every $0$-unclustered vertex $v$ and every vertex $u\in\mathsf{\textbf{B}}_{G}(v,O_{\epsilon}(d/k^{1-\epsilon}))$ , the current hopset contains a $2$ -hop path $P^{\prime}$ from $u$ to $v$ of length at most $O_{\epsilon}(d/k^{1-2\epsilon})$ .

The second stage contains $T=O(\frac{1}{\epsilon})$ phases of superclustering. In each phase $i$ , the procedure runs for $t=\lceil k^{\epsilon}\rceil/4$ steps. We refer to the $j^{th}$ step of the $i^{th}$ phase as step $(i,j)$ . Step $(i,j)$ begins with a superclustering $\mathcal{SC}_{i-1,j-1}$ . The output of that step is a superclustering $\mathcal{SC}_{i-1,j}$ along with a collection of hop edges to be added to the hopset.

We say that a vertex is $(i,j)$ -unclustered if it is in $V(\mathcal{SC}_{i,j-1})\setminus V(\mathcal{SC}_{i,j})$ . The analysis shows that for each $(i,j)$ -unclustered vertex $v$ , the edges added to the hopset in the $(i,j)$ step provide a $3$ -hop $u$ - $v$ path $P^{\prime}$ of length $O(k^{\epsilon}\cdot\alpha_{i,j})$ to any vertex $u\in\mathsf{\textbf{B}}_{G}(v,\alpha_{i,j})$ . The parameter $\alpha_{i,j}$ grows at each step $(i,j)$ but it is bounded by $\alpha_{i,j}=O(1.5^{i}\cdot e^{4j/k^{\epsilon}}\cdot d/k^{1-(i+2)\cdot\epsilon})$ .

At the beginning of the third stage we have a clustering $\mathcal{C}_{T}$ with $O(n^{1-\frac{1}{k^{\epsilon}}})$ clusters in expectation, of (weighted) radius at most $d$ . First, the algorithm connects each vertex $v$ , satisfying that $\mbox{\rm c-dist}(v,\mathcal{C}_{T})\leq r_{T}+d$ , to its closest center. Next, a cluster graph $\widehat{G}=(\mathcal{C}_{T},\mathcal{E})$ is defined by letting $\mathcal{E}=\{(C,C^{\prime})~{}\mid~{}\mbox{\rm c-dist}(C,C^{\prime})\leq 2r_{T}+d,~{}C,C^{\prime}\in\mathcal{C}_{T}\}$ . Let $\widehat{H}$ be a $(2\cdot\lceil k^{\epsilon}\rceil-3)$ spanner of $\widehat{G}$ . For every edge $(C,C^{\prime})\in\widehat{H}$ , the algorithm adds a hop between $C$ and $C^{\prime}$ to the hopset $H$ . This completes the high level description of the algorithm.

First Stage: Initial Clustering.

The algorithm starts by applying Procedure $\mathsf{TruncatedTZ}$ for $\left\lceil k^{\epsilon}\right\rceil$ steps with radius parameter $r_{0}:=d/R^{\prime}$ where $R^{\prime}:=1/2\cdot 36^{\frac{1}{\epsilon}}\cdot k^{1-2\epsilon}$ . This results in a clustering $\mathcal{C}_{0}$ and a partial hopset $H_{0}$ . By the properties of Procedure $\mathsf{TruncatedTZ}$ and the chosen parameters, the clustering has $n^{1-\left\lceil k^{\epsilon}\right\rceil/k}$ clusters, in expectation, of radius at most $d/R^{\prime}$ . The partial hopset $H_{0}$ contains $O(k^{\epsilon}\cdot n^{1+1/k})$ edges, in expectation. By the exact same argument as in Claim 11, we have that:

Claim 12.

For every $0$-unclustered vertex $u\in V$ and every $v\in\mathsf{\textbf{B}}_{G}(u,r_{0}/(\lceil k^{\epsilon}\rceil+1))$ , it holds that: $\mbox{\rm dist}^{(2)}_{G\cup H}(u,v)\leq(2\lceil k^{\epsilon}\rceil+1)\cdot\mbox{\rm dist}_{G}(u,v)~{}.$

Middle Stage: Superclustering.

For clarity of presentation, throughout we assume that $\lceil k^{\epsilon}\rceil$ is divisible by $4$ ; this assumption can be made without loss of generality, at the cost of a factor of $4$ in the final stretch. The middle stage consists of $T:=\log_{\lceil k^{\epsilon}\rceil/4}(k^{1-2\epsilon})$ applications of Procedure $\mathsf{ClusterAndAugmentHop}$ . For clarity of presentation we assume that $T$ is an integer, and in Sec. A we describe how to remove this assumption. We refer to each application of this procedure as a phase. In each phase $i$ , the input to Procedure $\mathsf{ClusterAndAugmentHop}$ is a clustering $\mathcal{C}_{i-1}=\{C_{1},\ldots,C_{\ell}\}$ of radius $r_{i-1}=r_{i-1,0}$ where $r_{i-1}=O(k^{\epsilon}\cdot(2k^{\epsilon})^{i-1})$ . The output of the phase is a clustering $\mathcal{C}_{i}$ of radius $r_{i}$ , and a hopset $H_{i}$ that takes care of all vertices that became unclustered in that phase. This output clustering is obtained by applying $t:=\lceil k^{\epsilon}\rceil/4$ steps of supercluster growing. As we will see, the clustering procedure is very similar to Proc. $\mathsf{SuperClusterAugment}$ from Sec. 3, except that we add hops rather than shortest paths.

Starting with the trivial superclustering $\mathcal{SC}_{i-1,0}=\{\{C_{j}\}~{}\mid~{}C_{j}\in\mathcal{C}_{i-1}\}$ of radius $r_{i-1,0}=r_{i-1}$ , in the $j^{th}$ step of phase $i$ for $j\geq 1$ , the algorithm is given a superclustering $\mathcal{SC}_{i-1,j-1}$ of radius $r_{i-1,j-1}$ . The algorithm then outputs a superclustering $\mathcal{SC}_{i-1,j}$ along with a hopset $H_{i,j}$ by taking the following steps.

1. Each unclustered vertex $v\in V\setminus{V(\mathcal{SC}_{i-1,j-1})}$ at center-distance at most $r_{i-1,j-1}+\alpha_{i-1,j}$ from centers of $\mathcal{SC}_{i-1,j-1}$ adds to $H_{i,j}$ a weighted hop to its closest center in $\mathcal{SC}_{i-1,j-1}$ , where $\alpha_{i-1,0}=0$ and for $j\geq 1$ ,

[TABLE]

2. Let $\mathcal{SC}^{\prime}\subseteq\mathcal{SC}_{i-1,j-1}$ be the collection of superclusters obtained by sampling each supercluster $SC_{\ell}\in\mathcal{SC}_{i-1,j-1}$ independently with probability of $\frac{n_{0}}{n}$ , where $n_{0}:=|\mathcal{C}_{i-1}|$ .

3. Each cluster $C\in SC_{\ell}$ at center-distance at most $r_{i-1,0}+r_{i-1,j-1}+2\cdot\alpha_{i-1,j}$ from $\mathcal{SC}^{\prime}$ joins the supercluster of its closest center. All the vertices in $C$ add to $H_{i,j}$ a hop to the center of their new supercluster.

4. The superclustering $\mathcal{SC}_{i-1,j}$ consists of all sampled superclusters augmented by their nearby clusters (i.e., at center-distance $r_{i-1,0}+r_{i-1,j-1}+2\cdot\alpha_{i-1,j}$ ). The center of each new supercluster is the center of the sampled supercluster.

5. Each cluster $C\in SC_{\ell}$ at center-distance larger than $r_{i-1,0}+r_{i-1,j-1}+2\cdot\alpha_{i-1,j}$ from $\mathcal{SC}^{\prime}$ , adds to $H_{i,j}$ a weighted hop from its center to the center of any supercluster $SC\in\mathcal{SC}_{i-1,j-1}$ with $\mbox{\rm c-dist}_{G}(C,SC)\leq r_{i-1,0}+r_{i-1,j-1}+2\cdot\alpha_{i-1,j}$ .

The parameter $\alpha_{i-1,j}$ is set in a way that guarantees that for every $(i-1,j)$ -unclustered vertex $u$ and every $v\in\mathsf{\textbf{B}}_{G}(u,\alpha_{i-1,j})$ , the hopset $H_{i,j}$ contains a $3$ -hop $u$ - $v$ path of length at most $k^{\epsilon}\cdot\alpha_{i-1,j}$ .

Let $\mathcal{SC}_{i-1,t}$ be the output superclustering after $t$ steps, then the output clustering $\mathcal{C}_{i}$ is formed by merging all clusters in the supercluster $SC_{\ell^{\prime}}$ to a single cluster in $\mathcal{C}_{i}$ for every $SC_{\ell^{\prime}}\in\mathcal{SC}_{i-1,t}$ . That is, $\mathcal{C}_{i}=\{\{V(SC_{\ell^{\prime}})\}~{}\mid~{}SC_{\ell^{\prime}}\in\mathcal{SC}_{i-1,t}\}$ . Let $H_{i}=\bigcup_{j=1}^{t}H_{i,j}$ . This completes the description of the $i^{th}$ phase. After $T$ phases, the output clustering $\mathcal{C}_{T}$ is shown to contain at most $n^{1-1/k^{\epsilon}}$ clusters in expectation. Let $H=\bigcup_{i=0}^{T}H_{i}$ be the current hopset after these phases.

Finalizing Stage: Spanner on Cluster Graph.

Given the collection of $O(n^{1-1/k^{\epsilon}})$ clusters in $\mathcal{C}_{T}$ , the algorithm first adds a weighted hop from each unclustered vertex $v\notin V(\mathcal{C}_{T})$ to the center of its closest cluster in $\mathcal{C}_{T}$ up to center-distance $r_{T}+d$ , if such a cluster exists. Next, a cluster graph $\widehat{G}=(\mathcal{C}_{T},\mathcal{E})$ is defined by connecting two clusters $C,C^{\prime}\in\mathcal{C}_{T}$ if their center-distance in $G$ is at most $2r_{T}+2d$ . That is, $\mathcal{E}=\{(C,C^{\prime})~{}\mid~{}C,C^{\prime}\in\mathcal{C}_{T}\mbox{~{}and~{}}\mbox{\rm dist}_{G}(C,C^{\prime})\leq 2r_{T}+2d\}$ . Let $\widehat{H}$ be a $k^{\prime}$ -spanner of $\widehat{G}$ for $k^{\prime}=2\cdot\lceil k^{\epsilon}\rceil-3$ . For every edge $(C,C^{\prime})\in\widehat{H}$ , a weighted hop between the center of $C$ and the center of $C^{\prime}$ is added to the hopset $H$ . This completes the description of the algorithm.

The analysis of this algorithm is very similar to the analysis of the spanner construction from Section 3.

Stretch Analysis.

Recall that $r_{0}:=\frac{d}{R^{\prime}}$ , $T=\log_{\lceil k^{\epsilon}\rceil/4}(k^{1-2\epsilon})$ , and $t=\lceil k^{\epsilon}\rceil/4$ . For $1\leq i\leq T$ let $r_{i,0}=rad(\mathcal{C}_{i})$ and let $r_{T}:=rad(\mathcal{C}_{T})$ be the radius of the clusters in the last clustering $\mathcal{C}_{T}$ at the end of the middle stage. For the sake of the stretch and size analysis, we will need the following two claims, bounding $\alpha_{i,j}$ and $r_{i,0}$ , respectively. The next claim is the analog of Claim 2 in Section 3:

Claim 13.

For $i\in[T],j\in[t]$ , $\alpha_{i,j}\leq r_{i,0}\cdot\left((1+\frac{4}{\lceil k^{\epsilon}\rceil-3})^{j}-1\right)$ .

Proof.

We show by induction on $j$ that

[TABLE]

The base case, $j=1$ is trivial since $\alpha_{i,1}=\frac{4\cdot r_{i,0}}{\lceil k^{\epsilon}\rceil-3}$ . By Eq. (8) and the induction assumption,

[TABLE]

By Eq. (9) we then have:

[TABLE]

∎

We next turn to bound the radius of the clustering $\mathcal{C}_{i}$ for every $i\in[T]$ . The next claim is the analog of Claim 3 in Section 3:

Claim 14.

For each $0\leq i\leq T$ it holds that $r_{i,0}\leq(1.5\cdot\lceil k^{\epsilon}\rceil)^{i}\cdot r_{0}$ , thus $r_{T}\leq d/648$ .

Proof.

We show the correctness of the claim by induction on $i$ . The base case, for $i=0$ follows by the definition of $r_{i,0}$ . Assuming the claim holds up to $i$ we show the correctness for $i+1$ . Phase $i+1$ begins with clusters with radius at most $r_{i,0}$ , and terminates after $t=\lceil k^{\epsilon}\rceil/4$ superclustering steps of Procedure $\mathsf{ClusterAndAugmentHop}$ with clusters with radius $r_{i+1,0}$ . We therefore bound $r_{i+1,0}$ . At step $j$ of Proc. $\mathsf{ClusterAndAugmentHop}$ , we have superclusters of radius $r_{i,j-1}$ . The algorithm then connects clusters of radius $r_{i,0}$ , at distance at most $2\cdot\alpha_{i,j}$ from the sampled supercluster. In the $(i,j)$ step, the radius of the new supercluster is then increased by an additive term of $2\cdot r_{i,0}+2\cdot\alpha_{i,j}$ . Combining this with the bound on $\alpha_{i,j}$ of Claim 13, we have:

[TABLE]

Thus plugging $t=\lceil k^{\epsilon}\rceil/4$ :

[TABLE]

The third inequality is valid when $k\geq 16^{1/\epsilon}$ . Finally, the final radius $r_{T}$ is bounded by:

[TABLE]

where the fourth inequality holds for $k\geq 16^{1/\epsilon}$ . ∎
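The final bound $r_{T}\leq d/648$ can be checked numerically from the stated parameters $r_{0}=d/R^{\prime}$ , $R^{\prime}=\frac{1}{2}\cdot 36^{1/\epsilon}\cdot k^{1-2\epsilon}$ and $T=\log_{\lceil k^{\epsilon}\rceil/4}(k^{1-2\epsilon})$ . This is only a sketch: the function name and the sample values of $(k,\epsilon)$ are assumptions, and $T$ is treated as a real number rather than an integer.

```python
import math


def final_radius_bound(k, eps, d=1.0):
    """Evaluate (1.5 * ceil(k^eps))**T * r_0, the bound on r_T from Claim 14."""
    K = math.ceil(k ** eps - 1e-9)  # ceil(k^eps), guarded against float noise
    r_prime = 0.5 * 36 ** (1 / eps) * k ** (1 - 2 * eps)
    r0 = d / r_prime
    T = math.log(k ** (1 - 2 * eps)) / math.log(K / 4)
    return (1.5 * K) ** T * r0


# Sample parameters satisfying k >= 16^(1/eps) and 0 < eps < 1/2.
samples = [(65536, 0.25), (10**6, 0.25), (10**5, 0.4), (10**9, 0.2)]
for k, eps in samples:
    # Relative tolerance covers the tight case k = 16^(1/eps).
    assert final_radius_bound(k, eps) <= (1 / 648) * (1 + 1e-9)
```

The first sample, $k=16^{4}$ with $\epsilon=1/4$ , gives $\lceil k^{\epsilon}\rceil=16$ and $T=4$ , and the bound evaluates to exactly $d/648$ up to rounding, so the constant is tight at the boundary $k=16^{1/\epsilon}$ .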

Definition 3 (Clustered and Unclustered Vertices).

A vertex $v\in V$ is $0$-unclustered if $v\notin V(\mathcal{C}_{0})$ . A vertex $v$ is $(i,j)$ -unclustered if $v\in V(\mathcal{C}_{i,j-1})\setminus{V(\mathcal{C}_{i,j})}$ . A vertex $v$ is clustered if $v\in V(\mathcal{C})$ .

By Claim 11 we have that $\mbox{\rm dist}_{G\cup H}^{(2)}(u,v)\leq(2\cdot\lceil k^{\epsilon}\rceil+1)\cdot\mbox{\rm dist}_{G}(u,v)$ for every $0$-unclustered vertex $u$ and any $v\in\mathsf{\textbf{B}}(u,r_{0}/(\lceil k^{\epsilon}\rceil+1))$ . We now consider the remaining vertices.

Claim 15.

For $0\leq i\leq T-1,1\leq j\leq t$ , for each $(i,j)$ -unclustered vertex $u\in V$ , $\mbox{\rm dist}_{H}^{(3)}(u,v)\leq\lceil k^{\epsilon}\rceil\cdot\alpha_{i,j}$ for any $v\in\mathsf{\textbf{B}}_{G}(u,\alpha_{i,j})$ .

Proof.

Let $v\in\mathsf{\textbf{B}}_{G}(u,\alpha_{i,j})$ . In the beginning of the $(i+1,j)$ step, the algorithm adds to $H$ hops from any vertex at center-distance at most $r_{i,j-1}+\alpha_{i,j}$ from $\mathcal{SC}_{i,j-1}$ to the center of its closest supercluster. In particular, since $\mbox{\rm c-dist}_{G}(v,\mathcal{SC}_{i,j-1})\leq\mbox{\rm dist}_{G}(v,u)+r_{i,j-1}\leq r_{i,j-1}+\alpha_{i,j}$ , we add a hop from $v$ to the center of some supercluster $SC_{v}\in\mathcal{SC}_{i,j-1}$ , and the weight of this hop is at most $r_{i,j-1}+\alpha_{i,j}$ . Since $u$ is unclustered in this step, the algorithm adds to $H$ a hop from the center of its cluster $C_{u}$ to the center of any supercluster which is in $\mathcal{SC}_{i,j-1}$ and is at center-distance at most $r_{i,0}+r_{i,j-1}+2\cdot\alpha_{i,j}$ from $C_{u}$ . Since $\mbox{\rm c-dist}_{G}(C_{u},SC_{v})\leq r_{i,0}+\mbox{\rm dist}_{G}(u,v)+\mbox{\rm c-dist}_{G}(v,SC_{v})\leq r_{i,0}+r_{i,j-1}+2\cdot\alpha_{i,j}$ , the algorithm adds a hop from $z_{u}$ (the center of $C_{u}$ ) to $z_{v}$ (the center of the supercluster $SC_{v}$ ) of weight at most

[TABLE]

and consequently, there is a $3$ -hop path from $u$ to $v$ through $z_{u}$ and $z_{v}$ of length at most:

[TABLE]

where the second inequality follows from Eq. (5.2). We show by induction on $j$ that $(\lceil k^{\epsilon}\rceil-3)\alpha_{i,j}\geq 4\cdot j\cdot r_{i,0}+4\cdot\sum_{p=1}^{j-1}\alpha_{i,p}.$ The base case holds trivially. Assuming the correctness of the claim up to $j-1$ we have:

[TABLE]

We therefore have that,

[TABLE]

Figure 3, though illustrating claims regarding spanners, can be used to illustrate the proofs of Claims 15 and 16 as well. ∎

Claim 16.

Let $u,v\in V$ be vertices at distance $d$ in $G$ , and let $P$ be a shortest path between them. If there is some clustered vertex $w\in P$ , then $\mbox{\rm dist}_{G\cup H}^{(2\lceil k^{\epsilon}\rceil-1)}(u,v)\leq 4.5\cdot k^{\epsilon}\cdot d$ .

Proof.

Let $C_{w}$ be the cluster to which $w$ belongs. In the beginning of the third stage of the algorithm we add hops from unclustered vertices at center-distance at most $r_{T}+d$ to their closest cluster center. Thus since $\mbox{\rm c-dist}_{G}(u,\mathcal{C})\leq\mbox{\rm dist}_{G}(u,w)+r_{T}\leq r_{T}+d$ and $\mbox{\rm c-dist}_{G}(v,\mathcal{C})\leq\mbox{\rm dist}_{G}(v,w)+r_{T}\leq r_{T}+d$ , it holds that both $u,v$ have hops to centers of some clusters $C_{u},C_{v}\in\mathcal{C}$ . Since

[TABLE]

it holds that $\mbox{\rm dist}_{\widehat{G}}(C_{u},C_{v})\leq 1$ , thus $\mbox{\rm dist}_{\widehat{H}}(C_{u},C_{v})\leq 2\cdot\lceil k^{\epsilon}\rceil-3$ . Since each edge $(C,C^{\prime})\in E(\widehat{H})$ translates into a hop in $H$ between the centers of $C,C^{\prime}$ of weight at most $2\cdot r_{T}+2d$ , we have the following,

[TABLE]

where the last inequality follows by plugging the bound on the final radius $r_{T}$ from Claim 14. See Figure LABEL:fig:SecondHopsetAnalysis(b) for an illustration. ∎

Proof of Theorem 6.

Stretch and Hop-Bound. Let $u,v\in V$ be vertices at distance $d^{\prime}\in[d/2,d]$ in $G$ , and let $P$ be some shortest path between them in $G$ . First observe that if there is some clustered vertex $w\in P$ the claim follows from Claim 16. Assume from now on that there is no such vertex. Partition the path $P$ into consecutive segments in the following way: denote $v_{0}:=u$ and inductively, given $v_{l}\neq v$ define $v^{\prime}_{l}$ to be the furthest vertex on $P[v_{l},v]$ at distance at most $\Delta_{l}$ from $v_{l}$ , where:

[TABLE]

Note that $v^{\prime}_{l}$ might be equal to $v_{l}$ if the incident edge to $v_{l}$ on $P[v_{l},v]$ has weight larger than $\Delta_{l}$ . In addition, for each $v^{\prime}_{l}\neq v$ let $v_{l+1}$ be the consecutive neighbor of $v^{\prime}_{l}$ on $P[v^{\prime}_{l},v]$ . When $v^{\prime}_{l}=v$ , simply let $v_{l+1}=v^{\prime}_{l}$ . Let $\ell$ be the minimal index such that $v_{\ell}=v$ . This defines a partition of $P$ into $\ell$ segments by setting for all $1\leq i\leq\ell$ , $P_{i}=[v_{i-1},v_{i}]$ , where $P_{\ell}=[v_{\ell-1},v]$ is the last segment that reaches $v$ . Note that for every segment $P_{l+1}$ (except at most the last one) $\mbox{\rm dist}_{G}(v_{l},v_{l+1})\geq\frac{r_{0}}{(\left\lceil k^{\epsilon}\right\rceil+1)}$ . If $v_{l}$ is $0$-unclustered this is clear, and if $v_{l}$ is $(i,j)$ -unclustered, then by Eq. (8) it holds that $\mbox{\rm dist}_{G}(v_{l},v_{l+1})\geq\alpha_{i,j}\geq\frac{r_{i-1,0}}{\lceil k^{\epsilon}\rceil-3}>\frac{r_{0}}{(\left\lceil k^{\epsilon}\right\rceil+1)}$ . Thus we have that $\ell\leq(\left\lceil k^{\epsilon}\right\rceil+1)\cdot R^{\prime}<36^{1/\epsilon}\cdot k^{1-\epsilon}$ . For any $l\in\{0,\ldots,\ell-1\}$ , if $v_{l}$ is $0$-unclustered, by Claim 11, it holds that:

[TABLE]

If $v_{l}$ is $(i,j)$ -unclustered then since $\mbox{\rm dist}_{G}(v_{l},v^{\prime}_{l})\leq\alpha_{i,j}$ , by Claim 15, it holds that $\mbox{\rm dist}^{(3)}_{G\cup H}(v_{l},v^{\prime}_{l})\leq\lceil k^{\epsilon}\rceil\cdot\alpha_{i,j}$ . There are two cases to consider. First assume that $\mbox{\rm dist}_{G}(v_{l},v_{l+1})\geq\alpha_{i,j}$ . In such a case $\mbox{\rm dist}^{(4)}_{G\cup H}(v_{l},v_{l+1})\leq\lceil k^{\epsilon}\rceil\cdot\mbox{\rm dist}_{G}(v_{l},v_{l+1})$ . Next, assume that $\mbox{\rm dist}_{G}(v_{l},v_{l+1})<\alpha_{i,j}$ . By the definition of the segment, in such a case it must hold that $v^{\prime}_{l}=v_{l+1}=v_{\ell}=v$ . Thus by the proof of Claim 15 and Eq. (5.2),

[TABLE]

Therefore by summing over all $\ell$ segments, we get that

[TABLE]

Size Analysis.

By Fact 2(iii), it holds that $|E(H_{0})|=O(k^{\epsilon}\cdot n^{1+1/k})$ . In the same manner as shown in Claim 6 it holds that for any $1\leq i\leq T,1\leq j\leq\lceil k^{\epsilon}\rceil/4$ , in step $(i,j)$ we add $O(n)$ hops. It follows that for all $1\leq i\leq T$ , in expectation $|H_{i}|=O(k^{\epsilon}\cdot n)$ . Since there are $T=\log_{\lceil k^{\epsilon}\rceil/4}k^{1-2\epsilon}=O(\frac{1}{\epsilon})$ phases we have $\sum_{i=1}^{T}|E(H_{i})|=O(\frac{k^{\epsilon}}{\epsilon}\cdot n)$ hops in total. As shown in Claim 7, $|\mathcal{C}_{T}|=O(n^{1-(\lceil k^{\epsilon}\rceil/4)^{T}\cdot\frac{\left\lceil k^{\epsilon}\right\rceil}{k}})=O(n^{1-\frac{1}{k^{\epsilon}}})$ in expectation. Consequently, $|E(\widehat{H})|=O(|\mathcal{C}_{T}|^{1+\frac{1}{\lceil k^{\epsilon}\rceil-1}})=O(n)$ in expectation. Since each edge in $\widehat{H}$ translates into a single hop, this step contributes $O(n)$ hops. Overall, $|E(H)|=O(k^{\epsilon}\cdot n^{1+1/k}+\frac{k^{\epsilon}}{\epsilon}\cdot n)$ , in expectation. ∎ Theorem 6 follows by applying the algorithm for each of the $\log\Lambda$ distance classes.
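The two exponent computations behind the bounds on $|\mathcal{C}_{T}|$ and $|E(\widehat{H})|$ can be written out as follows. This is a sketch that writes $K:=\lceil k^{\epsilon}\rceil$ for brevity (a symbol introduced here only for illustration), uses $(K/4)^{T}=k^{1-2\epsilon}$ from the definition of $T$ , and takes the cluster count at its expected value as the text does.

```latex
\begin{align*}
\left(\tfrac{K}{4}\right)^{T}\cdot\frac{K}{k}
  &= k^{1-2\epsilon}\cdot\frac{K}{k}
  \;\geq\; k^{1-2\epsilon}\cdot\frac{k^{\epsilon}}{k}
  \;=\; \frac{1}{k^{\epsilon}}
  &&\Longrightarrow\quad |\mathcal{C}_{T}| = O\!\left(n^{1-1/k^{\epsilon}}\right), \\
\left(1-\frac{1}{k^{\epsilon}}\right)\left(1+\frac{1}{K-1}\right)
  &\leq \left(1-\frac{1}{k^{\epsilon}}\right)\left(1+\frac{1}{k^{\epsilon}-1}\right)
  \;=\; 1
  &&\Longrightarrow\quad |E(\widehat{H})| = O(n).
\end{align*}
```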

5.3 New $(3+\epsilon,\beta)$ Hopset

In this subsection we show a considerably simplified construction of $(3+\epsilon,\beta)$ hopsets. For example for $\epsilon=1$ , we get a $(4,O(k^{\log{23}}))$ hopset. For the sake of the efficient implementation in Sec. 6.2, we settle for a slightly worse value of $\beta$ . In Appendix B.2, we show an improved construction that achieves the bounds of Lemma 2. Our main result in this section is as follows:

Lemma 8.

For any $n$ -vertex weighted graph $G=(V,E,w)$ , integer $k$ and $\epsilon>0$ , one can compute a $(3+\epsilon,\beta)$ hopset $H$ where $\beta=16\cdot(5+18/\epsilon)\cdot k^{\log(5+18/\epsilon)}$ of expected size $|E(H)|=O((n^{1+1/k}+\log{k}\cdot n)\log\Lambda)$ .

Algorithm Description.

For simplicity we fix a distance range $[d,2d]$ . Lemma 8 follows by taking care of all $\log\Lambda$ ranges. Furthermore, for simplicity we assume throughout that $4/\epsilon$ is an integer. The algorithm has two stages. In the first stage it calls Procedure $\mathsf{TruncatedTZ}$ (from Sec. 5.1) for a single iteration with a radius parameter

[TABLE]

For completeness, and due to its simplicity, we add a complete description of this single iteration. The procedure samples each vertex $v\in V$ into a subset $A_{1}\subseteq{V}$ with probability $n^{-1/k}$. For each $v\in V$ let $p(v)$ be its closest vertex in $A_{1}$. For each sampled vertex $a\in A_{1}$ define its cluster

[TABLE]

The $0$-level clustering is given by $\mathcal{C}_{0}=\{C(a)\mid a\in A_{1}\}$. Finally, add to the hopset $H_{0}$ the hops $(v,p(v))$ for every $v\in V$. In addition, for every unclustered vertex $v\in V\setminus V(\mathcal{C}_{0})$, add to $H_{0}$ a hop to each vertex $u$ satisfying $\mbox{\rm dist}_{G}(v,u)\leq\mbox{\rm dist}_{G}(v,p(v))$. The procedure outputs the clustering $\mathcal{C}_{0}$ with $n^{1-1/k}$ clusters, and a partial hopset $H_{0}$ of size $n^{1+1/k}$, in expectation.
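The single iteration above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the cluster definition is not reproduced in the text above, so the code assumes that a vertex $v$ belongs to the cluster of its pivot $p(v)$ exactly when $\mbox{\rm dist}_{G}(v,p(v))\leq r_{0}$ (consistent with the proof of Claim 17). The names `dijkstra`, `first_stage`, and the optional argument `a1` (which fixes the sample for reproducibility) are hypothetical helpers, not part of the paper.

```python
import heapq
import random


def dijkstra(adj, sources, limit=float("inf")):
    """Return {v: (distance, nearest source)} for every v within `limit`."""
    best = {}
    heap = [(0, s, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, v, src = heapq.heappop(heap)
        if v in best or d > limit:
            continue
        best[v] = (d, src)
        for u, w in adj[v]:
            if u not in best:
                heapq.heappush(heap, (d + w, u, src))
    return best


def first_stage(adj, k, r0, a1=None, rng=random):
    """One TruncatedTZ-style iteration; returns (clusters, hops).

    adj maps each vertex to a list of (neighbor, weight) pairs.
    hops maps a pair (v, u) to the weight dist_G(v, u) of the hop.
    """
    n = len(adj)
    if a1 is None:
        # Sample each vertex into A_1 with probability n^(-1/k).
        a1 = [v for v in adj if rng.random() < n ** (-1.0 / k)]
    pivot = dijkstra(adj, a1)  # v -> (dist_G(v, A_1), p(v))
    clusters = {a: [] for a in a1}
    hops = {}
    for v in adj:
        d_piv, p = pivot.get(v, (float("inf"), None))
        if p is not None and p != v:
            hops[(v, p)] = d_piv  # the hop (v, p(v))
        if d_piv <= r0:
            clusters[p].append(v)  # assumed cluster membership rule
        else:
            # Unclustered: hop to every u with dist(v, u) <= dist(v, p(v)).
            for u, (d, _) in dijkstra(adj, [v], limit=d_piv).items():
                if u != v:
                    hops[(v, u)] = d
    return clusters, hops
```

On a unit-weight path with five vertices and the single sampled vertex `0`, calling `first_stage(adj, k=2, r0=1, a1=[0])` clusters vertices `0` and `1` and gives every other vertex hops to its whole ball of radius $\mbox{\rm dist}_{G}(v,p(v))$.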

The second stage has $T=\lceil\log k\rceil$ clustering phases. Starting with $\mathcal{C}_{0}$ , in each phase $i\geq 1$ , given is a clustering $\mathcal{C}_{i-1}$ of expected size $n_{i-1}=n^{1-\frac{2^{i-1}}{k}}$ and with radius at most

[TABLE]

The output of the $i^{th}$ phase is a clustering $\mathcal{C}_{i}$ of expected size $n_{i}=n^{1-\frac{2^{i}}{k}}$ , and a hopset $H_{i}$ that takes care of the unclustered vertices in $V(\mathcal{C}_{i-1})\setminus V(\mathcal{C}_{i})$ .

We now zoom into the $i^{th}$ phase and explain the construction of the clustering $\mathcal{C}_{i}$ and the hopset $H_{i}$. The phase is governed by two key parameters: the sampling probability $p_{i}$ of each cluster and the augmentation radius $\alpha_{i}$. Similarly to the algorithm of Section 4, for every $i\geq 1$, define

[TABLE]

The description of the $i^{th}$ phase for every $i\in\{1,\ldots,T\}$ is as follows:

1. Each unclustered vertex $v\in V\setminus{V(\mathcal{C}_{i-1})}$ with $\mbox{\rm c-dist}_{G}(v,\mathcal{C}_{i-1})\leq r_{i-1}+\alpha_{i}$ adds to $H_{i}$ a hop to its closest cluster center in $\mathcal{C}_{i-1}$.

2. Let $\mathcal{C}^{\prime}\subseteq\mathcal{C}_{i-1}$ be the collection of clusters obtained by sampling each cluster $C_{\ell}\in\mathcal{C}_{i-1}$ independently with probability $p_{i}$.

3. Each cluster $C\in\mathcal{C}_{i-1}$ such that $\mbox{\rm c-dist}_{G}(C,\mathcal{C}^{\prime})\leq 4\cdot r_{i-1}+4\cdot\alpha_{i}$ joins its closest cluster in $\mathcal{C}^{\prime}$ (based on the c-dist measure). All the vertices in $C$ add to $H_{i}$ a hop to the center of their new cluster.

4. The clustering $\mathcal{C}_{i}$ consists of all sampled clusters in $\mathcal{C}^{\prime}$ augmented by their nearby clusters in $\mathcal{C}_{i-1}$ (i.e., at center-distance at most $4\cdot r_{i-1}+4\cdot\alpha_{i}$).

5. The center of each cluster $C\in\mathcal{C}_{i-1}$ such that $\mbox{\rm c-dist}_{G}(C,\mathcal{C}^{\prime})>4\cdot r_{i-1}+4\cdot\alpha_{i}$ adds to $H_{i}$ a hop to the center of any cluster $C^{\prime}\in\mathcal{C}_{i-1}$ with $\mbox{\rm c-dist}_{G}(C,C^{\prime})\leq 2\cdot r_{i-1}+2\cdot\alpha_{i}$.
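The five steps of a phase can be sketched as follows. This is a hedged illustration rather than the authors' implementation: it assumes that all pairwise distances are available in a table `D`, that the c-dist between two clusters is the distance between their centers, and that the sampled set of step 2 is passed in explicitly (the helper `sample_centers` shows the sampling itself). All function and parameter names are hypothetical.

```python
import random


def sample_centers(centers, p, rng=random):
    """Step 2: keep each cluster center independently with probability p."""
    return [c for c in centers if rng.random() < p]


def run_phase(D, vertices, member_of, r_prev, alpha, sampled):
    """Run steps 1 and 3-5 of one phase; returns (new_member_of, hops).

    D[u][v]   -- distance between u and v in G (assumed precomputed)
    member_of -- maps each clustered vertex to the center of its cluster
    sampled   -- centers of the clusters chosen in step 2
    """
    centers = sorted(set(member_of.values()))
    hops = set()
    # Step 1: nearby unclustered vertices hop to their closest center.
    for v in vertices:
        if v not in member_of and centers:
            c = min(centers, key=lambda x: D[v][x])
            if D[v][c] <= r_prev + alpha:
                hops.add((v, c))
    # Step 3: clusters close to a sampled cluster join the closest one.
    joined = {}
    for c in centers:
        if sampled:
            s = min(sampled, key=lambda x: D[c][x])
            if D[c][s] <= 4 * r_prev + 4 * alpha:
                joined[c] = s
    # Steps 3-4: members hop to their new center; this defines C_i.
    new_member_of = {}
    for v, c in member_of.items():
        if c in joined:
            new_member_of[v] = joined[c]
            if v != joined[c]:
                hops.add((v, joined[c]))
    # Step 5: centers that did not join hop to all nearby centers.
    for c in centers:
        if c not in joined:
            for c2 in centers:
                if c2 != c and D[c][c2] <= 2 * r_prev + 2 * alpha:
                    hops.add((c, c2))
    return new_member_of, hops
```

Passing `sampled` explicitly keeps the sketch deterministic; in a full run one would call `sample_centers(centers, p_i)` first and feed its result to `run_phase`.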

This completes the description of the $i^{th}$ phase. Let $H=\bigcup_{i=0}^{T}H_{i}$ and let $\mathcal{C}_{T}$ be the output clustering of the last phase $T$. In the analysis section we show that $\mathcal{C}_{T}$ contains at most one cluster $C$, in expectation. We then add to the output hopset $H$ a hop from each vertex at center-distance at most $r_{T}+2d$ from $C$ to the center of the cluster. (Footnote 12: For the sake of the efficient implementation of Sec. 6.2, we connect only vertices up to center-distance $r_{T}+2d$ to the center of the last cluster, even though, with respect to the size of the hopset, we could afford to connect the center of $C$ to all nodes.)

Stretch Analysis.

For the sake of the stretch analysis, we use the following definition:

Definition 4.

A vertex $v\in V$ is $0$-unclustered if $v\notin V(\mathcal{C}_{0})$. For any $i\geq 1$, a vertex $v\in V$ is $i$-unclustered if it is in $V(\mathcal{C}_{i-1})\setminus{V(\mathcal{C}_{i})}$. Finally, a vertex $v\in V$ is clustered if it is in the last cluster.

We first make a simple observation:

Observation 4.

For every $i\in\{1,\ldots,T\}$, $r_{i}\leq(5+16/\epsilon)^{i}\cdot r_{0}$. In particular, the radius of each cluster in the final clustering $\mathcal{C}_{T}$ is $r_{T}\leq d/2$.

Proof.

The claim is shown by induction on $i$. The base case $i=0$ is trivial. Assume that the claim holds up to $i-1\geq 0$, and consider the $i^{th}$ phase, where the clustering $\mathcal{C}_{i}$ is defined based on the clustering $\mathcal{C}_{i-1}$. Each cluster in $\mathcal{C}_{i}$ is formed by a star: the head of the star is the sampled cluster $C$, connected to other clusters $C^{\prime}\in\mathcal{C}_{i-1}$ with center-distance at most $4\cdot r_{i-1}+4\alpha_{i}$. The radius of this star of clusters is bounded by:

[TABLE]

where the second inequality follows by plugging the bound on $r_{i-1}$ obtained from the induction assumption, and the bound on $\alpha_{i}$ from Eq. (11). ∎
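To see how this per-phase growth leads to a factor of the form $k^{\log(5+16/\epsilon)}$, the following short calculation may help. It assumes that logarithms are base $2$ and uses the shorthand $c:=5+16/\epsilon$, which is introduced here only for readability:

$$c^{T}=c^{\lceil\log k\rceil}\leq c^{\log k+1}=c\cdot 2^{\log c\cdot\log k}=c\cdot k^{\log c}.$$

Hence, after $T=\lceil\log k\rceil$ phases, $r_{T}\leq c\cdot k^{\log c}\cdot r_{0}$, so any initial radius with $r_{0}\leq d/(2\cdot c\cdot k^{\log c})$ is sufficient to guarantee $r_{T}\leq d/2$.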

Observation 5.

For $0\leq i\leq\lceil\log{k}\rceil$, in expectation $|\mathcal{C}_{i}|=n^{1-\frac{2^{i}}{k}}$. Therefore, after $T=\lceil\log{k}\rceil$ phases, there is at most one cluster in $\mathcal{C}_{T}$, in expectation.

Proof.

By induction on $i$. For the base case consider $i=0$. In the first stage the algorithm samples each vertex with probability $n^{-1/k}$, thus we have $n^{1-1/k}$ clusters in expectation. Assuming that the claim holds up to $i-1$, in phase $i$ each cluster in $\mathcal{C}_{i-1}$ is sampled independently with probability $|\mathcal{C}_{i-1}|/n$. Thus, $|\mathcal{C}_{i}|=|\mathcal{C}_{i-1}|^{2}/n=n^{1-\frac{2^{i}}{k}}$ in expectation. ∎
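The recurrence in this proof is easy to tabulate numerically. The sketch below follows the proof's expected-value bookkeeping (it multiplies expectations, exactly as the argument above does) and assumes base-$2$ logarithms; the function name `expected_cluster_counts` is a hypothetical helper.

```python
import math


def expected_cluster_counts(n, k):
    """Return [n_0, n_1, ..., n_T] with n_i = n_{i-1}^2 / n and T = ceil(log2 k)."""
    T = math.ceil(math.log2(k))
    counts = [n ** (1 - 1 / k)]  # n_0 = n^(1 - 1/k) after the first stage
    for _ in range(T):
        p = counts[-1] / n  # sampling probability |C_{i-1}| / n
        counts.append(counts[-1] * p)
    return counts
```

For instance, with $n=2^{16}$ and $k=8$ the sequence is $2^{14},2^{12},2^{8},1$, matching $n^{1-2^{i}/k}$ for $i=0,\ldots,3$.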

Claim 17 ($0$-unclustered).

For any $0$-unclustered vertex $u\in V$, it holds that $\mbox{\rm dist}_{H}^{(1)}(u,v)=\mbox{\rm dist}_{G}(u,v)$ for all $v\in\mathsf{\textbf{B}}_{G}(u,r_{0})$.

Proof.

If $u$ is a $0$-unclustered vertex, then $A_{1}\cap\mathsf{\textbf{B}}_{G}(u,r_{0})=\emptyset$. Since we add hops from $u$ to every vertex at distance at most $\mbox{\rm dist}_{G}(u,p(u))>r_{0}$, it holds that $H$ contains a hop from $u$ to any $v\in\mathsf{\textbf{B}}_{G}(u,r_{0})$. ∎

Claim 18 ( $i$ -unclustered).

For any $i$ -unclustered vertex $u$ for $i\geq 1$ and every $v\in\mathsf{\textbf{B}}_{G}(u,\alpha_{i})$ , it holds that $\mbox{\rm dist}_{H}^{(3)}(u,v)\leq 3\cdot\mbox{\rm dist}_{G}(u,v)+4\cdot r_{i-1}$ .

Proof.

Consider an $i$ -unclustered vertex $u$ and $v\in\mathsf{\textbf{B}}_{G}(u,\alpha_{i})$ . Let $C_{u}\in\mathcal{C}_{i-1}$ be the cluster of $u$ . Thus $\mbox{\rm c-dist}_{G}(v,\mathcal{C}_{i-1})\leq r_{i-1}+\mbox{\rm dist}_{G}(v,u)$ . Let $C_{v}$ be the cluster with the closest center to $v$ in $\mathcal{C}_{i-1}$ , then the algorithm adds to $H_{i}$ a hop from $v$ to the center of $C_{v}$ . We have that:

[TABLE]

Since $u$ becomes unclustered in phase $i$ , the algorithm adds to $H_{i}$ a hop from the center of $C_{u}$ , to the center of $C_{v}$ . Overall, we have the following $3$ -hop $u$ - $v$ path: $(u,r(C_{u}))\circ(r(C_{u}),r(C_{v}))\circ(r(C_{v}),v)$ , where $r(C_{u}),r(C_{v})$ are the centers of $C_{u},C_{v}$ respectively. The length of this path is bounded by:

[TABLE]

∎

Claim 19.

Let $u,v\in V$ be a pair of vertices whose distance in $G$ lies in $[d,2d]$, and let $P_{uv}$ be their shortest path in $G$. If there is a clustered vertex $w\in P_{uv}$ then $\mbox{\rm dist}_{H}^{(2)}(u,v)\leq\mbox{\rm dist}_{G}(u,v)+2\cdot r_{T}\leq 3\cdot d$.

Proof.

Since $w$ is clustered, it follows that in the last step we add hops from $u$ and $v$ to the center $z$ of the last cluster, therefore:

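$$\mbox{\rm dist}_{H}^{(2)}(u,v)\leq\mbox{\rm dist}_{G}(u,z)+\mbox{\rm dist}_{G}(z,v)\leq\mbox{\rm dist}_{G}(u,w)+\mbox{\rm dist}_{G}(w,v)+2\cdot r_{T}=\mbox{\rm dist}_{G}(u,v)+2\cdot r_{T}\leq 3\cdot d,$$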

where the last inequality follows by Observation 4. ∎

We are now ready to complete the stretch argument and show Lemma 8.

Lemma 8.

Let $u,v\in V$ be vertices at distance $d^{\prime}\in[d,2d]$ in $G$ , and let $P$ be the shortest path between them in $G$ . First observe that by Claim 19, if there is some clustered vertex in $P$ then we are done, thus we assume there is none. We define a sequence of vertices between $u$ and $v$ in the following way: let $v_{0}:=u$ , and iteratively, given $v_{j}\neq v$ , set $v^{\prime}_{j}$ as the furthest vertex from $v_{j}$ on the segment $P[v_{j},v]$ , at distance at most $\Delta_{j}$ from $v_{j}$ , where:

[TABLE]

Furthermore, for each $v^{\prime}_{j}$, set $v_{j+1}$ to be the next vertex on $P[v^{\prime}_{j},v]$, or just $v$ in case $v^{\prime}_{j}=v$. By Claim 17, if $v_{j}$ is $0$-unclustered then $\mbox{\rm dist}_{G\cup H}^{(2)}(v_{j},v_{j+1})=\mbox{\rm dist}_{G}(v_{j},v_{j+1})$. If $v_{j}$ is $i$-unclustered for $i\geq 1$, then there are two cases. First consider the case where $\mbox{\rm dist}_{G}(v_{j},v_{j+1})\geq\alpha_{i}$. In this case, by Claim 18, it holds that $\mbox{\rm dist}_{H}^{(3)}(v_{j},v^{\prime}_{j})\leq 3\cdot\mbox{\rm dist}_{G}(v_{j},v^{\prime}_{j})+4\cdot r_{i-1}$, thus since $\mbox{\rm dist}_{G}(v_{j},v_{j+1})>\alpha_{i}$, we have that

[TABLE]

Now, assume that $\mbox{\rm dist}_{G}(v_{j},v_{j+1})<\alpha_{i}$, that is, $v_{j+1}=v$. By Claim 18 it holds that $\mbox{\rm dist}_{H}^{(3)}(v_{j},v)\leq 3\cdot\mbox{\rm dist}_{G}(v_{j},v)+4\cdot r_{i-1}$. By Observation 4, $r_{i-1}\leq r_{T-1}\leq\frac{d}{2\cdot(5+16/\epsilon)}$, and therefore we get that

[TABLE]

Since the path $P$ is partitioned into at most $\ell=4\cdot R^{\prime}$ segments, we have:

[TABLE]

∎

Lemma 8 follows by using $\epsilon^{\prime}=\frac{8}{9}\cdot\epsilon$ , and repeating the algorithm for each of the $\log\Lambda$ distance ranges.

Size Analysis.

We next bound the total number of hops added to the hopset. The first step contributes $O(n^{1+1/k})$ edges, in expectation. This follows by Fact 2(iii). At each phase $i\geq 1$ the algorithm adds the following hops to $H_{i}$ . Each vertex at center-distance at most $2\cdot r_{i-1}+\alpha_{i}$ adds a hop to its closest cluster center in $\mathcal{C}_{i-1}$ . This adds at most $n$ hops. In addition, in step (3), we add at most $n$ hops from each vertex to its new cluster center.

By a similar argument to the proof of Lemma 6, it follows that in step (5) of the $i^{th}$ phase, the algorithm adds $1/p_{i}$ hops for each cluster $C\in\mathcal{C}_{i-1}$ in expectation, where the $p_{i}$ are defined in Eq. (11). Summing over all $|\mathcal{C}_{i-1}|$ clusters, this adds $|\mathcal{C}_{i-1}|/p_{i}=n$ hops by Eq. (11), in expectation. Overall, in phase $i$, $O(n)$ edges are added to the hopset $H_{i}$, and summing over all $T$ phases gives a total of $O(\log{k}\cdot n)$ edges in expectation. Finally, the last step adds at most $n-1$ edges, which completes the proof of Lemma 8.

6 Efficient Computation of Spanners, Hopsets, and Applications

6.1 Efficient Constructions of $(3+\epsilon,\beta)$ Spanners and Applications

In this section, we provide efficient implementation of $(3+\epsilon,\beta)$ spanners in various computational settings, and show their applications to shortest path computation.

A Modified Meta-Algorithm.

The algorithm is similar to the algorithm of Section 4 up to modifying the number of phases and the sampling probability of Eq. (11). As in [EN19a], we introduce an efficiency parameter $\rho\in(0,1]$ that determines the trade-off between the $\beta$ value, the number of edges in the spanner, and the construction time. For a given parameter $\rho$ , define:

[TABLE]

The modified algorithm applies $T^{\prime}:=i_{0}+i_{1}$ phases instead of $T=\lceil\log{k}+1\rceil$ phases. The sampling probabilities are modified as follows: for every $1\leq i\leq i_{0}$, let $p_{i}$ be as defined in Eq. (11), and for every $i_{0}<i\leq T^{\prime}$, set $p_{i}=p_{i_{0}}$. We first show the correctness of this modified algorithm and then analyze its implementation in several computational settings.

Observation 6.

For all $1\leq i\leq T^{\prime}$ it holds that $p_{i}\geq n^{-\rho/2}$ .

Proof.

For each $i\leq i_{0}$ , by Eq. (11) we have $p_{i}=|\mathcal{C}_{i-1}|/n$ , thus for all $1\leq i\leq T^{\prime}$ it holds that $p_{i}\geq|\mathcal{C}_{i_{0}-1}|/n$ . By Observation 3 it holds that in expectation $|\mathcal{C}_{i_{0}}|=n^{1-2^{i_{0}-1}/k}\geq n^{1-2^{\log{(k\cdot\rho)}-1}/k}=n^{1-\rho/2}$ , thus $p_{i}\geq n^{-\rho/2}$ . ∎

Lemma 9.

In each phase $i\in\{1,\ldots,T^{\prime}\}$ consider the collection of BFS trees of depth $2\cdot r_{i-1}+2\cdot\alpha_{i}$ rooted at the centers of the clusters of $\mathcal{C}_{i-1}$ that are unclustered in $\mathcal{C}_{i}$ . Then each vertex $v\in V$ appears in $O(n^{\rho/2}\cdot\log n)$ such trees w.h.p.

Proof.

Fix a vertex $v\in V$, and let $C_{1},\ldots,C_{\ell}$ be the clusters in $\mathcal{C}_{i-1}$ with center-distance at most $2\cdot r_{i-1}+2\cdot\alpha_{i}$ from $v$. Note that $\mbox{\rm dist}_{G}(r(C_{j}),r(C_{j^{\prime}}))\leq 4\cdot r_{i-1}+4\cdot\alpha_{i}$ for every $j,j^{\prime}\in\{1,\ldots,\ell\}$, where $r(C)$ is the center of the cluster $C$. This implies that if one of these clusters is sampled, then all of them become part of the clustering $\mathcal{C}_{i}$ and do not take part in step (5) of phase $i$. Therefore the expected number of BFS traversals that reach $v$ in step (5) of phase $i$ is at most $\ell\cdot(1-p_{i})^{\ell}\leq 1/p_{i}$. Since by Observation 6 $1/p_{i}\leq n^{\rho/2}$ for all $i$, each vertex is traversed by $O(n^{\rho/2})$ BFS traversals in expectation, and by the Chernoff bound, w.h.p., each vertex is traversed by at most $O(\log{n}\cdot n^{\rho/2})$ traversals. ∎
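The inequality $\ell\cdot(1-p_{i})^{\ell}\leq 1/p_{i}$ used in this proof can be verified with two standard facts, namely $1-p\leq e^{-p}$ and $x\cdot e^{-x}\leq 1/e$ for all $x\geq 0$. Writing $p=p_{i}$, one possible derivation is:

$$\ell\cdot(1-p)^{\ell}\leq\ell\cdot e^{-p\ell}=\frac{1}{p}\cdot(p\ell)\cdot e^{-p\ell}\leq\frac{1}{e\cdot p}\leq\frac{1}{p}.$$

The bound is therefore independent of the number $\ell$ of nearby clusters, which is what allows the per-vertex congestion to be stated in terms of $p_{i}$ alone.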

Lemma 10.

After $T^{\prime}$ phases, the number of remaining clusters $\mathcal{C}_{T^{\prime}}$ is at most $O(\log n)$ w.h.p.

Proof.

Until the $i_{0}^{th}$ phase the algorithm runs similarly to the algorithm in Section 4, thus by Observation 3, in expectation $|\mathcal{C}_{i_{0}}|=n^{1-2^{i_{0}-1}/k}$ . In each of the remaining $i_{1}$ phases we sample with probability $p_{i_{0}}$ which by Observation 6 is at least $n^{-\rho/2}$ , thus after $i_{1}$ such sampling steps, we will have $|\mathcal{C}_{T^{\prime}}|=n^{1-2^{(i_{0}-1)}/k-(\rho\cdot i_{1})/2}\leq n^{1-(\rho/2)(1+i_{1})}\leq 1$ in expectation. By Chernoff it follows that w.h.p $|\mathcal{C}_{T^{\prime}}|=O(\log{n})$ . ∎

Observation 7.

The modified algorithm computes a $(3+\epsilon,\beta)$ spanner $H\subseteq G$ with $\beta=(5+16/\epsilon)^{\log{\rho}+2/\rho}\cdot k^{\log(5+16/\epsilon)}$ and $O(n^{1+1/k}+\beta\cdot n)$ edges w.h.p.

Proof.

The final radius after $T^{\prime}$ phases is given by plugging $T^{\prime}$ in Observation 2:

[TABLE]

Since we only modified the sampling probabilities of the spanner algorithm of Section 4, it follows that the stretch arguments are unchanged, except that the final radius $r_{T^{\prime}}$ is larger. Thus by Claim 10, we get that for every $u,v\in V$ it holds that $\mbox{\rm dist}_{H}(u,v)\leq(3+\epsilon)\cdot\mbox{\rm dist}_{G}(u,v)+2\cdot r_{T^{\prime}}$.

We now bound the size of the spanner. The first phase adds $n^{1+1/k}$ edges in expectation, and in any subsequent phase the algorithm adds $O(n)$ shortest paths in expectation, each of length at most $r_{i}$. Thus overall, this adds $O(n\cdot\sum_{i=1}^{T^{\prime}}r_{i})=O(n\cdot\beta)$ edges. The last step adds $O(n)$ edges due to adding a constant number of BFS trees. ∎

6.1.1 The Centralized Setting

The trade-off between the $\beta$ value, the spanner size and the running time of the algorithm is summarized below:

Lemma 11.

For any graph $G=(V,E)$ , integer $k\geq 1$ and any $\epsilon>0,1\geq\rho>0$ , one can compute a $(3+\epsilon,\beta)$ spanner $H\subseteq G$ for $\beta=O((5+16/\epsilon)^{\log{\rho}+2/\rho}\cdot k^{\log(5+16/\epsilon)})$ with $O(n^{1+1/k}+(5+16/\epsilon)^{\log{\rho}+2/\rho}\cdot k^{\log(5+16/\epsilon)}\cdot n)$ edges and $O((\log{(k\cdot\rho)}+1/\rho)\cdot|E|\cdot n^{\rho})$ time.

Proof.

The algorithm has $T^{\prime}=\lceil\log{(k\cdot\rho)}\rceil+\lceil 2/\rho-1\rceil\leq\log(k\cdot\rho)+2/\rho+1$ phases. In each phase there are five steps that are implemented as follows. In step (1), we add shortest paths of length $r_{i-1}+\alpha_{i}$ from unclustered vertices to their closest center. Since each vertex is connected to at most one center (its closest, breaking ties in a consistent manner), this can be done in $O(|E|)$ time. In the same manner, step (3) can also be implemented in $O(|E|)$ time. Finally, we consider the fifth step, where we grow BFS trees from all centers that did not join the clustering $\mathcal{C}_{i}$. By Lemma 9, each vertex appears in at most $O(n^{\rho}\log n)$ trees, and thus these trees can be computed in $O(n^{\rho}|E|)$ time. The overall running time is then bounded by $O(n^{\rho}|E|\cdot T^{\prime})$ as desired. ∎
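To get a feel for the trade-off governed by $\rho$, the phase count and the leading term of $\beta$ can be evaluated directly. The sketch below is only illustrative: it assumes base-$2$ logarithms, reads off $i_{0}=\lceil\log(k\cdot\rho)\rceil$ and $i_{1}=\lceil 2/\rho-1\rceil$ from the expression for $T^{\prime}$ in the proof above, and ignores the constants hidden by the $O(\cdot)$ notation. The function name `phase_parameters` is hypothetical.

```python
import math


def phase_parameters(k, rho, eps):
    """Return (i0, i1, T', beta_leading_term) for the modified meta-algorithm."""
    i0 = math.ceil(math.log2(k * rho))  # phases with decreasing p_i
    i1 = math.ceil(2 / rho - 1)         # phases with the fixed p_{i0}
    t_prime = i0 + i1
    base = 5 + 16 / eps                 # per-phase radius growth factor
    beta = base ** (math.log2(rho) + 2 / rho) * k ** math.log2(base)
    return i0, i1, t_prime, beta
```

For example, `phase_parameters(16, 0.5, 1.0)` gives $i_{0}=3$, $i_{1}=3$, $T^{\prime}=6$, and a leading term of $21^{7}$ for $\beta$; smaller values of $\rho$ reduce the $n^{\rho}$ factor in the running time but increase both $T^{\prime}$ and $\beta$.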

By efficiently computing the $(3+\epsilon,\beta)$ spanners, we also get the following fast computation of the $S\times V$ distances. The next result is Corollary 19 of [EN19a] with a better tradeoff, at the expense of increasing the multiplicative stretch from $(1+\epsilon)$ to $(3+\epsilon)$:

Corollary 1.

There exists an algorithm that computes, for any graph $G=(V,E)$, integer $k\geq 1$, any parameters $\epsilon>0,\rho\in(0,1)$ and any vertex set $S\subseteq{V}$, $(3+\epsilon,\beta)$-approximate shortest paths for $S\times V$, for $\beta=O((5+16/\epsilon)^{\log{\rho}+2/\rho}\cdot k^{\log(5+16/\epsilon)})$, in $O((\log{(k\cdot\rho)}+1/\rho)\cdot|E|\cdot n^{\rho}+|S|\cdot(n^{1+1/k}+(5+16/\epsilon)^{\log{\rho}+2/\rho}\cdot k^{\log(5+16/\epsilon)}\cdot n))$ time.

6.1.2 The Distributed Setting

The ${\mathsf{LOCAL}}$ Model.

We now consider the implementation details of our spanner construction in the standard ${\mathsf{LOCAL}}$ model [Pel00]. In this model, the algorithm’s execution proceeds in synchronous rounds, and in every round, each node can send a message (possibly of unbounded size) to each of its neighbors. Each node holds a processor with a unique and arbitrary ID of $O(\log n)$ bits.

One of the key effects of improving the value of the $\beta$ in our spanner is that we can compute our $(4+\epsilon,\beta)$ spanner in $O(\beta)$ rounds, hence for $\epsilon=1$ , in $O(k^{\log 21})$ rounds. This should be compared against the local computation of $(1+\epsilon,\beta)$ spanners in $O_{\epsilon}(\log k)^{\log k}$ rounds.

Lemma 12.

For any graph $G=(V,E)$ , integer $k$ and $\epsilon>0$ , one can compute in the ${\mathsf{LOCAL}}$ model a $(4+\epsilon,\beta)$ spanner $H\subseteq G$ for $\beta=O((5+16/\epsilon)\cdot k^{\log(5+16/\epsilon)})$ in $\widetilde{O}(\beta)$ rounds w.h.p.

The ${\mathsf{LOCAL}}$ implementation is exactly as in Section 4 with two modifications. First, we now make all the arguments hold with high probability of $1-1/n^{c}$ for some constant $c$, rather than in expectation. Specifically, in the last step of the algorithm there are now $O(\log n)$ clusters in $\mathcal{C}_{T}$, w.h.p. Second, instead of adding a BFS tree w.r.t. each center, we add a truncated BFS tree up to depth $5\cdot r_{T}$ from each of the $O(\log n)$ centers. We now show that this slightly increases the stretch of the spanner, by proving the analogue of Claim 10:

Claim 20.

Fix a pair $u,v\in V$ and let $P$ be their shortest path in $G$ . If there is a clustered vertex $w\in V(P)$ , then:

[TABLE]

Proof.

First assume that $\mbox{\rm dist}_{G}(u,v)\leq 4\cdot r_{T}$ . Let $w$ be some clustered vertex on $P$ , and let $s$ be the center of the cluster to which $w$ belongs. In this case since $\mbox{\rm dist}_{G}(w,s)\leq r_{T}$ , it holds that $\mbox{\rm dist}_{G}(u,s)\leq 5\cdot r_{T},\mbox{\rm dist}_{G}(v,s)\leq 5\cdot r_{T}$ , hence $\mbox{\rm dist}_{H}(u,s)=\mbox{\rm dist}_{G}(u,s)$ and $\mbox{\rm dist}_{H}(v,s)=\mbox{\rm dist}_{G}(v,s)$ . Consequently, $\mbox{\rm dist}_{H}(u,v)\leq\mbox{\rm dist}_{G}(u,s)+\mbox{\rm dist}_{G}(s,v)\leq\mbox{\rm dist}_{G}(u,v)+2\cdot r_{T}$ . Since we changed only the last step, it implies that for any $1\leq d\leq 4\cdot r_{T}$ , we have that for vertices $u,v\in V$ at distance $d$ in $G$ , $\mbox{\rm dist}_{H}(u,v)\leq(3+\epsilon)\cdot\mbox{\rm dist}_{G}(u,v)+4\cdot r_{T}$ . For $u,v\in V$ with $\mbox{\rm dist}_{G}(u,v)>4\cdot r_{T}$ it holds that:

[TABLE]

∎

Since all the steps of the algorithm are now restricted to the $O(\beta)$-ball of each vertex, Lemma 12 follows.

The ${\mathsf{CONGEST}}$ Model.

We next consider the implementation details of our spanner construction in the standard ${\mathsf{CONGEST}}$ model [Pel00]. This model is identical to the ${\mathsf{LOCAL}}$ model, except that in each round a vertex is limited to sending $O(\log n)$ bits on each of its incident edges.

The implementation in the ${\mathsf{CONGEST}}$ model follows the same lines as the meta-algorithm, except that in the last step we build a truncated BFS tree up to depth $5\cdot r_{T^{\prime}}$, as in the ${\mathsf{LOCAL}}$ implementation. We have:

Lemma 13.

For any graph $G=(V,E)$ , integer $k$ and any $\epsilon>0,1\geq\rho>0$ , one can compute in the ${\mathsf{CONGEST}}$ model a $(4+\epsilon,\beta)$ spanner $H\subseteq G$ for $\beta=O((5+16/\epsilon)^{\log{\rho}+2/\rho}\cdot k^{\log(5+16/\epsilon)})$ in $\widetilde{O}(n^{\rho}\cdot\beta)$ rounds w.h.p.

Proof.

There are $T^{\prime}$ phases. We fix phase $i$ and show that it can be implemented in $\widetilde{O}(n^{\rho}+r_{i})$ rounds. Steps (1) and (3) are based on congestion-free BFS computation up to depth $O(r_{i})$. Step (5) builds a collection of BFS trees up to depth $O(r_{i})$. By Lemma 9, each vertex is traversed by $O(n^{\rho}\log n)$ trees. Computing a collection of BFS trees up to depth $r_{i}$ with edge congestion $O(n^{\rho})$ can be done in $\widetilde{O}(n^{\rho}+r_{i})$ rounds w.h.p. using the random delay approach. Therefore by summing over all $T^{\prime}$ phases we get $\widetilde{O}(n^{\rho}\cdot\beta)$ rounds.

Finally, in the last step we have $O(\log n)$ centers in $\mathcal{C}_{T^{\prime}}$ w.h.p. Computing trees of depth $O(r_{T^{\prime}})$ from each center can be done in $\widetilde{O}(r_{T^{\prime}})$ rounds. The time analysis follows. ∎

6.1.3 The Multi-Pass Streaming Setting

Model.

In the streaming model the input graph is presented to the algorithm edge by edge as a stream without repetitions, and the goal is to solve the problem while minimizing the number of passes and the space. For graph algorithms, the usual assumption is that the edges of the input graph are presented to the algorithm in arbitrary order. The next result is Corollary 20 of [EN19a] with a better tradeoff, at the expense of increasing the multiplicative stretch from $(1+\epsilon)$ to $(4+\epsilon)$:

Lemma 14.

For any $n$-vertex unweighted graph $G=(V,E)$, integer $k$ and any $\epsilon>0,1\geq\rho>0$, one can compute in the multi-pass streaming model a $(4+\epsilon,\beta)$ spanner $H\subseteq G$ for $\beta=O((5+16/\epsilon)^{\log{\rho}+2/\rho}\cdot k^{\log(5+16/\epsilon)})$ in $O(\log{n}\cdot n^{1+\rho})$ space w.h.p. and $O(\beta)$ passes, or with $O(n^{1+1/k}+(\beta+\log{n})\cdot n)$ space in expectation and $O(\log{n}\cdot n^{\rho}/\rho\cdot\beta)$ passes w.h.p.

Proof.

We will use two alternative implementations in the streaming model, in a very similar way to Theorem 5 in [EN19a]. In both implementations, for the BFS traversals the algorithm keeps for each traversed vertex the ID of its parent, the ID of the root of the BFS tree, and its distance to the root. The first implementation is very similar to the implementation that we described for the ${\mathsf{CONGEST}}$ model. In this implementation, we compute only truncated BFS trees up to depth $5\cdot r_{T}$, rather than a complete BFS tree. A truncated BFS traversal up to depth $r_{i}$ can be implemented in $r_{i}$ passes, thus overall the algorithm can be implemented in $O(\beta)$ passes. Since each vertex is visited by $O(\log{n}\cdot n^{\rho})$ BFS trees w.h.p., the total space used is bounded by $O(\log{n}\cdot n^{1+\rho})$. We now consider the alternative implementation. To reduce the BFS congestion of $n^{\rho}$ in the fifth step, this step is divided into $\tau=c\cdot\log{n}/p_{i}$ sub-steps. In each sub-step, we sample each of the remaining centers (from which we would like to compute the BFS traversal) independently with probability $p_{i}$. We then compute the truncated BFS traversal only from the centers that were sampled in this sub-step. By the Chernoff bound, w.h.p., each vertex is visited by $O(\log n)$ traversals, hence a space of $O(n\log n)$ plus the space of the spanner is sufficient for the implementation. After $\tau$ sub-steps, w.h.p., the algorithm has computed the truncated BFS traversal from each of the cluster centers. The total number of passes is bounded by $O(n^{\rho}/\rho\cdot\log{n}\cdot\beta)$. ∎
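The claim that a truncated BFS traversal up to depth $r_{i}$ takes $r_{i}$ passes can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the construction of [EN19a]: the stream is modelled as a callable `edge_stream` that returns a fresh iterator over the edges for each pass, and for every root the algorithm stores, per reached vertex, its parent and its distance to the root, as described in the proof. The function name is hypothetical.

```python
def streaming_truncated_bfs(edge_stream, roots, depth):
    """Grow BFS trees of the given depth, one level per pass over the stream.

    Returns (state, passes) where state[root][v] = (parent, distance).
    """
    state = {r: {r: (None, 0)} for r in roots}
    passes = 0
    for level in range(depth):
        passes += 1
        grew = False
        for u, v in edge_stream():  # one pass over the edge stream
            for info in state.values():
                for a, b in ((u, v), (v, u)):
                    # Only vertices at exactly `level` extend the tree, so
                    # vertices discovered in this pass wait for the next one.
                    if a in info and info[a][1] == level and b not in info:
                        info[b] = (a, level + 1)
                        grew = True
        if not grew:
            break  # every tree is already complete
    return state, passes
```

The space used is proportional to the total number of (root, vertex) pairs stored, which is why bounding the number of trees that reach each vertex directly bounds the space of the streaming implementation.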

Lemma 3 follows immediately by Lemma 14.

6.2 Efficient Constructions of $(\alpha,\beta)$ Hopsets

In this section we show an efficient construction of $(3+\epsilon,\beta)$ hopsets. We use the following fact that follows from the proof of Theorem 1.1 in [TZ05]:

Fact 3.

Each of the $k$ clustering steps in the distance oracle algorithm by Thorup and Zwick can be implemented in $O(|E|\cdot n^{1/k})$ centralized time.

The Meta Algorithm.

The algorithm is similar to the algorithm in the proof of Lemma 8 up to modifying the number of phases and the sampling probability of Eq. (11) in the exact same manner as in Section 6.1. In addition, since we are now working with weighted graphs, we will be using the Dijkstra algorithm to compute shortest path trees, instead of BFS traversals. Fix a distance class $[d,2d]$ and define $R^{\prime\prime}=(5+18/\epsilon)^{\log{\rho}+2/\rho}\cdot k^{\log(5+18/\epsilon)}$ . We then slightly change the initial radius of the clustering $\mathcal{C}_{0}$ to be $r_{0}=d/(2\cdot R^{\prime\prime})$ . By similar arguments as in the proof of Lemma 8 and Observation 7, the final radius after $T^{\prime}=\lceil\log{(k\cdot\rho)}\rceil+\lceil 2/\rho-1\rceil$ phases is:

[TABLE]

The remaining details are almost identical to the meta-algorithm for spanners so we only state the properties of this construction:

Observation 8.

The modified algorithm computes a $(3+\epsilon,\beta)$ hopset $H\subseteq G$ with $\beta=O((5+18/\epsilon)^{\log{\rho}+2/\rho}\cdot k^{\log(5+18/\epsilon)})$ and $O((n^{1+1/k}+(\log{k}+1/\rho)\cdot n)\cdot\log{\Lambda})$ edges in expectation.

Efficient Implementation in the Centralized Setting.

The tradeoff between the $\beta$ value, the hopset size and the running time of the algorithm is summarized below:

Lemma 15.

For any graph $G=(V,E,w)$ , integer $k\geq 1$ and any $\epsilon>0,1\geq\rho>0$ , one can compute a $(3+\epsilon,\beta)$ hopset $H$ with $\beta=O((5+18/\epsilon)^{\log{\rho}+2/\rho}\cdot k^{\log(5+18/\epsilon)})$ with $O((n^{1+1/k}+(\log{k}+1/\rho)\cdot n)\cdot\log{\Lambda})$ edges and $O(|E|\cdot(n^{1/k}+(\log(k\cdot\rho)+1/\rho)\cdot n^{\rho})\cdot\log\Lambda)$ time in expectation.

Proof.

The stretch and size arguments are almost identical to those in Section 6.1, hence we restrict attention to the running time. Our algorithm begins with a single clustering step of Thorup and Zwick's algorithm (i.e., computing the first bunch $B_{1}(u)$ for each vertex $u$). The output of this step is a clustering $\mathcal{C}_{0}$. Since clusters are vertex-disjoint, one can compute them in $\widetilde{O}(|E|)$ time. Thus, using Fact 3, the entire first part of the algorithm can be implemented in $O(|E|\cdot n^{1/k})$ centralized time. From this point onward the analysis is similar to the analysis of the centralized implementation of spanners, thus requiring $O(n^{\rho}\cdot|E|\cdot T^{\prime})$ centralized time. ∎

Appendix A Complete Proofs of Theorems 3 and 6

Recall that $t=\lceil k^{\epsilon}\rceil/4$ and $T=\log_{t}(k^{1-2\epsilon})$ . We will now handle the case where $T$ is not an integer as assumed in Sections 3 and 5.2, thus completing the proofs of Theorems 3 and 6. We will focus on the spanner case of Theorem 3. The exact same argument holds for the hopsets for Theorem 6. By Claim 3, when $T$ is an integer we have a final clustering with radius $r_{T}\leq 1/30\cdot 64^{1/\epsilon-1}\cdot k^{1-\epsilon}$ . Let $c=T-\lfloor T\rfloor$ and $T_{0}:=T-c$ . We divide the treatment of the fractional case into two possible cases. In the first case, assume that $t^{c}>1/3\cdot t$ . In this case, the middle stage of Alg. $\mathsf{SpannerLongDist}$ simply contains $\lceil T\rceil$ phases, each of $t$ steps. In this case we have:

Claim 21.

For $t^{c}>1/3\cdot t$ , it holds that the final clustering $\mathcal{C}_{T_{0}+1}$ has $O(n^{1-1/k^{\epsilon}})$ clusters, in expectation, and the final radius $r_{T_{0}+1}\leq 64^{1/\epsilon-1}\cdot k^{1-\epsilon}$ .

Proof.

Since $\lceil T\rceil\geq T$ it follows from Claim 7 that the final clustering $\mathcal{C}_{T_{0}+1}$ has $O(n^{1-1/k^{\epsilon}})$ clusters, in expectation. In Claim 3 we bound the radius of the final clustering in the case where $T$ is an integer by $r^{*}=\lceil k^{\epsilon}\rceil\cdot(2\cdot\lceil k^{\epsilon}\rceil)^{T}\leq 1/30\cdot 64^{1/\epsilon-1}\cdot k^{1-\epsilon}$ , thus if $t<3t^{c}$ , by the same claim, we get that the final radius after $T_{0}+1$ phases is at most:

[TABLE]

∎

In the complementary case where $t^{c}\leq 1/3\cdot t$, the adaptation of Alg. $\mathsf{SpannerLongDist}$ is as follows: in the second stage of the algorithm, it applies $T_{0}$ clustering phases as usual (i.e., each with $t=\lceil k^{\epsilon}\rceil/4$ steps), and the last phase, numbered $T_{0}+1$, consists of only $\lfloor t^{c}\rfloor\leq t$ steps. We refer to this last phase as a fractional phase. Again, we will show that after these $T_{0}+1$ phases, the number of clusters in the final clustering $\mathcal{C}_{T_{0}+1}$ is $O(n^{1-1/k^{\epsilon}})$ in expectation and that the radius of this clustering is $r_{T_{0}+1}\leq 64^{1/\epsilon-1}\cdot k^{1-\epsilon}$. We begin by bounding the number of clusters in the final clustering $\mathcal{C}_{T_{0}+1}$:

Claim 22.

After $T_{0}$ phases of $t$ steps and one last phase of $\lfloor t^{c}\rfloor$ steps, the final clustering $\mathcal{C}_{T_{0}+1}$ has $n^{1-1/k^{\epsilon}}$ clusters, in expectation.

Proof.

By Claim 7 after $T_{0}$ phases,

[TABLE]

in expectation. By the same claim it follows that after additional $\lfloor t^{c}\rfloor$ steps of Proc. $\mathsf{SuperClusterAugment}$ , the expected number of clusters in the final clustering $\mathcal{C}_{T_{0}+1}$ is:

[TABLE]

∎

We continue to bound the radius of the final clustering $\mathcal{C}_{T_{0}+1}$ :

Claim 23.

$r_{T_{0}+1}\leq 64^{1/\epsilon-1}\cdot k^{1-\epsilon}$ .

Proof.

By Claim 3 it holds that after $T_{0}$ phases the radius of the clustering $\mathcal{C}_{T_{0}}$ is bounded by $r_{T_{0}}=\lceil k^{\epsilon}\rceil(2\cdot\lceil k^{\epsilon}\rceil)^{T_{0}}\leq\frac{r^{*}}{(2\cdot\lceil k^{\epsilon}\rceil)^{c}}$ . By the same claim after additional $t^{\prime}:=\lfloor t^{c}\rfloor$ steps of Procedure $\mathsf{SuperClusterAugment}$ the radius is bounded by:

[TABLE]

where the third inequality follows since for any $x\leq 1$:

[TABLE]

and in particular since by assumption $4\cdot t\geq 16$ , we have:

[TABLE]

where the last inequality follows since $t\geq 3t^{c}$ . We conclude $r_{T_{0}+1}\leq 64^{1/\epsilon-1}\cdot k^{1-\epsilon}$ . ∎

We proceed with providing stretch and size arguments.

Claim 24.

For any fixed distance $d$ , it holds that Algorithm $\mathsf{SpannerLongDist}$ outputs a subgraph $H_{d}\subseteq{G}$ such that for any $u,v\in V$ with $\mbox{\rm dist}_{G}(u,v)=d$ it holds that $\mbox{\rm dist}_{H}(u,v)\leq 4\cdot k^{\epsilon}\cdot d+1/15\cdot 64^{1/\epsilon}\cdot k$ .

Proof.

Fix a distance $d$ . By Claim 1 we have that for any [math]-unclustered vertex $u$ and any $v\in N(u)$ it holds that $\mbox{\rm dist}_{H}(u,v)\leq 2\cdot\lceil k^{\epsilon}\rceil$ .

Furthermore, since the definition of the $\alpha_{i,j}$ parameters is unchanged, by Claim 4, for any $(i,j)$-unclustered vertex $u$ and any $v\in\partial\mathsf{\textbf{B}}_{G}(u,\alpha_{i,j})$ it holds that $\mbox{\rm dist}_{H}(u,v)\leq\lceil k^{\epsilon}\rceil\cdot\mbox{\rm dist}_{G}(u,v)$. In particular, $\mbox{\rm dist}_{H}(u,v)\leq 2\cdot r_{T_{0}+1}$. Thus, by the exact same argument as in the proof of Lemma 4, only with $r_{T_{0}+1}$ instead of $r^{*}$, it holds that for any $u,v$ with $\mbox{\rm dist}_{G}(u,v)=d$, if all the vertices on the shortest path between $u$ and $v$ in $G$ are unclustered, then $\mbox{\rm dist}_{H}(u,v)\leq 2\cdot\lceil k^{\epsilon}\rceil\cdot\mbox{\rm dist}_{G}(u,v)+2\cdot r_{T_{0}+1}$.

It remains to prove the analogue of Claim 5, that is, to provide the stretch argument for the case where the $u$-$v$ shortest path contains at least one clustered vertex $w$. By an argument similar to that of Claim 5, we have that in this case $\mbox{\rm dist}_{H}(u,v)\leq 4\cdot k^{\epsilon}\cdot\mbox{\rm dist}_{G}(u,v)+4\cdot r_{T_{0}+1}$. Thus, plugging in the bound on $r_{T_{0}+1}$ from Claims 21 and 23, we get that

[TABLE]

∎
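As a numeric sanity check of the additive term in the complementary case, the following sketch compares $4\cdot r_{T_{0}+1}$, using only the radius bound of Claim 23, against the additive term $1/15\cdot 64^{1/\epsilon}\cdot k$ stated in Claim 24. The function names and the sampled values of $k$ and $\epsilon$ are illustrative assumptions, and the bound of Claim 21 for the first case is not covered here.

```python
def radius_bound(k: float, eps: float) -> float:
    """Claim 23 bound on the final radius: 64^(1/eps - 1) * k^(1 - eps)."""
    return 64.0 ** (1.0 / eps - 1.0) * k ** (1.0 - eps)


def additive_term(k: float, eps: float) -> float:
    """Additive stretch term of Claim 24: (1/15) * 64^(1/eps) * k."""
    return 64.0 ** (1.0 / eps) * k / 15.0


def additive_term_holds(k: float, eps: float) -> bool:
    """Check 4 * r_{T0+1} <= (1/15) * 64^(1/eps) * k for one (k, eps) pair."""
    return 4.0 * radius_bound(k, eps) <= additive_term(k, eps)


if __name__ == "__main__":
    # Hypothetical sample values; any k >= 1 and eps in (0, 1/2] can be tried.
    for eps in (0.5, 0.25, 0.125):
        for k in (2, 16, 1024):
            print(eps, k, additive_term_holds(k, eps))
```

The check succeeds because $4\cdot 64^{1/\epsilon-1}=64^{1/\epsilon}/16\leq 64^{1/\epsilon}/15$ and $k^{1-\epsilon}\leq k$ for $k\geq 1$.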

Claim 25.

For any fixed distance $d\leq 64^{1/\epsilon}\cdot k$ it holds that the algorithm outputs a subgraph $H_{d}$ of expected size $O(k^{\epsilon}\cdot n^{1+1/k}+64^{1/\epsilon}\cdot k^{1+\epsilon}\cdot n)$ .

Proof.

First note that through the first $T_{0}$ phases of Procedure $\mathsf{SuperClusterAugment}$ the algorithm proceeds as in the integral case. Thus, by the analysis of Lemma 4, up to this point $O(k^{\epsilon}\cdot n^{1+1/k}+64^{1/\epsilon}\cdot k^{1+\epsilon}\cdot n)$ edges are added to the spanner in expectation.

We now bound the number of remaining edges added to the spanner in the last phase. By Claim 6, in each step of the last phase we add $O(n)$ shortest paths in expectation. In step $i$ of the last phase the added shortest paths are of length at most $r_{T_{0},0}+r_{T_{0},i-1}+2\cdot\alpha_{T_{0},i}$, and by Eq. (3) it holds that $r_{T_{0},0}+r_{T_{0},i-1}+2\cdot\alpha_{T_{0},i}\leq r_{T_{0},i}$. In the first case, where $t^{c}>1/3\cdot t$, by Claim 21 we add an extra $O(n\cdot\sum_{i=1}^{t}r_{T_{0},i+1})=O(n\cdot k^{2\epsilon}\cdot r_{T_{0}+1})=O(n\cdot 64^{1/\epsilon}\cdot k^{1+\epsilon})$ edges in expectation. In the complementary case we add $O(\sum_{i=1}^{\lfloor t^{c}\rfloor+1}n\cdot r_{T_{0},i})=O(n\cdot k^{2\cdot\epsilon}\cdot r_{T_{0},t^{\prime}})=O(64^{1/\epsilon}\cdot k^{1+\epsilon}\cdot n)$ edges in expectation. Thus, overall, the middle stage in the fractional case adds an extra $O(64^{1/\epsilon}\cdot k^{1+\epsilon}\cdot n)$ edges in expectation.

In the final stage we construct a $(2\cdot\lceil k^{\epsilon}\rceil-3)$ spanner of the cluster graph. By Claims 21 and 22, the expected size of the final clustering $\mathcal{C}_{T_{0}+1}$ is $O(n^{1-1/k^{\epsilon}})$ , thus as in the proof of Lemma 4 it holds that the size of the spanner of the cluster graph is $O(n)$ . Since each edge in this spanner is translated into a path of length $O(d+r_{T_{0}+1})$ in the final spanner, it holds that this stage adds $O((d+r_{T_{0}+1})\cdot n)$ edges to the spanner. We conclude that the construction in the fractional case is of size $O(k^{\epsilon}\cdot n^{1+1/k}+64^{1/\epsilon}\cdot k^{1+\epsilon}\cdot n+64^{1/\epsilon-1}\cdot k^{1-\epsilon}\cdot n+(d+r_{T_{0}+1})\cdot n)=O(k^{\epsilon}\cdot n^{1+1/k}+64^{1/\epsilon}\cdot k^{1+\epsilon}\cdot n)$ . ∎

Finally, by Observation 1, we conclude that taking the output subgraphs $H_{d}$ of Algorithm $\mathsf{SpannerLongDist}$ for every $1\leq d\leq 64^{1/\epsilon}\cdot k^{1-\epsilon}$ yields an $(8\cdot k^{\epsilon},64^{1/\epsilon}\cdot k)$-spanner of $G$ of expected size at most $O(64^{1/\epsilon}\cdot k\cdot n^{1+1/k}+64^{2/\epsilon}\cdot k^{2}\cdot n)$. This completes the proof of the fractional case of Theorem 3.
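The final size bound is the per-distance bound of Claim 25 multiplied by the number of distance values $64^{1/\epsilon}\cdot k^{1-\epsilon}$. The sketch below checks this arithmetic numerically. It is only an illustration: the constants hidden by the $O(\cdot)$ notation are taken to be 1, the number of distance values is treated as a real number rather than rounded, and all function names are hypothetical.

```python
import math


def per_distance_size(n: float, k: float, eps: float) -> float:
    """Claim 25 size bound for one distance d (hidden constants set to 1)."""
    return k ** eps * n ** (1 + 1 / k) + 64 ** (1 / eps) * k ** (1 + eps) * n


def num_distances(k: float, eps: float) -> float:
    """Number of distance values d in [1, 64^(1/eps) * k^(1 - eps)]."""
    return 64 ** (1 / eps) * k ** (1 - eps)


def union_size(n: float, k: float, eps: float) -> float:
    """Stated size of the union: 64^(1/eps)*k*n^(1+1/k) + 64^(2/eps)*k^2*n."""
    return 64 ** (1 / eps) * k * n ** (1 + 1 / k) + 64 ** (2 / eps) * k ** 2 * n


def sizes_agree(n: float, k: float, eps: float) -> bool:
    """Check that (number of distances) * (per-distance size) matches the union bound."""
    product = num_distances(k, eps) * per_distance_size(n, k, eps)
    return math.isclose(product, union_size(n, k, eps), rel_tol=1e-9)


if __name__ == "__main__":
    print(sizes_agree(n=1000, k=16, eps=0.5))
```

Term by term, $64^{1/\epsilon}\cdot k^{1-\epsilon}\cdot k^{\epsilon}=64^{1/\epsilon}\cdot k$ and $64^{1/\epsilon}\cdot k^{1-\epsilon}\cdot 64^{1/\epsilon}\cdot k^{1+\epsilon}=64^{2/\epsilon}\cdot k^{2}$.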

Hopsets.

As in the spanner case, we divide the treatment of the fractional case into two cases, as explained above. In the first case, where $t^{c}>1/3\cdot t$, the algorithm applies $T_{0}+1$ standard phases of Procedure $\mathsf{ClusterAndAugmentHop}$, each with $t$ steps. In the complementary case, the algorithm applies $T_{0}$ standard phases of the procedure, and a final fractional phase of $t^{\prime}=\lfloor t^{c}\rfloor$ steps. In Claim 14 we show that in the case where $T$ is an integer, the final radius is bounded by $r_{T}\leq d/648$. By claims analogous to Claims 21 and 23, the final radius in the fractional case is at most $r_{T_{0}+1}\leq 24\cdot r_{T}\leq d/27$. Since we have some slack in the stretch arguments of Claim 16 and in the proof of Thm. 6, the same bound holds when plugging in the bound on the final radius $r_{T_{0}+1}$ instead of $r_{T}$.

For the size analysis, the algorithm is the same as that in Section 5.2 until the end of the $T_{0}$th phase. Therefore, by the proof of Thm. 6, until this last phase $O(k^{\epsilon}\cdot n^{1+1/k}+k^{\epsilon}/\epsilon\cdot n)$ hops are added to the hopset. By an argument similar to Claim 6, the last phase (phase $T_{0}+1$), with $t^{\prime}$ steps, adds $O(t\cdot n)=O(k^{\epsilon}\cdot n)$ hops to the hopset in expectation. By the exact same argument as in Claim 22, in the third stage of the algorithm there are $n^{1-1/k^{\epsilon}}$ clusters in expectation. Consequently, as in the proof of Thm. 6, the spanner of the cluster graph is of size $O(n)$, and thus this stage contributes $O(n)$ hops to the hopset. We conclude that altogether the hopset in the fractional case is of size $O(k^{\epsilon}\cdot n^{1+1/k}+k^{\epsilon}/\epsilon\cdot n)$. Lemma 7 follows.

Appendix B Improved $(3+\epsilon,\beta)$ Spanners and Hopsets

B.1 Spanners

In this subsection we sketch the construction of a $(3+\epsilon,\beta)$-spanner with an improved $\beta$ and prove Lemma 1. For example, we show a $(4,\beta)$-spanner for $\beta=k^{\log 11}$. The algorithm is similar to the algorithm of Section 4, with two differences: we set $\alpha_{1}=1/2$, and in step (3) of the $i^{th}$ phase of the algorithm, the sampled clusters add to $H_{i}$ paths from their centers to centers at distance at most $2\cdot r_{i-1}+2\cdot\alpha_{i}$ (instead of $4\cdot r_{i-1}+4\cdot\alpha_{i}$). This change affects the radii of the clusters in the following way:

Observation 9.

For every $i\in\{1,\ldots,T\}$, $r_{i}\leq(3+8/\epsilon)^{i-1}$. In particular, the radius of each cluster in the final clustering $\mathcal{C}_{T}$ is $r_{T}\leq(3+8/\epsilon)^{\log{k}+1}=(3+8/\epsilon)\cdot k^{\log(3+8/\epsilon)}$.

Proof.

The proof is similar to the proof of Observation 2, with the difference that $r_{1}=1$, and that at each phase each new cluster is a star centered at a sampled cluster, which connects to other clusters at center-distance at most $2\cdot r_{i-1}+2\cdot\alpha_{i}$. Thus the radius of the new cluster is at most $3\cdot r_{i-1}+2\cdot\alpha_{i}\leq(3+8/\epsilon)\cdot r_{i-1}$. ∎
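The recurrence behind Observation 9 can be unrolled numerically. The sketch below is an illustration under stated assumptions: it takes $\alpha_{i}=(4/\epsilon)\cdot r_{i-1}$, the largest value consistent with the inequality $3\cdot r_{i-1}+2\cdot\alpha_{i}\leq(3+8/\epsilon)\cdot r_{i-1}$, and it uses base-2 logarithms for concreteness (the identity $a^{\log k}=k^{\log a}$ holds for any base). The function names are hypothetical.

```python
import math


def radius_sequence(eps: float, phases: int) -> list:
    """Unroll r_1 = 1, r_i = 3*r_{i-1} + 2*alpha_i for the given number of phases."""
    radii = [1.0]
    for _ in range(phases - 1):
        prev = radii[-1]
        # Assumed worst case: alpha_i = (4/eps) * r_{i-1}.
        alpha = 4.0 / eps * prev
        radii.append(3.0 * prev + 2.0 * alpha)
    return radii


def closed_form(eps: float, i: int) -> float:
    """Bound of Observation 9: (3 + 8/eps)^(i - 1)."""
    return (3.0 + 8.0 / eps) ** (i - 1)


def exponent_identity_holds(eps: float, k: float) -> bool:
    """Check (3 + 8/eps)^(log k) == k^(log(3 + 8/eps)) up to rounding."""
    base = 3.0 + 8.0 / eps
    return math.isclose(base ** math.log2(k), k ** math.log2(base), rel_tol=1e-9)


if __name__ == "__main__":
    print(radius_sequence(1.0, 4))
    print(exponent_identity_holds(1.0, 16))
```

With $\epsilon=1$ the growth factor is $3+8=11$, which matches the $(4,\beta)$-spanner example with $\beta=k^{\log 11}$.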

As in the proof of Lemma 5, the size of $H$ is bounded by:

[TABLE]

Following the proof of Lemma 5, we have that $H$ is a $(3+\epsilon,\beta)$-spanner with $\beta=4\cdot r_{T}\leq 4\cdot(3+8/\epsilon)\cdot k^{\log(3+8/\epsilon)}$, and Lemma 1 follows.

B.2 Hopsets

In this subsection we sketch the construction of a $(3+\epsilon,\beta)$-hopset with an improved $\beta$ and prove Lemma 2. For example, we show a $(4,\beta)$-hopset for $\beta=k^{\log 12}$. The algorithm is similar to the algorithm of Section 5.3, with two differences: we set $R^{\prime}=(3+\frac{8}{\epsilon})\cdot k^{\log(3+\frac{8}{\epsilon})}$, and in step (3) of the algorithm, the sampled clusters add to $H_{i}$ hops to cluster centers at distance $2\cdot r_{i-1}+2\cdot\alpha_{i}$ (instead of $4\cdot r_{i-1}+4\cdot\alpha_{i}$). This change affects only the radii of the clusters, in the following way:

Observation 10.

For every $i\in\{1,\ldots,T\}$, $r_{i}\leq\frac{d}{2R^{\prime}}\cdot(3+8/\epsilon)^{i}$. In particular, the radius of each cluster in the final clustering $\mathcal{C}_{T}$ is $r_{T}\leq d/2$.

Proof.

We set $r_{0}:=d/(2R^{\prime})$, and the rest follows by the same argument as in Observation 9. ∎
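To see why the choice of $R^{\prime}$ gives $r_{T}\leq d/2$, the sketch below evaluates the bound of Observation 10 at the last phase. It assumes $T=\log k+1$ with base-2 logarithms and $k$ a power of two, so that $(3+8/\epsilon)^{T}=R^{\prime}$. These choices and the function names are assumptions made for illustration, not taken from the construction itself.

```python
import math


def r_prime(k: float, eps: float) -> float:
    """R' = (3 + 8/eps) * k^(log(3 + 8/eps)), with log taken base 2 (assumption)."""
    base = 3.0 + 8.0 / eps
    return base * k ** math.log2(base)


def hopset_radius_bound(d: float, k: float, eps: float, i: int) -> float:
    """Observation 10 bound: d / (2 * R') * (3 + 8/eps)^i."""
    return d / (2.0 * r_prime(k, eps)) * (3.0 + 8.0 / eps) ** i


def final_radius_bound(d: float, k: int, eps: float) -> float:
    """Evaluate the bound at the assumed last phase T = log2(k) + 1."""
    last_phase = int(math.log2(k)) + 1
    return hopset_radius_bound(d, k, eps, last_phase)


if __name__ == "__main__":
    # For k = 16 and eps = 1: R' = 11 * 16^(log2 11) = 11^5, and T = 5.
    print(final_radius_bound(d=1000.0, k=16, eps=1.0))
```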

As in the proof of Lemma 8, the size of a hopset for a fixed distance range is bounded by

[TABLE]

Thus the expected size of the hopset is $O((n^{1+1/k}+\log{k}\cdot n)\cdot\log\Lambda)$. Furthermore, as in the proof of Lemma 8, the hopset has stretch $\alpha=3+1.125\cdot\epsilon$ and hop-bound $\beta=16\cdot R^{\prime}$. By plugging $\epsilon^{\prime}=8/9\cdot\epsilon$, Lemma 2 follows.
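The rescaling $\epsilon^{\prime}=8/9\cdot\epsilon$ can be checked with exact rational arithmetic. The sketch below reads the substitution as running the construction with parameter $\epsilon^{\prime}$ in place of $\epsilon$; this reading and the function name are assumptions. Under it, the stretch $3+1.125\cdot\epsilon^{\prime}$ becomes $3+\epsilon$, and the base $3+8/\epsilon^{\prime}$ of the exponent in $R^{\prime}$ becomes $3+9/\epsilon$.

```python
from fractions import Fraction


def rescaled_parameters(eps: Fraction) -> tuple:
    """Return (stretch, base of the exponent in R') after substituting eps' = (8/9) * eps."""
    eps_prime = Fraction(8, 9) * eps
    stretch = 3 + Fraction(9, 8) * eps_prime  # 3 + 1.125 * eps'
    base = 3 + 8 / eps_prime                  # 3 + 8 / eps'
    return stretch, base


if __name__ == "__main__":
    stretch, base = rescaled_parameters(Fraction(1))
    print(stretch, base)
```

For $\epsilon=1$ this yields stretch $4$ and base $12$, which agrees with the $(4,\beta)$ hopset example with $\beta=k^{\log 12}$.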

