Give Me Some Slack: Efficient Network Measurements

Ran Ben Basat; Gil Einziger; Roy Friedman

arXiv:1703.01166·cs.DS·April 25, 2018


TL;DR

This paper explores how allowing a small slack in the sliding window size can lead to more efficient algorithms for network measurement problems, reducing memory requirements and enabling faster computations.

Contribution

It introduces a slack-based model for sliding window problems, demonstrating improved algorithmic efficiency and reduced space complexity for key network measurement tasks.

Findings

01. Slack enables algorithms for MAX and GENERAL-SUM to use less memory.

02. For sub-linear approximation problems, slack further reduces asymptotic resource requirements.

03. The model offers practical benefits for high-speed network measurement implementations.

Abstract

Many networking applications require timely access to recent network measurements, which can be captured using a sliding window model. Maintaining such measurements is a challenging task due to the fast line speed and scarcity of fast memory in routers. In this work, we study the impact of allowing slack in the window size on the asymptotic requirements of sliding window problems. That is, the algorithm can dynamically adjust the window size between $W$ and $W(1+\tau)$ where $\tau$ is a small positive parameter. We demonstrate this model's attractiveness by showing that it enables efficient algorithms to problems such as MAX and GENERAL-SUM that require $\Omega(W)$ bits even for constant factor approximations in the exact sliding window model. Additionally, for problems that admit sub-linear approximation algorithms such as BASIC-SUMMING and COUNT-DISTINCT, the slack model…

Tables

Table 1: Comparison of Basic-Summing algorithms. Our contributions are in bold. All algorithms process elements in constant time except for the rightmost column where both update in $O(\log\left({RW}\right))$ time. We present matching lower bounds to all our algorithms.

	Exact Sum	Additive Error	Multiplicative Error	Multiplicative Error
$\tau=\Theta(1)$	$\boldsymbol{\Theta(\log(RW))}$	$\boldsymbol{\Theta(\log(W/\epsilon))}$	$\boldsymbol{\Theta(\log(W/\epsilon)+\log\log R)}$	$O(\epsilon^{-1}\log(RW)\log\log(RW))$ [16]
Exact Window	$\Theta(W\log R)$	$\Theta(\epsilon^{-1}+\log W)$ [4]	$O(\epsilon^{-1}\log^{2}(RW))$ [27]	$O(\epsilon^{-1}\log RW\log(W\log R))$ [16]

Equations

$$L_{E_{2}}\triangleq\left\{0^{W\tau+i}\sigma R^{W\tau-i-1}\mid i\in[W\tau-1],\sigma\in[R]\right\},\quad\overline{L_{E_{2}}}\triangleq\left\{w_{1}\cdot w_{2}\cdots w_{\left\lceil{\tau^{-1}/2}\right\rceil}\mid\forall i:w_{i}\in L_{E_{2}}\right\}.$$

$$a_{n,k}\triangleq\begin{cases}1&n=1\\ \left\lceil(1+\epsilon)\left(a_{n-1,k}+\sum_{i=1}^{k-1}\psi^{i}\right)\right\rceil&\text{otherwise.}\end{cases}$$

$$b_{n,k}\triangleq\begin{cases}1&n=1\\ (1+\epsilon)\left(b_{n-1,k}+\sum_{i=1}^{k-1}\psi^{i}\right)+1&\text{otherwise.}\end{cases}$$

$$b_{n,k}=(1+\epsilon)^{n-1}+\frac{(1+\epsilon)^{n}-1}{(1+\epsilon)-1}\left((1+\epsilon)\sum_{i=1}^{k-1}\psi^{i}+1\right).$$

$$b_{n,k}\leq(1+\epsilon)^{n-1}+\left((1+\epsilon)2\psi^{k-1}\right)\frac{(1+\epsilon)^{n}-1}{\epsilon}\leq 4\epsilon^{-1}(1+\epsilon)^{n+1}\psi^{k-1}.$$

$$|I_{k}|=\arg\max\left\{n\mid 4\epsilon^{-1}(1+\epsilon)^{n+1}\psi^{k-1}\leq\psi^{k}\right\}\geq\log_{1+\epsilon}\left(\psi\epsilon/4\right)-1=\frac{\ln\left(\psi\epsilon/4\right)}{\ln(1+\epsilon)}-1\geq\epsilon^{-1}\ln\left(\psi\epsilon/4\right)-1.$$

$$\overline{L_{M,2}}\triangleq 0^{W}\cdot 0^{W\tau}\cdot rep(I_{\left\lceil{\tau^{-1}/2}\right\rceil})\cdot 0^{W\tau}\cdot rep(I_{\left\lceil{\tau^{-1}/2}\right\rceil-1})\cdots 0^{W\tau}\cdot rep(I_{1})=\left\{0^{W}\cdot w_{1}\cdot w_{2}\cdots w_{\left\lceil{\tau^{-1}/2}\right\rceil}\mid\forall i:w_{i}\in\left\{0^{W\tau}\cdot rep(x)\mid x\in I_{\left\lceil{\tau^{-1}/2}\right\rceil+1-i}\right\}\right\}.$$

$$\begin{aligned}\log\left(\left|\overline{L_{M,2}}\right|\right)&=\Omega\left(\tau^{-1}\left(\log\epsilon^{-1}+\log\log\left(\sqrt[\left\lceil{\tau^{-1}/2}\right\rceil]{RW/8}\cdot\epsilon\right)\right)\right)\\ &=\Omega\left(\tau^{-1}\left(\log\epsilon^{-1}+\log\left(\frac{\log\left(RW/8\right)}{\left\lceil{\tau^{-1}/2}\right\rceil}+\log\epsilon\right)\right)\right)\\ &=\Omega\left(\tau^{-1}\left(\log\left(\tau/\epsilon\right)+\log\log\left(RW\right)\right)\right).\end{aligned}$$

$$\sigma_{\overline{W}}\triangleq\frac{\sum_{x\in\overline{W}}\left(x-m_{\overline{W}}\right)^{2}}{W-1}=\frac{\sum_{x\in\overline{W}}x^{2}-2m_{\overline{W}}\sum_{x\in\overline{W}}x+W\cdot m_{\overline{W}}^{2}}{W-1}=\frac{\sum_{x\in\overline{W}}x^{2}-W\cdot m_{\overline{W}}^{2}}{W-1},$$

$$L_{E_{1}}\triangleq\left\{0^{W\tau+i}\sigma R^{W-i-1}0^{j}\mid i,j\in[W-1],i\geq j,\sigma\in\left([R]\setminus\left\{0\right\}\right)\right\}\cup\left\{0^{W+W\tau}\right\}.$$

$$L_{A_{1}}\triangleq\left\{rep(k\cdot 2RW\epsilon)\mid k\in[\left\lfloor{1/4\epsilon}\right\rfloor]\setminus\left\{0\right\}\right\};\quad\overline{L_{A_{1}}}\triangleq 0^{W+W\tau}\cdot L_{A_{1}}\cdot\left\{0^{q}\mid q\in[\left\lfloor{W/2}\right\rfloor]\right\}.$$

$$a_{n}=\left\lceil(1+\epsilon)a_{n-1}\right\rceil$$

$${}=\left\lceil\epsilon^{-1}(1+\epsilon)^{n}-\epsilon^{-1}-1\right\rceil<\epsilon^{-1}\cdot\left((1+\epsilon)^{n}-1\right).$$

$$\overline{L_{M}}\triangleq\left\{0^{W+W\tau}rep(x)0^{j}\mid j\in[\left\lfloor{W/2}\right\rfloor],x\in I_{M}\right\}.$$

$$\log|\overline{L_{M}}|\geq\log\left(\ln(2)\epsilon^{-1}\log\left(RW\epsilon/2\right)-O(1)\right)+\log W-1=\log\left(W/\epsilon\right)+\log\log\left(RW\epsilon\right)-O(1).$$

$$\widehat{S}=R\cdot\left(y_{t-W}+\sum_{i=1}^{W+c}x^{\prime}_{t-W+i}\right).$$

$$\sum_{\ell=1}^{W+c}x^{\prime}_{t-W+\ell}-\frac{1}{R}\sum_{\ell=1}^{W+c}x_{t-W+\ell}=\sum_{\ell=1}^{W+c}x^{\prime}_{t-W+\ell}-\frac{1}{R}S\leq(W+c)\cdot 2^{-\upsilon_{1}-1}.$$

$$|y_{t-W}|\leq W\tau\cdot 2^{-\upsilon_{2}-1}.$$

$$E\triangleq|S-\widehat{S}|\leq R\cdot\left(W\tau\cdot 2^{-\upsilon_{2}-1}+(W+c)\cdot 2^{-\upsilon_{1}-1}\right)<RW\cdot\left(\tau 2^{-\upsilon_{2}-1}+2^{-\upsilon_{1}}\right).$$

$$\begin{aligned}\left\lceil\log\left(\log_{(1+\epsilon/2)}\left(RW\tau\right)+2\right)\right\rceil&=\log\left(\log RW\tau\right)-\log\log\left(1+\epsilon/2\right)+O(1)\\ &=\log\left(\log RW\tau\right)-\log\left(\frac{\ln\left(1+\epsilon/2\right)}{\ln 2}\right)+O(1)\\ &=\log\left(\log RW\tau\right)-\log\left(\frac{\epsilon/2}{\ln 2}+O(\epsilon^{2})\right)+O(1)\\ &=\log\left(\log RW\tau\right)+\log\epsilon^{-1}+O(1).\end{aligned}$$

$$\tau^{-1}\left(\log\log\left(RW\tau\right)+\log\epsilon^{-1}+O(1)\right)+O\left(\log\left(RW\right)\right)=O\left(\tau^{-1}\left(\log\log\left(RW\tau\right)+\log\epsilon^{-1}\right)+\log\left(RW\right)\right).$$

$$\frac{y}{1+\epsilon}<\left((1+\epsilon/2)^{\left\lfloor\log_{(1+\epsilon/2)}y\right\rfloor}\right)_{\downarrow}\leq y$$

$$\left((1+\epsilon/2)^{\left\lfloor\log_{(1+\epsilon/2)}y\right\rfloor}\right)_{\downarrow}>\left((1+\epsilon/2)^{\left(\log_{(1+\epsilon/2)}y\right)-1}\right)_{\downarrow}=\left(\frac{y}{(1+\epsilon/2)}\right)_{\downarrow}\geq\frac{y}{(1+\epsilon/2)}-\epsilon/4.$$

$$S\triangleq\sum_{\ell=1}^{W+c}x_{t-W+\ell}.$$

$$\forall j\in[\tau^{-1}-1]:b_{j}=\left\lfloor\log_{(1+\epsilon/2)}S_{j}\right\rfloor.$$

$$B=\sum_{j=0}^{\tau^{-1}-1}\left((1+\epsilon/2)^{b_{j}}\right)_{\downarrow}.$$

$$\forall j\in[\tau^{-1}-1]:S_{j}>0\implies\frac{S_{j}}{1+\epsilon}<\left((1+\epsilon/2)^{\left\lfloor\log_{(1+\epsilon/2)}S_{j}\right\rfloor}\right)_{\downarrow}\leq S_{j}.$$


Full text

Ran Ben Basat, Gil Einziger, Roy Friedman

Affiliations: (1) Department of Computer Science, Technion, {sran,roy}@cs.technion.ac.il; (2) Nokia Bell Labs

Copyright: Ran Ben-Basat, Gil Einziger, Roy Friedman

Abstract

Many networking applications require timely access to recent network measurements, which can be captured using a sliding window model. Maintaining such measurements is a challenging task due to the fast line speed and scarcity of fast memory in routers. In this work, we study the impact of allowing slack in the window size on the asymptotic requirements of sliding window problems. That is, the algorithm can dynamically adjust the window size between $W$ and $W(1+\tau)$ where $\tau$ is a small positive parameter. We demonstrate this model’s attractiveness by showing that it enables efficient algorithms to problems such as Maximum and General-Summing that require $\Omega(W)$ bits even for constant factor approximations in the exact sliding window model. Additionally, for problems that admit sub-linear approximation algorithms such as Basic-Summing and Count-Distinct, the slack model enables a further asymptotic improvement.

The main focus of the paper is on the widely studied Basic-Summing problem of computing the sum of the last $W$ integers from $\left\{0,1,\ldots,R\right\}$ in a stream. While it is known that $\Omega(W\log{R})$ bits are needed in the exact window model, we show that approximate windows allow an exponential space reduction for constant $\tau$.

Specifically, for $\tau=\Theta(1)$, we present a space lower bound of $\Omega(\log(RW))$ bits. Additionally, we show an $\Omega(\log\left({W/\epsilon}\right))$ lower bound for $RW\epsilon$ additive approximations and an $\Omega(\log\left({W/\epsilon}\right)+\log\log{R})$ bits lower bound for $(1+\epsilon)$ multiplicative approximations. Our work is the first to study this problem in the exact and additive approximation settings. For all settings, we provide memory optimal algorithms that operate in worst case constant time. This strictly improves on the work of [16] for $(1+\epsilon)$-multiplicative approximation, which requires $O(\epsilon^{-1}\log\left({RW}\right)\log\log\left({RW}\right))$ space and performs updates in $O(\log\left({RW}\right))$ worst case time. Finally, we show asymptotic improvements for the Count-Distinct, General-Summing and Maximum problems.

1 Introduction

Network algorithms in diverse areas such as traffic engineering, load balancing and quality of service [2, 9, 26, 31, 38] rely on timely link measurements. In such applications recent data is often more relevant than older data, motivating the notions of aging and sliding window [6, 11, 18, 32, 34]. For example, a sudden decrease in the average packet size on a link may indicate a SYN attack [33]. Additionally, a load balancer may benefit from knowing the current utilization of a link to avoid congestion [2].

While conceptually simple, conveying the necessary information to network algorithms is a difficult challenge due to current memory technology limitations. Specifically, DRAM memory is abundant but too slow to cope with the line rate while SRAM memory is fast enough but has a limited capacity [10, 15, 36]. Online decisions are therefore realized through space efficient data structures [7, 8, 19, 20, 5, 30, 35, 37] that store measurement statistics in a concise manner. For example, [19, 35] utilize probabilistic counters that only require $O(\log\log N)$ bits to approximately represent numbers up to $N$ . Others conserve space using variable sized counter encoding [20, 30] and monitoring only the frequent elements [6].

Basic-Summing is one of the most basic textbook examples of such approximated sliding window stream processing problems [16]. In this problem, one is required to keep track of the sum of the last $W$ elements, when all elements are non-negative integers in the range $\left\{0,1,\ldots,R\right\}$ . The work in [16] provides a $(1+\epsilon)$ -multiplicative approximation of this problem using $O\left({\frac{1}{\epsilon}\cdot\left({\log^{2}W+\log R\cdot\left({\log W+\log\log R}\right)}\right)}\right)$ bits. The amortized time complexity is $O(\frac{\log R}{\log W})$ and the worst case is $O(\log W+\log R)$ . In contrast, we previously showed an $RW\epsilon$ -additive approximation with $\Theta\left(\frac{1}{\epsilon}+\log W\epsilon\right)$ bits [4].

Sliding window counters (approximated or accurate) require asymptotically more space than plain stream counters. Such window counters are prohibitively large for networking devices which already optimize the space consumption of plain counters.

This paper explores the concept of slack, or approximated sliding window, bridging this gap. Figure 1 illustrates a “window” in this model. Here, each query may select a $\tau$ -slack window whose size is between $W$ (the green elements) and $W(1+\tau)$ (the green plus yellow elements). The goal is to compute the sum with respect to this chosen window.

Slack windows were also considered in previous works [16, 34] and we call the problem of maintaining the sum over a slack window SS. Datar et al. [16] showed that constant slack reduces the required memory from $O({\frac{1}{\epsilon}\cdot\left({\log^{2}W+\log R\cdot\left({\log W+\log\log R}\right)}\right)})$ to $O(\epsilon^{-1}\log(RW)\log\log(RW))$ . For $\tau$ -slack windows they provide a $(1+\epsilon)$ -multiplicative approximation using $O(\epsilon^{-1}\allowbreak\log(RW)(\log\log(RW)+\log\tau^{-1}))$ bits.

Our Contributions

This paper studies the space and time complexity reductions that can be attained by allowing slack – an error in the window size. Our results demonstrate exponentially smaller and asymptotically faster data structures compared to various problems over exact windows. We start with deriving lower bounds for three variants of the Basic-Summing problem – when computing an exact sum over a slack window, or when combined with an additive and a multiplicative error in the sum. We present algorithms that are based on dividing the stream into $W\tau$ -sized blocks. Our algorithms sum the elements within each block and represent each block’s sum in a cyclic array of size $\tau^{-1}$ . We use multiple compression techniques during different stages to drive down the space complexity. The resulting algorithms are space optimal, substantially simpler than previous work, and reduce update time to $O(1)$ .

For exact SS, we present a lower bound of $\Omega(\tau^{-1}\log(RW\tau))$ bits. For $(1+\epsilon)$ multiplicative approximations we prove an $\Omega\big{(}\log(W/\epsilon)\allowbreak+\tau^{-1}\left({\log\left({\tau/\epsilon}\right)+\log\log\left({RW}\right)}\right)\big{)}$ bits bound when $\tau=\Omega\left({1\over\log{RW}}\right)$ . We show that $\Omega(\tau^{-1}\log\left\lfloor{1+\tau/\epsilon}\right\rfloor+\log\left({W/\epsilon}\right))$ bits are required for $RW\epsilon$ additive approximations.

Next, we introduce algorithms for the SS problem, which asymptotically reduce the required memory compared to the sliding window model. For the exact and additive error versions of the problem, we provide memory optimal algorithms. In the multiplicative error setting, we provide an $O\big{(}\tau^{-1}\left({\log{\epsilon^{-1}}+\log\log\left({RW\tau}\right)}\right)+\log(RW)\big{)}$ space algorithm. This is asymptotically optimal when $\tau=\Omega(\log^{-1}{W})$ and $R=\text{poly}(W)$ . It also asymptotically improves [16] when $\tau^{-1}=o(\epsilon^{-1}\log\left({RW}\right))$ . We further provide an asymptotically optimal solution for constant $\tau$ , even when $R=W^{\omega(1)}$ . All our algorithms are deterministic and operate in worst case constant time. In contrast, the algorithm of [16] works in $O(\log RW)$ worst case time.

To exemplify our results, consider monitoring the average bandwidth (in bytes per second) passed through a router in a $24$-hour window, i.e., $W\triangleq 86400$ seconds. Assuming we use a 100GbE fiber transceiver, our stream values are bounded by $R\approx 2^{34}$ bytes. If we are willing to withstand an error of $\epsilon=2^{-20}$ (i.e., about $16$ KBps), the work of [4] provides an additive approximation over the sliding window and requires about 120KB. In contrast, using a 10-minute slack ($\tau\triangleq\frac{1}{144}$), our algorithm for exact SS requires only 800 bytes, 99% less than approximate summing over an exact sliding window. For the same slack size, the algorithm of [16] requires more space than our exact algorithm even for a large 3% error. Further, if we also allow the same additive error ($\epsilon=2^{-20}$), we provide an algorithm that requires only 240 bytes, a reduction of more than $99.8\%$!
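The figures in this example can be roughly reproduced from the space bounds stated in Theorem 4.1 and Theorem 4.4. The following is a back-of-the-envelope sketch rather than the authors' calculation: it drops the $o(1)$ factor of Theorem 4.4, takes the $O(1)$ term of Theorem 4.1 to be the 4 bits used in its proof, and all function and constant names are hypothetical.

```python
from math import ceil, log2

R = 2 ** 34          # bound on stream values (bytes per second)
W = 86400            # window size: 24 hours, in seconds
TAU_INV = 144        # 1 / tau for a 10-minute slack
EPS_INV = 2 ** 20    # 1 / epsilon


def exact_bits(r, w, tau_inv):
    """Bits for exact slack summing per Theorem 4.1 (O(1) assumed to be 4)."""
    block = w // tau_inv                       # W * tau elements per block
    counter = ceil(log2(r * block + 1))        # width of one block counter
    return (tau_inv + 1) * counter + log2(r * w * w) + 4


def additive_bits(w, tau_inv, eps_inv):
    """Bits for additive slack summing per Theorem 4.4 (o(1) factor dropped)."""
    per_block = ceil(log2(eps_inv / tau_inv))  # ceil(log(tau / epsilon))
    return tau_inv * per_block + 2 * log2(w * eps_inv)


exact_bytes = exact_bits(R, W, TAU_INV) / 8
additive_bytes = additive_bits(W, TAU_INV, EPS_INV) / 8
print(round(exact_bytes), round(additive_bytes))
```

Under these assumptions both values land close to the 800 and 240 bytes quoted above; the small differences come from the lower-order terms.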

Table 1 compares our results for the important case of constant slack with [16]. As depicted, our exact algorithm is faster and more space efficient than the multiplicative approximation of [16]. Comparing our multiplicative approximation algorithm to that of [16], we present exponential space reductions in the dependencies on $\epsilon^{-1}$ and $R$ , with an asymptotic reduction in $W$ as well. We also improve the update time from $O(\log\left({RW}\right))$ to $O(1)$ .

Finally, we apply the slack window approach to multiple streaming problems, including Maximum, General-Summing, Count-Distinct and Standard-Deviation. We show that, while some of these problems cannot be approximated on an exact window in sub-linear space (e.g. maximum and general sum), we can easily do so for slack windows. In the count distinct problem, a constant slack yields an asymptotic space reduction over [11, 24].

2 Preliminaries

For $\ell\in\mathbb{N}$, we denote $[\ell]\triangleq\left\{0,1,\ldots,\ell\right\}$. We consider a stream of data elements $x_{1},x_{2},\ldots,x_{t}$, where at each step a new element $x_{i}\in[R]$ is appended to the stream. A $W$-sized window contains only the last $W$ elements: $x_{t-W+1}\ldots x_{t}$. We say that $\mathcal{F}$ is a $\tau$-slack $W$-sized window if there exists $c\in[W\tau-1]$ such that $\mathcal{F}=x_{t-(W+c)+1}\ldots x_{t}$. For simplicity, we assume that $\tau^{-1}$ and $W\tau$ are integers. Unless explicitly specified, the base of all logs is $2$.

Algorithms for the SS problem are required to support two operations:

1. Update$(x_{t})$: Process a new element $x_{t}\in[R]$.

2. Output$()$: Return a pair $\langle\widehat{S},c\rangle$ such that $c\in\mathbb{N}$ is the slack size and $\widehat{S}$ is an estimation of the last $W+c$ elements sum, i.e., $S\triangleq\sum_{k=t-(W+c)+1}^{t}x_{k}$.

We consider three types of algorithms for SS:

1. Exact algorithms: an algorithm $\mathbb{A}$ solves $(W,\tau)$-Exact Summing if its Output returns $\langle\widehat{S},c\rangle$ that satisfies $0\leq c<W\tau$ and $\widehat{S}=S$.

2. Additive algorithms: we say that $\mathbb{A}$ solves $(W,\tau,\epsilon)$-Additive Summing if its Output function returns $\langle\widehat{S},c\rangle$ that satisfies $0\leq c<W\tau$ and $|S-\widehat{S}|<RW\epsilon$.

3. Multiplicative algorithms: $\mathbb{A}$ solves $(W,\tau,\epsilon)$-Multiplicative Summing if its Output returns $\langle\widehat{S},c\rangle$ satisfying $0\leq c<W\tau$ and $\frac{S}{1+\epsilon}<\widehat{S}\leq S$ if $S>0$, and $\widehat{S}=0$ otherwise.
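The three guarantees above can be written as simple predicates on an answer $\langle\widehat{S},c\rangle$. This is only an illustrative sketch; the function names are hypothetical and do not appear in the paper.

```python
def true_sum(stream, window, slack):
    """S: the sum of the last window + slack elements of the stream."""
    return sum(stream[-(window + slack):])


def valid_slack(slack, window, tau):
    """The reported slack c must satisfy 0 <= c < W * tau."""
    return 0 <= slack < window * tau


def is_exact(estimate, s):
    return estimate == s


def is_additive(estimate, s, r, window, eps):
    """|S - estimate| < R * W * eps."""
    return abs(s - estimate) < r * window * eps


def is_multiplicative(estimate, s, eps):
    """S / (1 + eps) < estimate <= S when S > 0, and estimate = 0 otherwise."""
    if s == 0:
        return estimate == 0
    return s / (1 + eps) < estimate <= s


stream = [5, 0, 3, 2, 4, 1]
s = true_sum(stream, window=4, slack=1)   # sum of the last 5 elements
```

For instance, with $W=4$ and $\tau=1/2$ the only valid slack sizes are $c\in\{0,1\}$, and an answer is judged against the sum of the last $W+c$ elements for the $c$ it reports.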

3 Lower Bounds

In this section, we analyze the space required for solving the SS problems. Intuitively, our bounds are derived by constructing a set of inputs that any algorithm must distinguish to meet the required guarantees. There are two tricks that we frequently use in these lower bounds. The first is setting the input such that the slack consists only of zeros, and thus the algorithm must return the desired approximation of the remaining window. The second is a “cycle argument”: consider two inputs $x$ and $x\cdot y$ for $x,y\in\left\{0,1,\ldots,R\right\}^{*}$. If both lead to the same memory configuration, then so does $xy^{k}$ for every $k\in\mathbb{N}$, because a deterministic algorithm that starts from the same configuration and reads $y$ again ends in the same configuration. Thus, if there is a $k$ such that no single answer approximates both $x$ and $xy^{k}$ well, then $x$ and $xy$ had to lead to separate memory configurations in the first place.
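The cycle argument can be illustrated with a toy deterministic algorithm. The sketch below is purely hypothetical and is not a construction from the paper: the "algorithm" keeps only the running sum modulo 8 (3 bits of state), so the inputs $x$ and $x\cdot y$ collide whenever $y$ sums to a multiple of 8, and then every $xy^{k}$ collides as well, even though the sums over the last $W=3$ elements differ.

```python
def run(delta, state, word):
    """Feed a word to a deterministic algorithm and return its final state."""
    for symbol in word:
        state = delta(state, symbol)
    return state


def delta(state, symbol):
    # Hypothetical 3-bit algorithm: it remembers only the running sum mod 8.
    return (state + symbol) % 8


x = [3, 1]
y = [2, 2, 4]   # sums to 8, so x and x.y reach the same state

# States reached on x, xy, xyy, ... and the true sums of the last 3 elements.
states = {run(delta, 0, x + y * k) for k in range(6)}
window_sums = {sum((x + y * k)[-3:]) for k in range(6)}
```

Since all these inputs end in a single state while their window sums differ, no exact answer can be correct for all of them, which is the contradiction the lower bounds exploit.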

3.1 $(W,\tau)$ -Exact Summing

We start by proving lower bounds on the memory required for exact SS.

Lemma 3.1.

Any deterministic algorithm

We now use Lemma 3.1, whose proof is deferred to Appendix A, to show the following lower bound on $(W,\tau)$ -Exact Summing algorithms:

Theorem 3.2.

Any deterministic algorithm

Proof 3.3.

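Lemma 3.1 shows a $\left\lfloor{\log\left({RW^{2}}\right)}\right\rfloor$ bound. We proceed with showing a lower bound of $\left\lceil{\left\lceil{\tau^{-1}/2}\right\rceil{\log\left({RW\tau+1}\right)}}\right\rceil$ bits. Consider the following languages:

$$L_{E_{2}}\triangleq\left\{0^{W\tau+i}\sigma R^{W\tau-i-1}\mid i\in[W\tau-1],\sigma\in[R]\right\},\quad\overline{L_{E_{2}}}\triangleq\left\{w_{1}\cdot w_{2}\cdots w_{\left\lceil{\tau^{-1}/2}\right\rceil}\mid\forall i:w_{i}\in L_{E_{2}}\right\}.$$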

Notice that $|\overline{L_{E_{2}}}|=(RW\tau+1)^{\left\lceil{\tau^{-1}/2}\right\rceil}$ since each of the words in $L_{E_{2}}$ has a distinct sum of literals, and each number in $\left\{0,1,\ldots,RW\tau\right\}$ is the sum of a word. We show that each input in $\overline{L_{E_{2}}}$ must be mapped into a distinct memory configuration. Let $S_{1}\triangleq w_{1,1}\cdot w_{2,1}\cdots w_{\left\lceil{\tau^{-1}/2}\right\rceil,1}$ , $\quad S_{2}\triangleq w_{1,2}\cdot w_{2,2}\cdots w_{\left\lceil{\tau^{-1}/2}\right\rceil,2}$ be two distinct inputs in $\overline{L_{E_{2}}}$ such that $\forall i:w_{i,1},w_{i,2}\in L_{E_{2}}$ . Denote $\chi\triangleq\max\left\{i\in\left[\left\lceil{\tau^{-1}/2}\right\rceil\right]\mid w_{i,1}\neq w_{i,2}\right\}$ – the last place in which $S_{1}$ differs from $S_{2}$ ; also, denote $w_{\chi,1}\triangleq 0^{W\tau}a,w_{\chi,2}\triangleq 0^{W\tau}b$ . Consider the sequences $S^{*}_{1}=S_{1}\cdot 0^{2W\tau(\chi-1/2)}$ and $S^{*}_{2}=S_{2}\cdot 0^{2W\tau(\chi-1/2)}$ . Notice that the last $W$ elements windows for $S^{*}_{1},S^{*}_{2}$ are $a\cdot w_{\chi+1,1}\cdots w_{\left\lceil{\tau^{-1}/2}\right\rceil,1}\cdot 0^{2W\tau(\chi-1/2)}$ and $b\cdot w_{\chi+1,2}\cdots w_{\left\lceil{\tau^{-1}/2}\right\rceil,2}\cdot 0^{2W\tau(\chi-1/2)}$ respectively, and that the preceding $W\tau$ elements of both are all zeros. An illustration of the setting appears in Figure 2.
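The counting claim about $L_{E_{2}}$ can be checked by brute force for small parameters. The sketch below is an illustration with hypothetical names, where `block` stands for $W\tau$ and `r` for $R$; it enumerates the words $0^{W\tau+i}\sigma R^{W\tau-i-1}$ and collects their sums.

```python
from itertools import product


def language_e2(block, r):
    """All words 0^(block+i) sigma r^(block-i-1), i in [block-1], sigma in [r]."""
    words = set()
    for i, sigma in product(range(block), range(r + 1)):
        words.add((0,) * (block + i) + (sigma,) + (r,) * (block - i - 1))
    return words


words = language_e2(block=3, r=2)
sums = {sum(word) for word in words}
```

For `block=3` and `r=2` the distinct words have pairwise distinct sums that cover every value in $\left\{0,1,\ldots,RW\tau\right\}$, so concatenating $\left\lceil\tau^{-1}/2\right\rceil$ of them yields $(RW\tau+1)^{\left\lceil\tau^{-1}/2\right\rceil}$ inputs.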

By our choice of $\chi$ , we have that the sum of the last $W$ elements of $S_{1}^{*}$ and $S_{2}^{*}$ is different, and since the slack is all zeros, no answer is correct on both. Finally, note that this implies that $S_{1},S_{2}$ had to reach different configurations, as otherwise

3.2 $(W,\tau,\epsilon)$ -Additive Summing

Next, Theorem 3.4 shows a lower bound for additive approximations of SS. Due to lack of space, the proof is deferred to Appendix B.

Theorem 3.4.

For $\epsilon<1/4$ , any deterministic algorithm

3.3 $(W,\tau,\epsilon)$ -Multiplicative Summing

In this section, we show lower bounds for multiplicative approximations of SS. We start with Lemma 3.5, whose proof appears in Appendix C.

Lemma 3.5.

For $\epsilon<1/4$ , any deterministic algorithm

To extend our multiplicative lower bound, we use the following fact:

Fact 1.

For any $x\neq 1,y\in\mathbb{R}$ , the sequence $\left\{c_{i}\right\}_{i=1}^{\infty}$ , defined as $c_{n}\triangleq\begin{cases}1&\mbox{n = 1}\\ x\cdot c_{n-1}+y&\mbox{Otherwise}\end{cases}$

can be represented using a closed form as $c_{n}=x^{n-1}+y\cdot\frac{x^{n}-1}{x-1}$ .

Next, let $k\in\mathbb{N}$ and $\psi,\epsilon\in\mathbb{R}$ , such that $\psi\geq 2$ , $\epsilon>0$ , $k\geq 1$ ; consider the integer sequence

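$$a_{n,k}\triangleq\begin{cases}1&n=1\\ \left\lceil(1+\epsilon)\left(a_{n-1,k}+\sum_{i=1}^{k-1}\psi^{i}\right)\right\rceil&\text{otherwise.}\end{cases}$$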

Using the fact above, we show the following lemma:

Lemma 3.6.

For every integer $n\geq 1$ we have $a_{n,k}\leq 4\epsilon^{-1}{(1+\epsilon)^{n+1}}{\psi^{k-1}}$ .

Proof 3.7.

To apply Fact 1, we define an upper bounding sequence $\left\{b_{i,k}\right\}_{i=1}^{\infty}$ as follows:

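$$b_{n,k}\triangleq\begin{cases}1&n=1\\ (1+\epsilon)\left(b_{n-1,k}+\sum_{i=1}^{k-1}\psi^{i}\right)+1&\text{otherwise.}\end{cases}$$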

Thus, we can rewrite the $n$ ’th element of the sequence as:

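$$b_{n,k}=(1+\epsilon)^{n-1}+\frac{(1+\epsilon)^{n}-1}{(1+\epsilon)-1}\left((1+\epsilon)\sum_{i=1}^{k-1}\psi^{i}+1\right).$$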

We can now use this representation to derive an upper bound of $b_{n,k}$ :

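$$b_{n,k}\leq(1+\epsilon)^{n-1}+\left((1+\epsilon)2\psi^{k-1}\right)\frac{(1+\epsilon)^{n}-1}{\epsilon}\leq 4\epsilon^{-1}(1+\epsilon)^{n+1}\psi^{k-1}.$$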

Finally, since $a_{n,k}\leq b_{n,k}$ for any $n,k$ , we conclude that $a_{n,k}\leq 4\epsilon^{-1}{(1+\epsilon)^{n+1}}{\psi^{k-1}}$ .

We now define the integer set $I_{k}$ as $I_{k}\triangleq\left\{a_{n,k}\mid a_{n,k}\leq\psi^{k}\right\}$ , and proceed to bound $|I_{k}|$ .

Lemma 3.8.

For any $k\geq 1$ we have $|I_{k}|\geq\epsilon^{-1}\ln\left({{\psi\epsilon/4}}\right)-1$ .

Proof 3.9.

Clearly, the cardinality of $I_{k}$ is the largest $n$ for which $a_{n,k}\leq\psi^{k}$ . According to Lemma 3.6, we have that $a_{n,k}\leq 4\epsilon^{-1}{(1+\epsilon)^{n+1}}{\psi^{k-1}}$ , and thus:

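$$|I_{k}|=\arg\max\left\{n\mid 4\epsilon^{-1}(1+\epsilon)^{n+1}\psi^{k-1}\leq\psi^{k}\right\}\geq\log_{1+\epsilon}\left(\psi\epsilon/4\right)-1=\frac{\ln\left(\psi\epsilon/4\right)}{\ln(1+\epsilon)}-1\geq\epsilon^{-1}\ln\left(\psi\epsilon/4\right)-1.$$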
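The bounds of Lemma 3.6 and Lemma 3.8 can be sanity-checked numerically. The sketch below assumes the sequence $a_{n,k}$ is defined by $a_{1,k}=1$ and $a_{n,k}=\left\lceil(1+\epsilon)\left(a_{n-1,k}+\sum_{i=1}^{k-1}\psi^{i}\right)\right\rceil$; the concrete values $\epsilon=1/10$ and $\psi=100$, as well as the function names, are arbitrary choices for illustration. Exact rational arithmetic is used so the ceiling is not affected by floating-point error.

```python
from fractions import Fraction
from math import ceil, log


def terms_up_to_limit(eps, psi, k):
    """Return the terms a_{1,k}, a_{2,k}, ... that are at most psi**k (the set I_k)."""
    offset = sum(psi ** i for i in range(1, k))
    terms, a = [], 1
    while a <= psi ** k:
        terms.append(a)
        a = ceil((1 + eps) * (a + offset))
    return terms


def lemma_3_6_bound(eps, psi, n, k):
    """Upper bound on a_{n,k}: 4 * eps^-1 * (1 + eps)^(n+1) * psi^(k-1)."""
    return 4 / eps * (1 + eps) ** (n + 1) * psi ** (k - 1)


def lemma_3_8_bound(eps, psi):
    """Lower bound on |I_k|: eps^-1 * ln(psi * eps / 4) - 1."""
    return float(1 / eps) * log(float(psi * eps / 4)) - 1


EPS, PSI = Fraction(1, 10), 100
sizes = {k: len(terms_up_to_limit(EPS, PSI, k)) for k in (1, 2, 3)}
```

For these parameters each $|I_{k}|$ comfortably exceeds the bound of Lemma 3.8, which is expected since the lemma is not claimed to be tight.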

We proceed with a stronger lower bound for non-constant $\tau$ values.

Lemma 3.10.

For $\frac{1}{2\log\left({RW}\right)-8}\leq\tau\leq 1$ , any deterministic algorithm

Proof 3.11.

We use $rep(x)\triangleq(x\mod R)\cdot R^{\left\lfloor{x/R}\right\rfloor}$ to denote a sequence in $\left\{\sigma R^{*}\mid\sigma\in[R]\right\}$ that has a sum of $x$ . For an integer set $I_{k}$ , we denote $rep(I_{k})\triangleq\left\{rep(x)\mid x\in I_{k}\right\}$ . We now choose the value of $\psi$ to be $\psi\triangleq\sqrt[\left\lceil{\tau^{-1}/2}\right\rceil]{RW/8}$ ; notice that $\psi\geq 2$ as required. Next, consider:

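$$\overline{L_{M,2}}\triangleq 0^{W}\cdot 0^{W\tau}\cdot rep(I_{\left\lceil{\tau^{-1}/2}\right\rceil})\cdot 0^{W\tau}\cdot rep(I_{\left\lceil{\tau^{-1}/2}\right\rceil-1})\cdots 0^{W\tau}\cdot rep(I_{1})=\left\{0^{W}\cdot w_{1}\cdot w_{2}\cdots w_{\left\lceil{\tau^{-1}/2}\right\rceil}\mid\forall i:w_{i}\in\left\{0^{W\tau}\cdot rep(x)\mid x\in I_{\left\lceil{\tau^{-1}/2}\right\rceil+1-i}\right\}\right\}.$$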

That is, every word in the $\overline{L_{M,2}}$ language consists of a concatenation of words $w_{1},\ldots,w_{\left\lceil{\tau^{-1}/2}\right\rceil}$ , such that every $w_{i}$ starts with $W\tau$ zeros followed by a string representing an integer in $I_{\left\lceil{\tau^{-1}/2}\right\rceil+1-i}$ , which is defined above. According to Lemma 3.8 we have that

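$$\begin{aligned}\log\left(\left|\overline{L_{M,2}}\right|\right)&=\Omega\left(\tau^{-1}\left(\log\epsilon^{-1}+\log\log\left(\sqrt[\left\lceil{\tau^{-1}/2}\right\rceil]{RW/8}\cdot\epsilon\right)\right)\right)\\ &=\Omega\left(\tau^{-1}\left(\log\epsilon^{-1}+\log\left(\frac{\log\left(RW/8\right)}{\left\lceil{\tau^{-1}/2}\right\rceil}+\log\epsilon\right)\right)\right)\\ &=\Omega\left(\tau^{-1}\left(\log\left(\tau/\epsilon\right)+\log\log\left(RW\right)\right)\right).\end{aligned}$$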
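The encoding $rep(x)$ used in this proof is easy to make concrete. The following sketch (with `r` standing for $R$) builds the sequence $(x\bmod R)\cdot R^{\left\lfloor{x/R}\right\rfloor}$ as a list, i.e., one symbol $x\bmod R$ followed by $\left\lfloor{x/R}\right\rfloor$ copies of $R$.

```python
def rep(x, r):
    """Return (x mod r) followed by floor(x / r) copies of r; the elements sum to x."""
    return [x % r] + [r] * (x // r)


example = rep(17, 5)
```

Every element of the result lies in $[R]$ and the elements sum to $x$, which is the only property of $rep$ that the argument relies on.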

Next, we show that every two words in $\overline{L_{M,2}}$ must reach different memory configurations, thereby implying a $\Omega\left({\log\left({|\overline{L_{M,2}}|}\right)}\right)$ bits lower bound. Let $S_{1}\neq S_{2}\in\overline{L_{M,2}}$ such that $S_{1}=0^{W}\cdot w_{1,1}\cdots w_{\left\lceil{\tau^{-1}/2}\right\rceil,1}$ , $S_{2}=0^{W}\cdot w_{1,2}\cdots w_{\left\lceil{\tau^{-1}/2}\right\rceil,2}$ , and $\forall i\in\left\{1,\ldots,\left\lceil{\tau^{-1}/2}\right\rceil\right\}j\in\left\{1,2\right\}:w_{i,j}\in\left\{0^{W\tau}\cdot rep(x)\mid x\in I_{\left\lceil{\tau^{-1}/2}\right\rceil+1-i}\right\}$ . We next assume by contradiction that $S_{1}$ and $S_{2}$ leads

Finally, we combine Lemma 3.5 and Lemma 3.10 to obtain the following lower bound:

Theorem 3.12.

For $\epsilon<1/4,\frac{1}{2\log\left({RW}\right)-8}\leq\tau\leq 1$ , any deterministic algorithm for the $(W,\tau,\epsilon)$ -Multiplicative Summing problem requires at least $\Omega\big{(}\log(W/\epsilon)\allowbreak+\tau^{-1}\left({\log\left({\tau/\epsilon}\right)+\log\log\left({RW}\right)}\right)\big{)}$ bits.

4 Upper Bounds

In this section, we introduce solutions for the SS problems. In general, all our algorithms have a structure that consists of a subset of the following, where “compression” has a different meaning for the exact, additive and multiplicative variants:

• Compress the arriving item.

• Add the item into a counter $y$ and compress the counter.

• If a $W\tau$-sized block ends, store it as a compressed representation of $y$. Sometimes we propagate the compression error to the following block; otherwise, we zero $y$.

• Use the block values and $y$ to construct an estimation for the sum.

Our double rounding technique, described below, asymptotically improves over running $1/\tau$ separate plain stream (insertion only) algorithm instances.

4.1 $(W,\tau)$ -Exact Summing

We divide the stream into ${W\tau}$-sized blocks and sum the arriving elements in each block with a $\left\lceil{\log\left({RW\tau+1}\right)}\right\rceil$ bits counter. We maintain the sum of the current block in a variable called $y$, $c$ maintains the number of elements within the current block, and $i$ is the current block number. The variable $b$ is a cyclic buffer of $\tau^{-1}$ blocks. Every $W\tau$ steps, we assign the value of $y$ to the oldest block ($b_{i}$) and increment $i$. Intuitively, we “forget” $b_{i}$ when its block is no longer part of the window. To satisfy queries in constant time, we also maintain the sum of all active counters in a $\left\lceil{\log\left({RW(1+\tau)+1}\right)}\right\rceil$-bits variable named $B$. Algorithm 1 provides pseudocode for the described algorithm.
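The description above can be sketched in a few lines. This is a reconstruction from the prose, not the paper's Algorithm 1: the class and method names are hypothetical, Python integers stand in for the fixed-width counters, and the choice to return $\langle B+y,c\rangle$ (the last $\tau^{-1}$ completed blocks plus the partial block) is an assumption that matches the definition of Output.

```python
class SlackExactSum:
    """Exact sum over a tau-slack window, kept as block sums in a cyclic buffer."""

    def __init__(self, window, tau_inv):
        if window % tau_inv != 0:
            raise ValueError("W * tau must be an integer")
        self.block_len = window // tau_inv   # W * tau elements per block
        self.b = [0] * tau_inv               # cyclic buffer of block sums
        self.B = 0                           # sum of all entries of b
        self.y = 0                           # sum of the current (partial) block
        self.c = 0                           # elements seen in the current block
        self.i = 0                           # index of the oldest block in b

    def update(self, x):
        self.y += x
        self.c += 1
        if self.c == self.block_len:         # the block is complete
            self.B += self.y - self.b[self.i]
            self.b[self.i] = self.y          # overwrite ("forget") the oldest block
            self.i = (self.i + 1) % len(self.b)
            self.y = 0
            self.c = 0

    def output(self):
        # Sum of the last W + c elements, where c is the current block offset.
        return self.B + self.y, self.c
```

Both operations touch a constant number of variables, which is consistent with the constant worst-case time claimed for the algorithm.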

We now analyze the memory consumption of Algorithm 1.

Theorem 4.1.

Algorithm 1 uses $(\tau^{-1}+1)\left\lceil{\log\left({RW\tau+1}\right)}\right\rceil+\log\left({RW^{2}}\right)+O(1)$ bits.

Proof 4.2.

$y$ takes $\left\lceil{\log\left({RW\tau+1}\right)}\right\rceil$ bits; $B$ requires $\left\lceil{\log\left({RW+1}\right)}\right\rceil$; $i$ adds $\left\lceil{\log{\tau^{-1}}}\right\rceil$ bits, while $c$ needs $\left\lceil{\log{W\tau}}\right\rceil$ bits. Finally, $b$ is a $\tau^{-1}$-sized array of counters, each allocated with $\left\lceil{\log\left({RW\tau+1}\right)}\right\rceil$ bits. Overall, it uses $(\tau^{-1}+1)\left\lceil{\log\left({RW\tau+1}\right)}\right\rceil+\log\left({RW^{2}}\right)+4$ bits.

We conclude that Algorithm 1 is asymptotically optimal.

Theorem 4.3.

Let $\mathcal{B}\triangleq{\max\left\{\left\lfloor{\log\left({RW^{2}}\right)}\right\rfloor,\left\lceil{\left\lceil{\tau^{-1}/2}\right\rceil{\log\left({RW\tau+1}\right)}}\right\rceil\right\}}$ be the $(W,\tau)$ -Exact Summing lower bound of Theorem 3.2. Algorithm 1 uses at most $\mathcal{B}(4+o(1))$ memory bits.

Theorem 4.3 shows that the memory used by Algorithm 1 is within a factor of $4+o(1)$ of the lower bound. In Appendix D we show that in some cases we can get considerably closer to the lower bound. Finally, in Appendix E we show that Algorithm 1 is correct.

4.2 $(W,\tau,\epsilon)$ -Additive Summing

We now show that additional memory savings can be obtained by combining slackness with an additive error. First, we consider the case where $\tau\leq 2\epsilon$ . In [4], we proposed an algorithm that sums over (exact) $W$ elements window using the optimal $\Theta(\epsilon^{-1}+\log W)$ bits, with an additive error of $RW\epsilon$ . Next, notice that if an algorithm solves $(W,\tau,\epsilon)$ -Additive Summing, it also solves $(W,\tau,\tau/2)$ -Additive Summing; hence, we can apply Theorem 3.4 to conclude that it requires $\Omega(\tau^{-1}+\log W)=\Omega(\epsilon^{-1}+\log W)$ . Thus, we can run the algorithm from [4] and remain asymptotically memory optimal with no slack at all!

Henceforth, we assume that $\tau>2\epsilon$; we present an algorithm for the problem using a $2$-stage rounding technique. When a new item arrives, we scale it down by $R$ (so that it becomes a fraction in $[0,1]$) and then round the result to $O(\log\epsilon^{-1})$ bits. As in Section 4.1, we break the stream into non-overlapping blocks of size $W\tau$ and compute the sum of each block separately. However, we now sum the rounded values rather than the exact input, with an $O(\log\frac{W\tau}{\epsilon})$-bits counter denoted $y$. Once the block is completed, we round its sum such that it is represented with $O(\log\frac{\tau}{\epsilon})$ bits. Note that this second rounding is done for the entire block’s sum while we still have the “exact” sum of rounded fractions. Thus, we propagate the second rounding error to the following block. An illustration of our algorithm appears in Figure 3. Here, $\text{Round}_{\upsilon}(z)$ refers to rounding a fractional number $z\in[0,1]$ into the closest number $\widetilde{z}$ such that $2^{\upsilon}\cdot\widetilde{z}\in\mathbb{N}$. Algorithm 2 provides pseudocode for the algorithm, which uses the following variables:

1. $y$ - a fixed point variable that uses $\left\lceil{\log{W\tau}}\right\rceil+1$ bits to store its integral part and additional $\upsilon_{1}\triangleq\left\lceil{\log\epsilon^{-1}}\right\rceil+1$ bits for storing the fractional part.

2. $b$ - a cyclic array that contains $\tau^{-1}$ elements, each of which takes $\upsilon_{2}\triangleq\left\lceil{\log\frac{\tau}{\epsilon}}\right\rceil$ bits.

3. $B$ - keeps the sum of elements in $b$ and is represented using $\log\left({\tau^{-1}\left\lceil{\log\frac{\tau}{\epsilon}}\right\rceil+1}\right)$ bits.

4. $i$ - the index variable used for tracking the oldest block in $b$.

5. $c$ - a variable that keeps the offset within the ${W\tau}$ sized block.
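The two-stage rounding described above can be illustrated with a short Python sketch. It is not the paper's Algorithm 2 pseudo code: the class name `AdditiveSlackSum`, the helper `round_bits`, the parameter `tau_inv` (standing for $\tau^{-1}$, assumed to divide $W$), and the use of exact `Fraction` arithmetic instead of fixed-width bit fields are all assumptions made for illustration. The variables `y`, `b`, `B`, `i`, and `c` follow the list above.

```python
import math
from fractions import Fraction


def round_bits(z, bits):
    """Round z to the nearest multiple of 2**-bits (the Round operator)."""
    scale = 1 << bits
    return Fraction(math.floor(z * scale + Fraction(1, 2)), scale)


class AdditiveSlackSum:
    """Illustrative sketch of two-stage rounding (hypothetical names)."""

    def __init__(self, W, tau_inv, eps, R):
        assert W % tau_inv == 0, "W * tau must be an integer block size"
        self.R = R
        self.tau_inv = tau_inv
        self.block = W // tau_inv                             # W * tau
        self.v1 = math.ceil(math.log2(1 / eps)) + 1           # upsilon_1
        self.v2 = math.ceil(math.log2(1 / (tau_inv * eps)))   # upsilon_2
        self.y = Fraction(0)               # sum of rounded items in the block
        self.b = [Fraction(0)] * tau_inv   # rounded per-block averages
        self.B = Fraction(0)               # sum of the entries of b
        self.i = 0                         # index of the oldest block
        self.c = 0                         # offset within the current block

    def add(self, x):
        # First rounding: scale the item by R and keep v1 fractional bits.
        self.y += round_bits(Fraction(x, self.R), self.v1)
        self.c += 1
        if self.c == self.block:
            # Second rounding: the block average, kept with v2 bits.
            avg = round_bits(self.y / self.block, self.v2)
            self.B += avg - self.b[self.i]
            self.b[self.i] = avg
            # Carry the second rounding error into the next block.
            self.y -= self.block * avg
            self.i = (self.i + 1) % self.tau_inv
            self.c = 0

    def query(self):
        """Return (estimated sum of the last W + c items, c)."""
        return self.R * (self.block * self.B + self.y), self.c
```

In this sketch the carried remainder in `y` may be slightly negative after a block ends, which an implementation with fixed-width fields would have to account for; the sketch sidesteps the issue by using unbounded rationals.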

We now analyze the memory consumption of Algorithm 2.

Theorem 4.4.

Algorithm 2 uses $\tau^{-1}\log\left({\frac{\tau}{\epsilon}}\right)(1+o(1))+2\log(W/\epsilon)$ bits.

Proof 4.5.

$y$ requires $\log\left({\frac{W\tau}{\epsilon}}\right)+O(1)$ bits; $b$ requires another $\tau^{-1}\left\lceil{\log\left({\frac{\tau}{\epsilon}}\right)}\right\rceil$ bits; $B$ takes an additional $\log\left({\tau^{-1}\left\lceil{\log\frac{\tau}{\epsilon}}\right\rceil+1}\right)$ bits; $i$ adds $\left\lceil{\log\tau^{-1}}\right\rceil$ bits, and $c$ is represented with $\left\lceil{\log{W\tau}}\right\rceil$ bits. Overall, the space requirement is $\tau^{-1}\left\lceil{\log\left({\frac{\tau}{\epsilon}}\right)}\right\rceil(1+o(1))+2\log(W/\epsilon)$ bits.

Corollary 4.6.

Let $\mathcal{B}\triangleq\max\left\{{\log(W/\epsilon)}-O(1),\left\lceil{\left\lceil{\tau^{-1}/2}\right\rceil{\log\left\lfloor{{\tau/2\epsilon+1}}\right\rfloor}}\right\rceil\right\}$ be the $(W,\tau,\epsilon)$ -Additive Summing space lower bound of Theorem 3.4, then Algorithm 2 uses $\mathcal{B}\cdot\left({4+o(1)}\right)$ bits.

Finally, Theorem 4.7 shows that Algorithm 2 is correct. The proof is deferred to Appendix F.

Theorem 4.7.

Algorithm 2 solves the $(W,\tau,\epsilon)$ -Additive Summing problem.

4.3 $(W,\tau,\epsilon)$ -Multiplicative Summing

In this section, we present Algorithm 3 that provides a $(1+\epsilon)$ multiplicative approximation of the SS problem. Compared to Algorithm 1, we achieve a space reduction by representing each sum of $W\tau$ elements using $O(\log\log\left({RW\tau}\right)+\log{\epsilon^{-1}})$ bits. Specifically, when a block ends, if its sum was $y$, we store $\rho=\left\lfloor{\log_{(1+\epsilon/2)}y}\right\rfloor$ (we allow a value of $-\infty$ for $\rho$ if $y=0$). To achieve $O(1)$ Output, we also store an approximate window sum $B$, which is now a fixed point fractional variable with $O(\log RW)$ bits for its integral part and additional $O(\log\epsilon^{-1})$ bits for storing a fraction. To update $B$’s value for a new $\rho$, we round down the value of ${(1+\epsilon/2)^{\rho}}$. Specifically, for a real number $x$, we denote $\left({x}\right)_{\downarrow}\triangleq\left\lfloor{x\cdot k}\right\rfloor/k$, for $k\triangleq\left\lceil{4\over\epsilon}\right\rceil$. Our pseudo code appears in Algorithm 3. The algorithm requires $O\big(\tau^{-1}\left({\log\log\left({RW\tau}\right)+\log{\epsilon^{-1}}}\right)+\log RW\big)$ bits of space and is memory optimal when $R=W^{O(1)}$ and $\tau=\Omega\left({\frac{1}{\log RW}}\right)$. The full analysis of Algorithm 3 is deferred to Appendix G.
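The logarithmic block encoding can be sketched in Python as follows. This is an illustration under stated assumptions rather than the paper's Algorithm 3: the class name `MultiplicativeSlackSum`, the parameter `tau_inv` (for $\tau^{-1}$, assumed to divide $W$), the choice of passing $\epsilon$ as a `Fraction`, and the use of `None` to stand for $\rho=-\infty$ are hypothetical. Exact rational arithmetic is used so that the floor of the logarithm is not affected by floating-point error.

```python
import math
from fractions import Fraction


class MultiplicativeSlackSum:
    """Illustrative sketch: each block sum is stored as rho = floor(log y)."""

    def __init__(self, W, tau_inv, eps):
        assert W % tau_inv == 0, "W * tau must be an integer block size"
        self.block = W // tau_inv
        self.tau_inv = tau_inv
        self.base = 1 + Fraction(eps) / 2        # logarithm base 1 + eps/2
        self.k = math.ceil(4 / Fraction(eps))    # resolution of round-down
        self.b = [None] * tau_inv                # None stands for -infinity
        self.B = Fraction(0)                     # approximate window sum
        self.y = 0                               # exact sum of current block
        self.i = 0
        self.c = 0

    def _rho(self, y):
        """Largest integer rho with base**rho <= y, or None if y == 0."""
        if y == 0:
            return None
        rho, power = 0, Fraction(1)
        while power * self.base <= y:
            power *= self.base
            rho += 1
        return rho

    def _value(self, rho):
        """base**rho rounded down to a multiple of 1/k (0 for -infinity)."""
        if rho is None:
            return Fraction(0)
        return Fraction(math.floor(self.base ** rho * self.k), self.k)

    def add(self, x):
        self.y += x
        self.c += 1
        if self.c == self.block:
            rho = self._rho(self.y)
            # Replace the oldest block's contribution to B with the new one.
            self.B += self._value(rho) - self._value(self.b[self.i])
            self.b[self.i] = rho
            self.y = 0
            self.i = (self.i + 1) % self.tau_inv
            self.c = 0

    def query(self):
        """Return (estimated sum of the last W + c items, c)."""
        return self.B + self.y, self.c
```

The linear search in `_rho` is chosen for clarity only; it does not reflect the constant-time update the paper claims for Algorithm 3.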

Next, we present an alternative $(W,\tau,\epsilon)$ -Multiplicative Summing algorithm that achieves optimal space consumption for $\tau=\Theta(1)$ , regardless of the value of $R$ .

Improved $(W,\tau,\epsilon)$ -Multiplicative Summing for $\tau=\Theta(1)$

Algorithm 4 is more space efficient than Algorithm 3 but has a query time of $O(\tau^{-1})$. For $\tau=\Theta(1)$, Algorithm 4 is memory optimal and supports constant time queries even if $R=W^{\omega(1)}$; for this case, Algorithm 3 requires $\Omega(\log R)$ bits, which is suboptimal.

Intuitively, we shave the $\Omega\left({\log R}\right)$ bits from the space requirement of Algorithm 3 using an approximate representation for our $y$ variable and by not keeping the $B$ variable that allowed $O(1)$ time queries regardless of the value of $\tau$. To avoid using $\Omega\left({\log R}\right)$ bits in $y$, we use a fixed point representation in which $O(\log\epsilon^{-1}+\log\log\left({RW\tau}\right))$ bits are allocated for its integral part and another $O(\log W\tau)$ for the fractional part. The goal of $y$ is still to approximate the sum of the elements within a block, but now we aim for the sum to be approximately $(1+\epsilon/3)^{y}$. Whenever a block ends, we store only the integral part of $y$ in our cyclic array $b$ to save space. When queried, we compute an estimate for the sum using all of the values in $b$, which makes our query procedure take $O(\tau^{-1})$ time. To use the fixed point structure of $y$, we use the operator $\left({\cdot}\right)_{\underline{\overline{\Downarrow}}}$ that rounds a real number $x$ into $\left({x}\right)_{\underline{\overline{\Downarrow}}}\triangleq\left\lfloor{x\cdot W\tau}\right\rfloor/W\tau$. We denote $\log_{(1+\epsilon/3)}\left({0}\right)=-\infty,\left({-\infty}\right)_{\underline{\overline{\Downarrow}}}=-\infty,\left\lfloor{-\infty}\right\rfloor=-\infty$ and $(1+\epsilon/3)^{-\infty}=0$. In Appendix H we prove the following theorem.

Theorem 4.8.

For $\tau=\Theta(1)$ , Algorithm 4 processes elements and answers queries in $O(1)$ time, uses $O(\log(W/\epsilon)+\log\log R)$ bits, and is asymptotically optimal.

4.4 The Mean of a Slack Window

For some applications there is value in knowing the mean of a slack window. For example, a load balancer may be interested in the average transmission throughput. In exact windows, the sum and the mean can be derived from each other as the window size is constant. In slack windows, the window size changes but our algorithms also return the current slack offset $0\leq c<W\tau$. That is, by dividing $\widehat{S}$ by $W+c$ we get an estimation of the mean (we assume that the stream size is larger than $W$). Specifically, Algorithm 1 provides the exact mean; Algorithm 2 approximates it with $R\epsilon$ additive error, while Algorithm 3 yields a $(1+\varepsilon)$ multiplicative approximation.
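As an illustration of deriving the mean from the returned offset, the following Python sketch keeps exact per-block sums in the spirit of Algorithm 1 and divides the sum by $W+c$. The class name `ExactSlackSum`, its method names, and the parameter `tau_inv` (for $\tau^{-1}$, assumed to divide $W$) are hypothetical and chosen only for this example.

```python
class ExactSlackSum:
    """Illustrative sketch of exact summing over a slack window."""

    def __init__(self, W, tau_inv):
        assert W % tau_inv == 0, "W * tau must be an integer block size"
        self.W = W
        self.block = W // tau_inv   # W * tau
        self.b = [0] * tau_inv      # exact sums of the last tau_inv blocks
        self.B = 0                  # sum of the entries of b
        self.y = 0                  # sum of the current (partial) block
        self.i = 0                  # index of the oldest block
        self.c = 0                  # offset within the current block

    def add(self, x):
        self.y += x
        self.c += 1
        if self.c == self.block:
            # The finished block replaces the oldest one.
            self.B += self.y - self.b[self.i]
            self.b[self.i] = self.y
            self.y = 0
            self.i = (self.i + 1) % len(self.b)
            self.c = 0

    def query(self):
        """Return (sum of the last W + c items, c)."""
        return self.B + self.y, self.c

    def mean(self):
        """Mean of the slack window; assumes at least W + c items arrived."""
        total, c = self.query()
        return total / (self.W + c)
```

The same division by $W+c$ applies to the approximate summing algorithms, with the error guarantees stated in the paragraph above.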

5 Other Measurements over Slack Windows

We now explore the benefits of the slack model for other problems.

Maximum. While maintaining the maximum of a sliding window can be useful for applications such as anomaly detection [33, 26], tracking it over an exact window is often infeasible. Specifically, any algorithm for the maximum over an (exact) window must use $\Omega\left({W\log\left({R/W}\right)}\right)$ bits [16]. The following theorem, proved in Appendix I, shows that we can get a much more efficient algorithm for slack windows. Observe that the following bounds match for $\tau$ values that are not too small ($\tau=R^{\Omega\left({1}\right)-1}$).

Theorem 5.1.

Tracking the maximum over a slack window deterministically requires $O\left({\tau^{-1}\log R}\right)$ and $\Omega\left({\tau^{-1}\log R\tau}\right)$ bits.

Standard-Deviation. Building on the ability of our summing algorithms to provide the size of the slack window that they approximate, we can compute standard deviations over slack windows. Intuitively, the standard deviation of the window can be expressed as

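$$\sqrt{|\overline{W}|^{-1}\sum_{x\in\overline{W}}\left(x-m_{\overline{W}}\right)^{2}}=\sqrt{|\overline{W}|^{-1}\sum_{x\in\overline{W}}x^{2}-m_{\overline{W}}^{2}},$$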

where $\overline{W}$ is the slack window and $m_{\overline{W}}$ is its mean. We can then use two slack summing instances to track $\sum_{x\in\overline{W}}x^{2}$ and $m_{\overline{W}}=|\overline{W}|^{-1}\sum_{x\in\overline{W}}x$. This gives us an algorithm that computes the exact standard deviation over slack windows using $O(\tau^{-1}\log\left({RW\tau}\right))$ space. Similarly, by using approximate rather than exact summing solutions we can compute a $(1+\epsilon)$ multiplicative approximation for the standard deviation using $O\big(\tau^{-1}\big(\log\epsilon^{-1}+\log\log\left({RW\tau}\right)\big)+\log W\big)$ bits, or an $R\epsilon$-additive approximation using $O(\tau^{-1}\log\left({\frac{\tau}{\epsilon}}\right)+\log W)$ space. We expand on this further in Appendix J.
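A minimal Python sketch of the exact variant follows: it tracks the per-block sums of $x$ and of $x^{2}$ side by side and combines them at query time. The class name `SlackStdDev` and the parameter `tau_inv` (for $\tau^{-1}$, assumed to divide $W$) are hypothetical, and the sketch assumes the stream already holds at least $W+c$ items when queried.

```python
import math


class SlackStdDev:
    """Illustrative sketch: two exact slack sums, of x and of x squared."""

    def __init__(self, W, tau_inv):
        assert W % tau_inv == 0, "W * tau must be an integer block size"
        self.W = W
        self.block = W // tau_inv
        self.b = [(0, 0)] * tau_inv   # per-block (sum x, sum x^2)
        self.B = [0, 0]               # totals over the stored blocks
        self.y = [0, 0]               # totals over the current block
        self.i = 0
        self.c = 0

    def add(self, x):
        self.y[0] += x
        self.y[1] += x * x
        self.c += 1
        if self.c == self.block:
            old = self.b[self.i]
            self.B[0] += self.y[0] - old[0]
            self.B[1] += self.y[1] - old[1]
            self.b[self.i] = (self.y[0], self.y[1])
            self.y = [0, 0]
            self.i = (self.i + 1) % len(self.b)
            self.c = 0

    def query(self):
        """Return (standard deviation of the last W + c items, c)."""
        n = self.W + self.c
        s1 = self.B[0] + self.y[0]
        s2 = self.B[1] + self.y[1]
        # Variance = s2/n - (s1/n)^2, computed with an exact integer numerator.
        variance = (n * s2 - s1 * s1) / (n * n)
        return math.sqrt(variance), self.c
```

Computing the numerator $n\sum x^{2}-(\sum x)^{2}$ in integers avoids the cancellation that can occur when two nearly equal floating-point quantities are subtracted.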

General-Summing. General-Summing is similar to Basic-Summing, except that the integers can be in the range $\left\{-R,\ldots,R\right\}$. That is, we now allow for negative elements as well. Datar et al. [16] proved that General-Summing requires $\Omega(W)$ bits, even for $R=1$ and constant factor approximation. In contrast, our exact summing algorithm from Section 4.1 trivially generalizes to General-Summing and allows an exact solution over slack windows.

Count-Distinct. Estimating the number of distinct elements in a stream is another useful metric. In networking, the packet header is used to identify different flows, and it is useful to know how many distinct flows are currently active. A sudden spike in the number of active flows is often an indication of a threat to the network. It may indicate the propagation of a worm or virus, port scans that are used to detect vulnerabilities in the system, and even Distributed Denial of Service (DDoS) attacks [13, 21, 25].

Here, we have studied the memory reduction that can be obtained by following a similar flow to our summing algorithms – we break the stream into $W\tau$ sized blocks and run the state of the art approximation algorithm on each block separately. Luckily, count distinct algorithms are mergeable [1]. That is, we can merge the summaries for each block to obtain an estimation of the number of distinct items in the union of the blocks. In Appendix K we show that this approach yields an algorithm with superior space and query time compared to the state of the art algorithms for counting distinct elements over sliding windows [11, 24]. Formally, we prove the following theorem.

Theorem 5.2.

For $\tau=\Theta(1)$ and any fixed $m>0$ , there exists an algorithm that uses $O(m)$ space, performs updates in constant time and answers queries in time $O(m)$ , such that the result approximates a window whose size is in $[W,W(1+\tau)]$ ; the resulting estimation is asymptotically unbiased and has a standard deviation of $\sigma=O(\frac{1}{\sqrt{m}})$ . State of the art approaches for exact windows [11, 24] require $O(m\log\left({W/m}\right))$ space and $O(m\log\left({W/m}\right))$ time per query for a similar standard deviation.
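The block-and-merge idea can be sketched in Python as below. The sketch swaps in a simple K-minimum-values (KMV) summary as the mergeable estimator; this is a stand-in chosen for brevity and is not the algorithm analyzed in Appendix K. The names `KMV`, `SlackCountDistinct`, `item_hash`, and `tau_inv` (for $\tau^{-1}$, assumed to divide $W$), as well as the use of SHA-256 as the hash function, are assumptions.

```python
import hashlib


def item_hash(item):
    """Map an item to a float in [0, 1) (SHA-256 is an arbitrary choice)."""
    digest = hashlib.sha256(repr(item).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2 ** 64


class KMV:
    """K-minimum-values summary: keeps the k smallest distinct hashes."""

    def __init__(self, k, values=()):
        self.k = k
        self.values = sorted(set(values))[:k]

    def add(self, item):
        v = item_hash(item)
        if v not in self.values:
            self.values = sorted(self.values + [v])[: self.k]

    def merge(self, other):
        """Summary of the union of the two underlying multisets."""
        return KMV(self.k, self.values + other.values)

    def estimate(self):
        if len(self.values) < self.k:
            return float(len(self.values))   # fewer than k distinct: exact
        return (self.k - 1) / self.values[-1]


class SlackCountDistinct:
    """One summary per W*tau-sized block, merged at query time."""

    def __init__(self, W, tau_inv, k):
        assert W % tau_inv == 0, "W * tau must be an integer block size"
        self.block = W // tau_inv
        self.k = k
        self.blocks = [KMV(k) for _ in range(tau_inv)]
        self.current = KMV(k)
        self.i = 0
        self.c = 0

    def add(self, item):
        self.current.add(item)
        self.c += 1
        if self.c == self.block:
            self.blocks[self.i] = self.current   # overwrite the oldest block
            self.current = KMV(self.k)
            self.i = (self.i + 1) % len(self.blocks)
            self.c = 0

    def query(self):
        """Return (estimated distinct count of the last W + c items, c)."""
        merged = self.current
        for summary in self.blocks:
            merged = merged.merge(summary)
        return merged.estimate(), self.c
```

The update here re-sorts a list and therefore is not constant time; it is meant only to show how per-block summaries are replaced cyclically and merged on demand.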

6 Discussion

In this work we have explored the slack window model for multiple streaming problems. We have shown that it enables asymptotic space and time improvements. Particularly, introducing slack enables logarithmic space exact algorithms for certain problems such as Maximum and General-Summing. In contrast, these problems do not admit sub-linear space approximations in the exact window model. Even in problems that do have sub-linear space approximations such as Standard-Deviation and Count-Distinct, adding slack asymptotically improves the space requirement and allows for constant time updates.

Much of our work has focused on the classic Basic-Summing problem. Based on our findings, we argue that allowing a slack in the window size is an attractive approximation axis as it enables greater space reductions compared to an error in the sum. As an example, for a fixed $\epsilon$ value, computing a $(1+\epsilon)$ -multiplicative approximation requires $\Omega(\log\left({RW}\right)\log{W})$ space [16]. Conversely, a $(1+\tau)$ multiplicative error in the window size, for a constant $\tau$ , allows summing using $\Theta(\log\left({RW}\right))$ bits – same as in summing $W$ elements without sliding windows! Given that for exact windows randomized algorithms have the same asymptotic complexity as deterministic ones [4, 16], we expect randomization to have limited benefits for slack windows as well.

Appendix A Proof of Lemma 3.1

Proof A.1.

Consider the following language

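$$L_{E_{1}}\triangleq\left\{0^{W+W\tau}\right\}\cup\left\{0^{W\tau+i}\,\sigma\,R^{W-i-1}\,0^{j}\mid i\in[W-1],\ \sigma\in\{1,\ldots,R\},\ j\in[i]\right\}.$$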

That is, $L_{E_{1}}$ contains a word with $W+W\tau$ consecutive zeros and the rest of the words in $L_{E_{1}}$ are composed of these components in this order:

• $W\tau+i$ zeros for some $i\in[W-1]$.

• a non zero symbol $\sigma$.

• $W-i-1$ repetitions of the maximal symbol ($R$).

• $j$ zeros for some $j\in[i]$.

Our lower bound stems from the observation that every word in $L_{E_{1}}$ must lead to a different state. The language size is: $|L_{E_{1}}|=1+\sum_{i=0}^{W-1}R(i+1)=1+RW(W+1)/2.$ Therefore, the number of required bits is at least: $\left\lceil{\log|L_{E_{1}}|}\right\rceil>\left(\log(RW^{2})-1\right)$ . Further, this number is an integer and therefore at least $\left\lfloor{\log(RW^{2})}\right\rfloor$ bits are required.

First, notice that the word composed of $W+W\tau$ zeros requires a unique configuration as $\mathbb{A}$ must return $0$ after processing that word. In contrast, it must not return $0$ after processing any other word as there is at least a single non zero symbol within the last $W$ elements.

Let $w_{1},w_{2}\in L_{E_{1}}$ be two different words that are not all-zeros. We need to show that $w_{1}$ and $w_{2}$ require different memory configurations.

By definition of $L_{E_{1}}$, $w_{1}=0^{W\tau+i_{1}}\sigma_{1}R^{W-i_{1}-1}0^{j_{1}}$ and $w_{2}=0^{W\tau+i_{2}}\sigma_{2}R^{W-i_{2}-1}0^{j_{2}}$. Observe that the last $W$ elements of $w_{1},w_{2}$ are $0^{i_{1}-j_{1}}\sigma_{1}R^{W-i_{1}-1}0^{j_{1}}$ and $0^{i_{2}-j_{2}}\sigma_{2}R^{W-i_{2}-1}0^{j_{2}}$ respectively and that both are preceded with at least $W\tau$ zeros. If $i_{1}\neq i_{2}$ or $\sigma_{1}\neq\sigma_{2}$, then $\sigma_{1}+R\cdot\left({W-i_{1}-1}\right)\neq\sigma_{2}+R\cdot\left({W-i_{2}-1}\right)$ and thus $\mathbb{A}$ cannot return the same count for both, regardless of the slack, as it is all zeros in both $w_{1}$ and $w_{2}$.

Next, assume that $i_{1}=i_{2}$ , $\sigma_{1}=\sigma_{2}$ and that without loss of generality $j_{1}<j_{2}$ . This means that both $w_{1}$ and $w_{2}$ have the same count.

Since $j_{1}<j_{2}$ , $w_{1}$ is a strict prefix of $w_{2}$ , i.e., $w_{2}=w_{1}\cdot 0^{j_{2}-j_{1}}$ . Assume by contradiction that after processing $w_{1},w_{2}$

Appendix B Proof of Theorem 3.4

Before we prove Theorem 3.4, we start with a simpler lower bound.

Lemma B.1.

Let $\epsilon<1/4$ . Any deterministic algorithm that solves the $(W,\tau,\epsilon)$ -Additive Summing problem must use at least ${\log(W/\epsilon)}-O(1)$ bits.

Proof B.2.

Denote by $rep(x)\triangleq(x\mod R)\cdot R^{\left\lfloor{x/R}\right\rfloor}$ a sequence in $\left\{\sigma R^{*}\mid\sigma\in[R]\right\}$ whose sum is $x$ . Next, consider the following languages:

[TABLE]

First, notice that $|L_{A_{1}}|=\left\lfloor{1/4\epsilon}\right\rfloor$ and that all words in $L_{A_{1}}$ have length of at most $W/2$ . This means that $|\overline{L_{A_{1}}}|=\left\lfloor{1/4\epsilon}\right\rfloor\left\lfloor{W/2+1}\right\rfloor>\left\lfloor{W/8\epsilon}\right\rfloor$ .

We now show that every word in $\overline{L_{A_{1}}}$ must have a dedicated memory configuration, thereby implying a lower bound of $\left\lceil{\log{\left\lfloor{W/8\epsilon}\right\rfloor}}\right\rceil$ bits. Let $w_{1}=0^{W+W\tau}\cdot x_{1}\cdot 0^{q_{1}}$ and $w_{2}=0^{W+W\tau}\cdot x_{2}\cdot 0^{q_{2}}$ be two distinct words in $\overline{L_{A_{1}}}$ such that $x_{1},x_{2}\in L_{A_{1}}$ and $q_{1},q_{2}\in\left[\left\lfloor{W/2}\right\rfloor\right]$. If $x_{1}\neq x_{2}$, then the sums of their most recent $W$ elements differ by more than $2RW\epsilon$ and there is no output that is correct for both. Note that the slack of both $w_{1}$ and $w_{2}$ is all zeros. Hence, $w_{1}$ and $w_{2}$ require different memory configurations.

Assume that $x_{1}=x_{2}$ and that by contradiction both $w_{1}$ and $w_{2}$ reached the same memory configuration. Since $w_{1}\neq w_{2}$ and $x_{1}=x_{2}$ , then $q_{1}\neq q_{2}$ and without loss of generality $q_{1}<q_{2}$ . This implies that $w_{1}$ is a prefix of $w_{2}$ so that $w_{2}=w_{1}\cdot 0^{q_{2}-q_{1}}$ . Thus, $\mathbb{A}$ enters the shared configuration after reading $w_{1}$ and revisits it after reading $0^{q_{2}-q_{1}}$ . $\mathbb{A}$ is a deterministic algorithm and therefore it reaches the same configuration also for the following word: $w_{1}\cdot 0^{(W+W\tau)(q_{2}-q_{1})}$ . In that word, the last $W+W\tau$ elements are all zeros while the sum of the last $W$ elements in $w_{1}$ is at least $2RW\epsilon$ . Hence, there is no return value that is correct for both $w_{1}$ and $w_{1}\cdot 0^{(W+W\tau)(q_{2}-q_{1})}$ .

We are now ready to prove Theorem 3.4. The theorem says that for $\epsilon<1/4$, any deterministic algorithm that solves the $(W,\tau,\epsilon)$-Additive Summing problem requires $\max\left\{{\log(W/\epsilon)}-O(1),\left\lceil{\left\lceil{\tau^{-1}/2}\right\rceil{\log\left\lfloor{{\tau/2\epsilon+1}}\right\rfloor}}\right\rceil\right\}$ bits.

Proof B.3.

Lemma B.1 shows that

Appendix C Proof of Lemma 3.5

Before we prove Lemma 3.5 we first give a bound on an integer set that will serve us in the main lemma.

Lemma.

Consider the integer set $I_{M}\triangleq\left\{a_{n}\mid a_{n}\leq R\left\lfloor{W/2}\right\rfloor\right\}$ , where the integers $\{a_{i}\}$ are taken from the following sequence: $a_{1}=1,\forall n>1:a_{n}=\left\lceil{(1+\epsilon)a_{n-1}}\right\rceil$ . The cardinality of $I_{M}$ satisfies $\left|I_{M}\right|\geq\ln\left({2}\right)\epsilon^{-1}\log\left({RW\epsilon/2}\right)-O(1)$ .

Proof C.1.

We first show an upper bound on $a_{n}$: $\forall n\in\mathbb{N}:a_{n}\leq\epsilon^{-1}\cdot\left({(1+\epsilon)^{n}-1}\right)$.

• Basis: for $n=1$, we have $a_{1}=1=\epsilon^{-1}\cdot\left({(1+\epsilon)-1}\right)$.

• Hypothesis: $a_{n-1}\leq\epsilon^{-1}\cdot\left({(1+\epsilon)^{n-1}-1}\right)$.

• Step: For $n>1$, we bound $a_{n}$ as follows:

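$$a_{n}=\left\lceil{(1+\epsilon)a_{n-1}}\right\rceil\leq(1+\epsilon)a_{n-1}+1\leq(1+\epsilon)\cdot\epsilon^{-1}\cdot\left({(1+\epsilon)^{n-1}-1}\right)+1=\epsilon^{-1}\cdot\left({(1+\epsilon)^{n}-1}\right).$$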

Next, notice that this implies that $|I_{M}|\geq\max\left\{n\mid\epsilon^{-1}(1+\epsilon)^{n}\leq R\left\lfloor{W/2}\right\rfloor\right\}$. Finally, we get a lower bound of $n=\left\lfloor{\log_{1+\epsilon}(R\left\lfloor{W/2}\right\rfloor\epsilon)}\right\rfloor=\frac{\ln\left({RW\epsilon/2}\right)}{\ln(1+\epsilon)}-O(1)\geq\ln\left({2}\right)\epsilon^{-1}{\log\left({RW\epsilon/2}\right)}-O(1)$, where the last inequality follows from the Taylor expansion of $\ln\left({1+\epsilon}\right)$, which gives $\ln(1+\epsilon)\leq\epsilon$.
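The induction bound and the resulting count can be checked numerically with a short Python sketch. The function name `geometric_integers` and the concrete parameter values in the usage note are assumptions made for illustration; `limit` plays the role of $R\left\lfloor W/2\right\rfloor$.

```python
import math
from fractions import Fraction


def geometric_integers(eps, limit):
    """Return all a_n <= limit, where a_1 = 1 and a_n = ceil((1+eps)*a_{n-1}).

    eps is expected to be a positive Fraction so that the ceiling is exact.
    """
    seq, a = [], 1
    while a <= limit:
        seq.append(a)
        a = math.ceil((1 + eps) * a)   # strictly increases since eps > 0
    return seq


def upper_bound(eps, n):
    """The bound eps^{-1} * ((1 + eps)^n - 1) proved by induction above."""
    return ((1 + eps) ** n - 1) / eps
```

For example, with $\epsilon=1/10$, $R=100$ and $W=64$ one can call `geometric_integers(Fraction(1, 10), 100 * 32)` and compare each element $a_{n}$ against `upper_bound(Fraction(1, 10), n)`; consecutive elements always differ by a factor of at least $1+\epsilon$.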

We are now ready to prove Lemma 3.5 using the lemma above.

Lemma.

For $\epsilon<1/4$ , any deterministic algorithm

Proof C.2.

We show a language $\overline{L_{M}}$ for which every two words must reach a unique memory configuration, thus implying a $\left\lceil{\log{|\overline{L_{M}}|}}\right\rceil$ bits lower bound. We denote by $rep(x)\triangleq(x\mod R)\cdot R^{\left\lfloor{x/R}\right\rfloor}$ a sequence in $\left\{\sigma R^{*}\mid\sigma\in[R]\right\}$ that has a sum of $x$ . We define $\overline{L_{M}}$ as follows:

[TABLE]

Notice that $|\overline{L_{M}}|=|{I_{M}}|\cdot\left({\left\lfloor{W/2}\right\rfloor+1}\right)$ and according to the lemma above we have

[TABLE]

Consider two words $w_{1}=0^{W+W\tau}rep(x_{1})0^{j_{1}}$ and $w_{2}=0^{W+W\tau}rep(x_{2})0^{j_{2}}$ in $\overline{L_{M}}$ , such that $x_{1},x_{2}\in I_{M}$ . Notice that every two distinct numbers $z<q\in I_{M}$ satisfy $z\leq q/(1+\epsilon)$ . Since $w_{1},w_{2}$ are preceded with a sequence of $W\tau$ zeros, no answer correctly satisfies the requirements for both of them. Thus, if $x_{1}\neq x_{2}$ ,

Appendix D Tighter Analysis of the $(W,\tau)$ -Exact Summing Algorithm

Theorem D.1.

Consider a stream where $R=O(1)$ and $\tau=1$ . There exists a $(W,\tau)$ -Exact Summing algorithm that uses $1.5\mathcal{B}+O(1)$ bits, where $\mathcal{B}$ is the lower bound.

Proof D.2.

Our method here is similar to Algorithm 1, but the constant $\tau$ value allows us to compute $\sum_{i=1}^{\tau^{-1}}b_{i}$ in $O(1)$ without tracking it in $B$ . Thus, our algorithm only requires $3\log W+O(1)$ bits, while Theorem 3.2 gives a lower bound of $\left\lfloor{2\log W}\right\rfloor$ .

Appendix E Correctness Proof of Algorithm 1

Theorem E.1.

Algorithm 1 solves the $(W,\tau)$ -Exact Summing problem.

Proof E.2.

First, notice that $c$ is always in the range $\left\{0,1,\ldots,W\tau-1\right\}$ and thus the slack size is as needed. Next, assume that the algorithm input was $x_{1},x_{2},\ldots,x_{t}$; since the stream is broken into blocks of size $W\tau$, where $c$ is the offset within the current block, we have that $c=t\mod W\tau$. Further, the last $c$ elements are $x_{t-c+1},\ldots,x_{t}$ and the preceding $W$ elements are $x_{t-c-W+1},\ldots,x_{t-c}$. Finally, the algorithm always keeps the sum of the $W$ elements before the current block in $B$, and the sum of the current block in $y$; thus, by returning $\widehat{S}=B+y$ we get exactly the sum of the last $W+c$ elements.

Appendix F Correctness Proof of Algorithm 2

Theorem.

Algorithm 2 solves the $(W,\tau,\epsilon)$ -Additive Summing problem.

Proof F.1.

First, observe that at all times $c\in[W\tau-1]$ as needed. Denote the stream by $S=x_{1},x_{2},\ldots x_{t+c}$ , such that $c$ represents the number of elements within the current block. Our goal is to show that Algorithm 2 provides a $RW\epsilon$ approximation to the sum of the last $W+c$ elements ( $x_{t-W+1}\ldots x_{t+c}$ ). That is, the quantity we approximate is $S\triangleq\sum_{\ell=1}^{W+c}x_{t-W+\ell}.$

For any $\ell\in[t+c]$, we use $y_{\ell}$ to denote the value of the variable $y$ after the $\ell$’th item was added. Note that within a block, $y$ simply sums the rounded scaled inputs; whenever a block ends, we reduce the value of $y$ by $W\tau\cdot\text{Round}_{\upsilon_{2}}(\frac{y}{{W\tau}})$, but make up for it by setting $b_{i}$. Further, when processing $x_{t-W+1}\ldots x_{t+c}$, we replace all of the values of $b$ that were determined before the last $W+c$ elements, and none of the newly set values leaves the window by time $t+c$. That is, $i$ reaches every value in $[\tau^{-1}]$ exactly once throughout the last $W+c$ updates. This gives us the following equality: ${y_{t-W}+\sum_{i=1}^{W+c}x^{\prime}_{t-W+i}=W\tau\sum_{i=1}^{\tau^{-1}}b_{i}+y_{t+c}=W\tau\cdot B+y_{t+c}.}$

Thus, we can express the algorithm’s estimate of the sum value $\widehat{S}\triangleq R\cdot\left({W\tau\cdot B+y_{t+c}}\right)$ as:

[TABLE]

Next, notice that since $\forall\ell:x_{\ell}^{\prime}=\text{Round}_{\upsilon_{1}}(\frac{x_{\ell}}{R})$ , we have $|x_{\ell}^{\prime}-\frac{x_{\ell}}{R}|\leq 2^{-\upsilon_{1}-1}$ and thus:

[TABLE]

Also, since we assumed that $x_{t-W}$ was the last of a $W\tau$ -sized block, we know that the value of $y$ is bounded, and specifically:

[TABLE]

Plugging (2) and (3) into (1), we get a bound on the error:

[TABLE]

Thus, since $\upsilon_{1}=\left\lceil{\log\epsilon^{-1}}\right\rceil+1$ and $\upsilon_{2}=\left\lceil{\log\frac{\tau}{\epsilon}}\right\rceil$, we get the desired $\mathcal{E}<RW\epsilon$ error bound and conclude that Algorithm 2 solves $(W,\tau,\epsilon)$-Additive Summing.

Appendix G Analysis of Algorithm 3

We now analyze the memory requirement of our algorithm.

Theorem G.1.

Algorithm 3 requires $O\left({\tau^{-1}\left({\log\log\left({RW\tau}\right)+\log{\epsilon^{-1}}}\right)+\log(RW)}\right)$ bits.

Proof G.2.

Since $\rho\in\left({\left\{-\infty\right\}\cup\left\{0,1,\ldots,\left\lfloor{\log_{(1+\epsilon/2)}RW\tau}\right\rfloor\right\}}\right)$, it can be represented using $\left\lceil{\log\left({\log_{(1+\epsilon/2)}(RW\tau)+2}\right)}\right\rceil$ bits. Next, notice that this satisfies:

[TABLE]

Each counter in $b$ now is assigned a $\rho$ value and thus the overall space consumption of $b$ is $\tau^{-1}\left({\log\left({\log RW\tau}\right)+\log{\epsilon^{-1}}+O(1)}\right)$ bits. Our $B$ variable sums the values rather than the exact sum of each block and is a fixed point variable. We use $\left\lceil{\log\left({RW\tau+1}\right)}\right\rceil$ bits for its integral part and another $\left\lceil{\log\left({4\epsilon^{-1}}\right)}\right\rceil$ for its fractional part. Thus, the total number of bits required by the algorithm is

[TABLE]

We thus get that Algorithm 3 is optimal under some conditions; notice that this includes constant $\tau$ values.

Theorem G.3.

Let $\mathcal{B}=\Omega\big(\log(W/\epsilon)+\tau^{-1}\left({\log\left({\tau/\epsilon}\right)+\log\log\left({RW}\right)}\right)\big)$ be the $(W,\tau,\epsilon)$-Multiplicative Summing lower bound shown in Theorem 3.12. Then for $\tau^{-1}=O\left({{\log W}}\right)$ and $R=W^{O(1)}$, Algorithm 3 uses $O(\mathcal{B})$ bits.

Next, we prove that Algorithm 3 solves the problem. Recall that for a real number $x$ , we denote $\left({x}\right)_{\downarrow}\triangleq\left\lfloor{x\cdot k}\right\rfloor/k$ , for $k\triangleq\left\lceil{4\over\epsilon}\right\rceil$ .

Lemma G.4.

Let $\epsilon\leq 1/2$ ; for every $y\in\mathbb{R}^{+}$ such that $y>0$ the following inequality holds:

[TABLE]

Proof G.5.

Observe that we have

[TABLE]

Finally, since $y\geq 1$ and $\epsilon\leq 1/2$ , we get that $\frac{y}{(1+\epsilon/2)}-\epsilon/4\geq\frac{y}{1+\epsilon}$ .

Theorem G.6.

Algorithm 3 solves the $(W,\tau,\epsilon)$ -Multiplicative Summing problem.

Proof G.7.

First, observe that at all times $c\in[W\tau-1]$ as needed. Denote the stream by $S=x_{1},x_{2},\ldots x_{t+c}$ , such that $c$ represents the number of elements within the current block. We will show that Algorithm 3 provides a $(1+\epsilon)$ approximation to the sum of the last $W+c$ elements ( $x_{t-W+1}\ldots x_{t+c}$ ). That is, the quantity we approximate is

[TABLE]

Next, since we reset $y$ in every block (Line 9), we have that at the time of the query, $y=\sum_{\ell=t+1}^{t+c}x_{\ell}.$ Further, every block is summed individually and then rounded in Line 6. We assume without loss of generality that $i=0$ at the time of the query, and $\forall\jmath\in\left[\tau^{-1}-1\right]$ we denote $S_{\jmath}\triangleq\sum_{\ell=t-W\tau(\tau^{-1}-\jmath)+1}^{t-W\tau(\tau^{-1}-\jmath-1)}x_{\ell}$ – the sum of the elements in block $\jmath$. Therefore we have:

[TABLE]

That is, each of the $\tau^{-1}$ blocks that precede the last $c$ elements is summed exactly (Line 3) and then we store its $(1+\epsilon/2)$-based log in $b$. Next, the $B$ variable stores the approximated values rounded down (Line 7), i.e.,

[TABLE]

Now, Lemma G.4 implies that

[TABLE]

We then get

[TABLE]

Also, note that if some $S_{\jmath}=0$, then we defined $b_{\jmath}$ as $-\infty$ and $(1+\epsilon/2)^{b_{\jmath}}$ as $0$. This means that if $S=0$, then $\forall\jmath\in\left[\tau^{-1}-1\right]:S_{\jmath}=0,y=0$ and thus $\widehat{S}=0$. Hereafter, assume that $S\neq 0$. Next, we plug equations (4),(5),(6) into (7) to get:

[TABLE]

Similarly, we bound $\widehat{S}$ from below as follows:

[TABLE]

We showed that whenever $S\neq 0$ (i.e., $S>0$) we have $\frac{S}{1+\epsilon}<\widehat{S}\leq S$, thereby proving the theorem.

Appendix H Analysis of Algorithm 4

In order to analyze Algorithm 4, we first note that by rounding down a real number to use $\left\lceil{\log W\tau}\right\rceil$ fractional bits, as in the $\left({\cdot}\right)_{\underline{\overline{\Downarrow}}}$ operator, we decrease its value by less than $1/W\tau$.

Observation H.1.

For any $\alpha\in\mathbb{R}^{+}:\alpha-1/W\tau<\left({\alpha}\right)_{\underline{\overline{\Downarrow}}}\leq\alpha$ .

Our approach in the analysis of Algorithm 4 is as follows:

1. We start with Lemma H.2 that shows that $(1+\epsilon/3)^{y}$ is a $(1+\epsilon/3)$ multiplicative approximation to the block’s sum.

2. Next, Lemma H.4 shows that we do not lose much by taking $\left\lfloor{y}\right\rfloor$ into our cyclic buffer $b$ (rather than $y$ itself). This allows us to reduce the memory requirement at the expense of slightly increasing the error. Specifically, we show that $(1+\epsilon/3)^{\left\lfloor{y}\right\rfloor}$ is a $(1+\epsilon)$ multiplicative approximation of the sum.

3. Then we proceed with Lemma H.6 that shows an $O(\epsilon^{-1}\log\left({RW\tau}\right))$ bound on $y$. This allows us to bound the number of bits needed for the representation of its integral part.

4. Lemma H.8 analyzes the overall space requirement of Algorithm 4.

5. Next, Theorem H.10 shows we indeed solve $(W,\tau,\epsilon)$-Multiplicative Summing.

6. Finally, Corollary H.12 concludes the optimality for constant $\tau$.

Lemma H.2.

Let $x_{1},\ldots x_{W\tau}$ be the elements of a block summed in $y$ , then $\sum_{i=1}^{W\tau}x_{i}=0\implies y=-\infty$ and otherwise:

[TABLE]

Proof H.3.

We prove the lemma by showing that $\forall n\in\left\{0,1,\ldots,W\tau\right\}$ , after summing $x_{1},\ldots,x_{n}$ we have that if $\sum_{i=1}^{n}x_{i}=0$ then $y=-\infty$ and otherwise:

[TABLE]

The proof is done by induction where we denote by $y_{i}$ the value of $y$ after summing $x_{1},\ldots x_{i}$ .

• Basis: $n=0$. Here we simply have $y=-\infty$ and the claim holds.

• Induction hypothesis: let $0<n<W\tau$; if $\sum_{i=1}^{n}x_{i}=0$ then $y_{n}=-\infty$, and otherwise:

[TABLE]

• Induction step: let $y_{n+1}\triangleq\left({\log_{(1+\epsilon/3)}\left({{x_{n+1}+(1+\epsilon/3)^{y_{n}}}}\right)}\right)_{\underline{\overline{\Downarrow}}}$.

We first consider the case where $x_{1}=\ldots=x_{n+1}=0$ . In this case we have $y_{n}=-\infty$ according to the induction hypothesis. Thus, we get

[TABLE]

Next, consider the case where $x_{1}=\ldots=x_{n}=0$ but $x_{n+1}>0$ . This also gives us $y_{n}=-\infty$ and thus:

[TABLE]

and

[TABLE]

Finally, consider the case where ${\sum_{i=1}^{n}x_{i}}>0$ , and according to the induction hypothesis:

[TABLE]

Thus, we bound $(1+\epsilon/3)^{y_{n+1}}$ as follows:

[TABLE]

Similarly, we can bound it from below:

[TABLE]

Lemma H.4.

Let $x_{1},\ldots x_{W\tau}$ be the elements of a block summed in $y$ , then $\sum_{i=1}^{W\tau}x_{i}=0\implies{{(1+\epsilon/3)^{\left\lfloor{y}\right\rfloor}}}=0$ and otherwise

[TABLE]

Proof H.5.

First, notice that Lemma H.2 implies that if $\sum_{i=1}^{W\tau}x_{i}=0$ then $y=-\infty$ and thus ${{(1+\epsilon/3)^{\left\lfloor{y}\right\rfloor}}}=0$ . If the sum is non-zero, the lemma implies:

[TABLE]

Thus, we have that:

[TABLE]

From below, we can bound it as follows:

[TABLE]

where the last inequality holds as $\epsilon\leq 1/2$ .

We now bound the value of $y$ in order to compute the memory requirements of Algorithm 4.

Lemma H.6.

For any $n\in\left\{0,1,\ldots,W\tau\right\}$ , let $x_{1},\ldots x_{n}$ be elements of a block summed in $y$ (Line 3), then $y=O(\epsilon^{-1}\log\left({RW\tau}\right))$ .

Proof H.7.

According to Lemma H.2 we have that $(1+\epsilon/3)^{y}\leq\sum_{i=1}^{n}x_{i}\leq RW\tau$ . Thus we can get an upper bound on $y$ ’s value:

[TABLE]

We are now ready to compute the space requirement of Algorithm 4.

Lemma H.8.

Algorithm 4 uses $O\left({\tau^{-1}\left({\log\log\left({RW\tau}\right)+\log{\epsilon^{-1}}}\right)+\log W}\right)$ space.

Proof H.9.

The algorithm uses four variables:

• $y$ – a fixed point variable with $O(\log W\tau)$ bits for its fractional part. The integral part of $y$, according to Lemma H.6, can be represented using $O(\log\epsilon^{-1}+\log\log\left({RW\tau}\right))$ bits.

• $b$ – a cyclic array with $\tau^{-1}$ entries, each of which can be represented using $O(\log\epsilon^{-1}+\log\log\left({RW\tau}\right))$ bits per Line 6.

• $i$ – tracks the index of the current block and thus has $\tau^{-1}$ possible values and can be represented using $O(\log\tau^{-1})$ bits.

• $c$ – the offset within the current block; it is represented using $O(\log W\tau)$ bits.

Overall, we get that the memory requirement is as stated.

We now prove the correctness of Algorithm 4.

Theorem H.10.

Algorithm 4 processes elements in constant time, answers queries in $O(\tau^{-1})$ time, uses $O\left({\tau^{-1}\left({\log\log\left({RW\tau}\right)+\log{\epsilon^{-1}}}\right)+\log W}\right)$ space and solves the $(W,\tau,\epsilon)$-Multiplicative Summing problem.

Proof H.11.

Let $x_{1},\ldots x_{W+c}$ be the elements we are trying to approximate. Observe that $\forall\jmath\in[\tau^{-1}-1]$, the elements $x_{W\tau\cdot\jmath+1},\ldots,x_{W\tau\cdot(\jmath+1)}$ are summed together (Line 3) and then stored in some $b_{i}$ (Line 6). This means that, according to Lemma H.4, $(1+\epsilon/3)^{b_{i}}$ approximates their sum up to a multiplicative error of $1+\epsilon$. Thus, we have that:

[TABLE]

Finally, according to Lemma H.2, we have that $y$ approximates the last $c$ elements and specifically:

[TABLE]

This allows us to conclude that: $\frac{\sum_{i=1}^{W+c}x_{i}}{1+\epsilon}<(1+\epsilon/3)^{y}+\sum_{i=1}^{\tau^{-1}}(1+\epsilon/3)^{b_{i}}\leq\sum_{i=1}^{W+c}x_{i}.$

Corollary H.12.

For $\tau=\Theta(1)$, Algorithm 4 answers queries in $O(1)$ time, uses $O(\log(W/\epsilon)+\log\log R)$ bits, and is asymptotically optimal.

Appendix I Proof of Theorem 5.1

Theorem.

Tracking the maximum over a slack window deterministically requires $O\left({\tau^{-1}\log R}\right)$ and $\Omega\left({\tau^{-1}\log R\tau}\right)$ bits.

Proof I.1.

The algorithm we propose is quite simple – compute the maximum over each $W\tau$-sized block and keep a cyclic buffer of the last $\tau^{-1}$ blocks’ maxima. Then, we can compute the maximum of the cyclic buffer at query time to get the maximal value in the slack window. For a lower bound, consider the following language:

[TABLE]

We now claim that each pair of distinct words $w_{1},w_{2}\in L_{\max}$ must lead the algorithm into a distinct memory configuration. Denote $w_{1}=0^{W-2W\tau\left\lfloor{1/2\tau}\right\rfloor}\sigma_{1,1}^{2W\tau}\ldots\sigma_{1,\left\lfloor{1/2\tau}\right\rfloor}^{2W\tau}$, $w_{2}=0^{W-2W\tau\left\lfloor{1/2\tau}\right\rfloor}\sigma_{2,1}^{2W\tau}\ldots\sigma_{2,\left\lfloor{1/2\tau}\right\rfloor}^{2W\tau}$ and let $t\triangleq\max\left\{i\mid\sigma_{1,i}\neq\sigma_{2,i}\right\}$. Without loss of generality assume that $\sigma_{1,t}>\sigma_{2,t}$. If $w_{1}$ and $w_{2}$ lead the algorithm into the same memory configuration, then it (being deterministic) must reach the same configuration again and provide the same output for $w_{1}\cdot 0^{W-t\cdot 2W\tau}$ and $w_{2}\cdot 0^{W-t\cdot 2W\tau}$. But as the maximum in $w_{1}\cdot 0^{W-t\cdot 2W\tau}$ must be $\sigma_{1,t}$ (regardless of the chosen slack), and $\sigma_{1,t}>\sigma_{2,t}\geq\sigma_{2,t+1}\geq\ldots\geq\sigma_{2,\left\lfloor{1/2\tau}\right\rfloor}$, no single answer is correct for both. Thus, the algorithm must reach a distinct memory configuration for each input, implying a lower bound of $\log_{2}|L_{\max}|=\log_{2}{\left\lfloor{1/2\tau}\right\rfloor+R\choose R}\geq\left\lfloor{1/2\tau}\right\rfloor\log\left({R/\left\lfloor{1/2\tau}\right\rfloor}\right)=\Omega\left({\tau^{-1}\log R\tau}\right)$ bits.
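The upper-bound construction at the start of this proof (block maxima kept in a cyclic buffer) can be sketched as follows. This is a minimal illustration, not the authors' code: the class name `SlackWindowMax`, the inclusion of the partially filled current block in the answer, and the assumption that $W\tau$ and $\tau^{-1}$ are integers are choices made here for the example.

```python
class SlackWindowMax:
    """Sketch: maximum over a slack window via per-block maxima (names are hypothetical)."""

    def __init__(self, window: int, tau: float):
        self.block_size = max(1, round(window * tau))  # W*tau elements per block
        self.num_blocks = max(1, round(1 / tau))       # tau^{-1} completed blocks are kept
        self.blocks = [0] * self.num_blocks            # cyclic buffer of block maxima
        self.filled = 0       # how many buffer slots hold a completed block
        self.idx = 0          # slot that the next completed block overwrites
        self.cur_max = None   # maximum of the current, incomplete block
        self.offset = 0       # number of elements in the current block

    def update(self, x: int) -> None:
        self.cur_max = x if self.cur_max is None else max(self.cur_max, x)
        self.offset += 1
        if self.offset == self.block_size:
            # The block is complete: store its maximum over the oldest slot.
            self.blocks[self.idx] = self.cur_max
            self.idx = (self.idx + 1) % self.num_blocks
            self.filled = min(self.filled + 1, self.num_blocks)
            self.cur_max, self.offset = None, 0

    def query(self):
        """Return (maximum, number of most recent elements it covers)."""
        candidates = self.blocks[:self.filled] if self.filled < self.num_blocks else list(self.blocks)
        if self.cur_max is not None:
            candidates = candidates + [self.cur_max]
        covered = self.filled * self.block_size + self.offset
        return (max(candidates) if candidates else None), covered


tracker = SlackWindowMax(window=8, tau=0.25)
for value in [9, 1, 2, 3, 4, 5, 6, 7, 0, 0]:
    tracker.update(value)
print(tracker.query())  # (7, 8): the 9 has left the window
```

Each update touches a constant number of variables, and a query scans the $\tau^{-1}$ stored maxima, which matches the cost profile described in the proof.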

Appendix J Standard deviation over sliding windows

Here, we present our algorithm for the standard deviation over a window. While algorithms for the sum of a sliding window are known, to the best of our knowledge, no previous solution computes the standard deviation over an (approximate) sliding window.

We denote by $\overline{W}$ the set of items included in the window. The algorithm uses a window summing algorithm $\mathbb{A}$ as a black box. Here, $\mathbb{A}$ can be Algorithm 1, Algorithm 2, or Algorithm 3. We assume that $\mathbb{A}$ supports two operations:

I. Update$(x)$ – process a new element $x\in\left\{0,1,\ldots,R\right\}$.

II. Output$()$ – return a tuple $\langle\widehat{S},\left|\overline{W}\right|\rangle$ such that $\widehat{S}$ is an estimation of the sum of the last $\left|\overline{W}\right|\approx W$ elements.

The mean of the window is estimated as $\widehat{m}\triangleq\frac{\widehat{S}}{\left|\overline{W}\right|}$ .

We employ two separate window summing instances (as explained above). The first one simply processes the input and is used to compute the mean. The second one computes the sum of squared values over a sliding window. This is illustrated in Figure 4. We use the following identity for the window standard deviation $\sigma_{\overline{W}}$ :

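$\sigma_{\overline{W}}=\sqrt{\frac{\sum_{x\in\overline{W}}x^{2}}{\left|\overline{W}\right|}-\left({\frac{\sum_{x\in\overline{W}}x}{\left|\overline{W}\right|}}\right)^{2}}.$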

This allows us to compute $\sigma_{\overline{W}}$ from the sum of squares and the mean of the window elements. Pseudocode for this method appears in Algorithm 5.
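The two-instance construction described above can be sketched as follows. The sketch is an illustration under stated assumptions, not Algorithm 5 itself: `ExactWindowSum` is a hypothetical stand-in for the black box $\mathbb{A}$ that stores the whole window (so it has none of the space savings of Algorithms 1 to 3), and `WindowStdDev` is a name chosen here. Any object offering the same `update` and `output` operations could be substituted; note that the instance fed with squares sees values up to $R^{2}$.

```python
import math
from collections import deque


class ExactWindowSum:
    """Hypothetical stand-in for the black box A: exact sum of the last `window` items."""

    def __init__(self, window: int):
        self.items = deque(maxlen=window)
        self.total = 0

    def update(self, x: int) -> None:
        if len(self.items) == self.items.maxlen:
            self.total -= self.items[0]   # the oldest item is about to be evicted
        self.items.append(x)
        self.total += x

    def output(self):
        """Return (sum estimate, number of items the estimate covers)."""
        return self.total, len(self.items)


class WindowStdDev:
    """Standard deviation from two summing instances: one for x, one for x^2."""

    def __init__(self, window: int, make_summer=ExactWindowSum):
        self.sum_x = make_summer(window)    # used to estimate the mean
        self.sum_x2 = make_summer(window)   # sums the squared values

    def update(self, x: int) -> None:
        self.sum_x.update(x)
        self.sum_x2.update(x * x)

    def query(self) -> float:
        s, n = self.sum_x.output()
        s2, _ = self.sum_x2.output()
        if n == 0:
            return 0.0
        mean = s / n
        # Clamp at zero: with approximate sums the difference may dip below 0.
        variance = max(0.0, s2 / n - mean * mean)
        return math.sqrt(variance)


sd = WindowStdDev(window=4)
for value in [1, 2, 3, 4, 5, 6]:
    sd.update(value)
print(sd.query())  # standard deviation of the last four items: 3, 4, 5, 6
```

Both instances advance in lockstep, so they always describe the same window, which is what the identity requires.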

J.1 Accuracy of standard deviation algorithms

We now discuss the accuracy that Algorithm 5 provides, for each specific implementation of the underlying black box $\mathbb{A}$ . First, if $\mathbb{A}$ computes the exact sum over the slack window, then all quantities are computed without error.

Next, consider a multiplicative-error $\mathbb{A}$ with a slack window (Algorithm 4). In this case, the algorithm computes a multiplicative approximation of the standard deviation (over the window considered by $\mathbb{A}$). Specifically, if $\mathbb{A}$ provides a $(1+\epsilon)$ multiplicative approximation of the sum, then the standard deviation is estimated within a multiplicative error of $\sqrt{1+\epsilon}=1+\epsilon/2+O(\epsilon^{2})$.

Finally, consider an additive $RW\epsilon$ error algorithm on a slack window (Algorithm 2). In this case, Algorithm 5 provides an $R\sqrt{\epsilon}$ additive approximation of the window’s standard deviation.

Appendix K Analysis of allowing slack in count distinct algorithms

K.1 Background

Accurately counting distinct elements requires linear space **[22]**. Intuitively, one needs to maintain a list of all previously encountered identifiers. Therefore, accurate measurement does not scale to large streams and approximate solutions are very popular. Specifically, count distinct algorithms often use randomized estimators **[3, 14, 17, 23]**. Randomized algorithms typically use a hash function $H:\mathbb{D}\to\{0,1\}^{\infty}$ that maps ids to infinite bit strings. When a maximal cardinality bound is known, finite strings are used, and typically 32-bit integers suffice to reach estimations of over $10^{9}$ **[22]**. We assume that the hashed values are distributed uniformly at random, i.e., each bit is set with probability one half: $\forall d\in\mathbb{D}:\Pr[H(d)_{i}=1]=0.5$.

Count distinct algorithms look for certain observables in the hashes. For example, some algorithms **[3, 28]** look at the minimal observed hash value as a real number in $[0,1]$ and exploit that $\mathbb{E}(\min\left(H(\mathcal{M})\right))=\frac{1}{n+1}$, where $n$ is the number of distinct items in the multi-set $\mathcal{M}$. Another possibility is to look for patterns of the form $0^{\beta-1}1$ **[17, 22]**. When such a pattern is first encountered, it is likely that there were at least $2^{\beta}$ unique elements.

The state-of-the-art count distinct algorithm is HyperLogLog (HLL) **[22]**, which is used by Google **[29]**. HLL requires $m$ bytes and its standard deviation is $\sigma\approx\frac{1.04}{\sqrt{m}}$. It was extended to exact windows by **[11, 24]**, and that extension is used to detect port scans in networked systems **[12]**. Exact Window HLL (W-HLL) requires $5m\ln\left({W/m}\right)$ bytes and its standard deviation is $\sigma\approx\frac{1.04}{\sqrt{m}}$. In this work, we present $\tau$-Slack HLL ($\tau$-SHLL), which requires $(\tau^{-1}+1)m$ bytes and has a standard deviation of $\sigma\approx\frac{1.04}{\sqrt{m}}$. When $\tau$ is fixed, Slack HLL requires $O(m)$ words. When $\tau^{-1}=o(\ln\left({W/m}\right))$, it requires asymptotically less space than W-HLL. For completeness, we provide an overview of HLL in Appendix L.

K.2 $\tau$ -Slack HyperLogLog

We now present the $\tau$-SHLL algorithm. We logically divide the stream into fixed $W\tau$-sized blocks. Our algorithm maintains a cyclic buffer of $\tau^{-1}+1$ HLL instances. Each instance has a buffer index in the range $\left\{0,1,\ldots,\tau^{-1}\right\}$; the symbol $HLL[i]$ refers to the HLL instance at index $i$, and $HLL[i]_{k}$ denotes its $k$’th register for $k\in[m-1]$. We use two counters: Current Block (CB) holds values in $\left\{0,1,\ldots,\tau^{-1}\right\}$ and Place in Block (PB) counts how many items are included in the current block. Initially, CB and PB are set to 0, and new items are always added to $HLL[CB]$. Each additional item increments $PB$, and once $PB$ reaches the value $W\tau$, we set $PB=0$ and increment $CB$ cyclically (modulo $\tau^{-1}+1$). We also reset the HLL instance at $HLL[CB]$ by setting all its registers to $-\infty$. Doing so enables us to forget information that is guaranteed not to be in the window. To query Slack HLL, we take, for each of the $m$ registers, the maximum over all instances, use these maxima to generate $Z$, and continue as in HLL. Pseudocode for Slack HLL is found in Algorithm 6, and an example of the algorithm’s setup is illustrated in Figure 5. Note that $\alpha_{m}$ is a range correction constant that depends on $m$.
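The block rotation and register-wise maximum described above can be sketched as follows. This is an illustrative sketch, not Algorithm 6: the class name `SlackHLL`, the use of SHA-256 truncated to 64 bits as the hash, the 1-based position of the leftmost 1-bit, the closed-form $\alpha_{m}$ (valid for $m\geq 128$), and the use of 0 rather than $-\infty$ as the empty register value are all assumptions made for the example. It also assumes $W\tau$ and $\tau^{-1}$ are integers and omits the small- and large-range corrections of HLL.

```python
import hashlib


class SlackHLL:
    """Sketch of a tau-slack HLL: a cyclic buffer of per-block register arrays."""

    def __init__(self, window: int, tau: float, b: int = 10):
        if b < 7:
            raise ValueError("this sketch uses an alpha_m formula valid for m >= 128")
        self.block = max(1, round(window * tau))   # W*tau items per block
        self.k = round(1 / tau) + 1                # tau^{-1} + 1 HLL instances
        self.b, self.m = b, 1 << b
        self.hll = [[0] * self.m for _ in range(self.k)]
        self.cb = 0   # Current Block index
        self.pb = 0   # Place in Block

    def _register_and_rho(self, item):
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")          # 64 hash bits
        tail_bits = 64 - self.b
        j = h >> tail_bits                             # first b bits pick the register
        rest = h & ((1 << tail_bits) - 1)
        rho = tail_bits - rest.bit_length() + 1        # 1-based position of leftmost 1
        return j, rho

    def update(self, item) -> None:
        j, rho = self._register_and_rho(item)
        regs = self.hll[self.cb]
        regs[j] = max(regs[j], rho)
        self.pb += 1
        if self.pb == self.block:
            self.pb = 0
            self.cb = (self.cb + 1) % self.k
            self.hll[self.cb] = [0] * self.m   # forget the block that left the window

    def merged_registers(self):
        # Register-wise maximum over all instances.
        return [max(inst[j] for inst in self.hll) for j in range(self.m)]

    def query(self):
        """Return (distinct-count estimate, PB); the estimate covers the last W + PB items."""
        alpha = 0.7213 / (1 + 1.079 / self.m)
        z = sum(2.0 ** -r for r in self.merged_registers())
        return alpha * self.m * self.m / z, self.pb


shll = SlackHLL(window=40000, tau=0.25)
for i in range(100000):
    shll.update(f"flow-{i}")
estimate, pb = shll.query()
print(round(estimate), pb)  # the estimate should be near 40000, not 100000
```

A query costs $O(\tau^{-1}m)$ in this form because it merges all instances on demand.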

The next theorem, whose proof is deferred to Appendix M, shows that $\tau$-SHLL is correct.

Theorem K.1.

Let $\langle\widehat{D},PB\rangle$ be the query result of Slack HLL, then $\widehat{D}$ is the query result of $HLL$ for a stream containing the last $W+PB$ events (for streams longer than $W$ ).

Appendix L Hyper LogLog (HLL) Overview

We provide a brief overview of the HLL algorithm **[22]**. HLL uses a hash function that maps each identifier to an infinite string of $0$, $1$ bits; $H:\mathbb{D}\to\{0,1\}^{\infty}$, where $\mathbb{D}$ is the identifier domain. Given $s\in\{0,1\}^{\infty}$, the operator $lsb(s)$ returns the position of the leftmost $1$-bit, e.g., $lsb(0001...)=3$ (counting from 0). Intuitively, a string with $k$ leading zeros is expected to appear after $2^{k}$ events. Accordingly, $HLL$ stores the largest previously encountered $lsb$ value in a counter. The space complexity is $\log\log\left(|\mathcal{M}|\right)$, which in practice means that a single byte suffices for most scenarios.

To augment precision, $m$ different $HLL$ counters are used. For simplicity, we assume that $m=2^{b}$ for some positive $b\in\mathbb{N}$ . Stochastic averaging is performed by using the first $b$ bits of each hashed value to determine the $HLL$ instance to be updated.

To satisfy a query, we read all $m$ estimations, calculate their harmonic average, and normalize the result. The technical details of how to best interpret the result can be found in **[22, 29]**. Algorithm 7 provides pseudocode of the HLL algorithm. As can be observed, stochastic averaging is performed in Line 5. The notation $\left(h_{0},h_{1},...h_{b-1}\right)$ describes the bit composition of $h$. The query algorithm uses the harmonic average of the different experiments $\left(M[0],M[1],...,M[m-1]\right)$, and the result is then normalized with a constant that depends on $m$ (specifically, $\alpha_{m}\cdot m^{2}$).
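The update and query steps outlined in this appendix can be sketched as follows. This is a minimal illustration rather than Algorithm 7: the hash (SHA-256 truncated to 64 bits), the class name `HyperLogLog`, and the closed-form $\alpha_{m}$ (valid for $m\geq 128$) are assumptions of the example. It records the 1-based position of the leftmost 1-bit, whereas the text counts from 0, and it omits the small- and large-range corrections discussed in **[22, 29]**.

```python
import hashlib


class HyperLogLog:
    """Sketch of HLL with m = 2^b registers and stochastic averaging."""

    def __init__(self, b: int = 10):
        if b < 7:
            raise ValueError("this sketch uses an alpha_m formula valid for m >= 128")
        self.b = b
        self.m = 1 << b
        self.M = [0] * self.m   # one small counter per experiment

    @staticmethod
    def _hash64(item) -> int:
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item) -> None:
        h = self._hash64(item)
        tail_bits = 64 - self.b
        j = h >> tail_bits                        # first b bits select the counter
        rest = h & ((1 << tail_bits) - 1)         # remaining bits feed the observable
        rho = tail_bits - rest.bit_length() + 1   # 1-based position of the leftmost 1
        self.M[j] = max(self.M[j], rho)

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        z = sum(2.0 ** -r for r in self.M)        # denominator of the harmonic mean
        return alpha * self.m * self.m / z        # normalized by alpha_m * m^2


hll = HyperLogLog(b=10)
for i in range(10000):
    hll.add(f"flow-{i}")
print(round(hll.estimate()))  # should land within a few percent of 10000
```

Because each counter only keeps a maximum, re-inserting an identifier never changes the state, which is why duplicates do not affect the estimate.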

Appendix M Proof of Theorem K.1

Before we prove Theorem K.1, we need to prove an auxiliary lemma.

Lemma M.1.

The last $\min(|\mathcal{M}|,W+PB)$ updates are performed on all $HLL$ instances.

Proof M.2.

Initially, when $|\mathcal{M}|<W+W\tau$, every $HLL$ instance that is initialized is already empty. Thus, since every Slack HLL update modifies an $HLL$ instance (in Line 3), the number of update operations is $|\mathcal{M}|$. When $|\mathcal{M}|=W+W\tau$, $HLL[0]$ is initialized and we lose the oldest $\tau W$ events. At that point, the number of events is $W$ and $PB=0$. In each subsequent update, both $PB$ and the number of events are increased by $1$ until $PB=\tau W$ again. At that point, $PB$ is set back to $0$ and the number of events drops to the last $W$.

We are now ready to prove Theorem K.1.

Proof M.3.

Note that $W^{\prime}=W+PB$ and therefore Lemma M.1 guarantees that the last $W^{\prime}$ events are summarized in Slack HLL. Therefore, $M_{max}[i]$ used to generate $Z$ in Slack HLL has the highest $\rho$ value of all the last $W^{\prime}$ elements. Consequently, the value that determined $M_{max}[i]$ in Slack HLL also determines $M[i]$ in an HLL that summarizes the last $W^{\prime}$ elements.

Bibliography

[1] Pankaj K. Agarwal, Graham Cormode, Zengfeng Huang, Jeff Phillips, Zhewei Wei, and Ke Yi. Mergeable summaries. In ACM PODS, 2012.
[2] Mohammad Alizadeh, Tom Edsall, Sarang Dharmapurikar, Ramanan Vaidyanathan, Kevin Chu, Andy Fingerhut, Vinh The Lam, Francis Matus, Rong Pan, Navindra Yadav, and George Varghese. Conga: Distributed congestion-aware load balancing for datacenters. In ACM SIGCOMM, 2014.
[3] Ziv Bar-Yossef, T. S. Jayram, Ravi Kumar, D. Sivakumar, and Luca Trevisan. Counting distinct elements in a data stream. In RANDOM, 2002.
[4] Ran Ben Basat, Gil Einziger, Roy Friedman, and Yaron Kassner. Efficient Summing over Sliding Windows. In SWAT, 2016.
[5] Ran Ben Basat, Gil Einziger, Roy Friedman, Marcelo Caggiani Luizelli, and Erez Waisbard. Constant time updates in hierarchical heavy hitters. In ACM SIGCOMM, 2017.
[6] Ran Ben-Basat, Gil Einziger, Roy Friedman, and Yaron Kassner. Heavy hitters in streams and sliding windows. In IEEE INFOCOM, 2016.
[7] Ran Ben-Basat, Gil Einziger, Roy Friedman, and Yaron Kassner. Optimal elephant flow detection. In IEEE INFOCOM, 2017.
[8] Ran Ben-Basat, Gil Einziger, Roy Friedman, and Yaron Kassner. Randomized admission policy for efficient top-k and frequency estimation. In IEEE INFOCOM, 2017.
