Give Me Some Slack: Efficient Network Measurements
Ran Ben Basat, Gil Einziger, Roy Friedman

TL;DR
This paper explores how allowing a small slack in the sliding window size can lead to more efficient algorithms for network measurement problems, reducing memory requirements and enabling faster computations.
Contribution
It introduces a slack-based model for sliding window problems, demonstrating improved algorithmic efficiency and reduced space complexity for key network measurement tasks.
Findings
Slack enables algorithms for MAX and GENERAL-SUM to use less memory.
For sub-linear approximation problems, slack further reduces asymptotic resource requirements.
The model offers practical benefits for high-speed network measurement implementations.
Affiliation 1: Department of Computer Science, Technion, {sran,roy}@cs.technion.ac.il
Affiliation 2: Nokia Bell Labs
Abstract
Many networking applications require timely access to recent network measurements, which can be captured using a sliding window model. Maintaining such measurements is a challenging task due to the fast line speed and scarcity of fast memory in routers. In this work, we study the impact of allowing slack in the window size on the asymptotic requirements of sliding window problems. That is, the algorithm can dynamically adjust the window size between $W$ and $W(1+\tau)$, where $\tau$ is a small positive parameter. We demonstrate this model’s attractiveness by showing that it enables efficient algorithms for problems such as Maximum and General-Summing that require $\Omega(W)$ bits even for constant factor approximations in the exact sliding window model. Additionally, for problems that admit sub-linear approximation algorithms such as Basic-Summing and Count-Distinct, the slack model enables a further asymptotic improvement.
The main focus of the paper is on the widely studied Basic-Summing problem of computing the sum of the last $W$ integers from $\{0,1,\ldots,R\}$ in a stream. While it is known that $\Omega(W\log R)$ bits are needed in the exact window model, we show that approximate windows allow an exponential space reduction for constant $\tau$.
Specifically, for , we present a space lower bound of bits. Additionally, we show an lower bound for additive approximations and a bits lower bound for multiplicative approximations. Our work is the first to study this problem in the exact and additive approximation settings. For all settings, we provide memory optimal algorithms that operate in worst case constant time. This strictly improves on the work of [16] for -multiplicative approximation that requires space and performs updates in worst case time. Finally, we show asymptotic improvements for the Count-Distinct, General-Summing and Maximum problems.
1 Introduction
Network algorithms in diverse areas such as traffic engineering, load balancing and quality of service [2, 9, 26, 31, 38] rely on timely link measurements. In such applications recent data is often more relevant than older data, motivating the notions of aging and sliding window [6, 11, 18, 32, 34]. For example, a sudden decrease in the average packet size on a link may indicate a SYN attack [33]. Additionally, a load balancer may benefit from knowing the current utilization of a link to avoid congestion [2].
While conceptually simple, conveying the necessary information to network algorithms is a difficult challenge due to current memory technology limitations. Specifically, DRAM memory is abundant but too slow to cope with the line rate while SRAM memory is fast enough but has a limited capacity [10, 15, 36]. Online decisions are therefore realized through space efficient data structures [7, 8, 19, 20, 5, 30, 35, 37] that store measurement statistics in a concise manner. For example, [19, 35] utilize probabilistic counters that only require bits to approximately represent numbers up to . Others conserve space using variable sized counter encoding [20, 30] and monitoring only the frequent elements [6].
Basic-Summing is one of the most basic textbook examples of such approximated sliding window stream processing problems [16]. In this problem, one is required to keep track of the sum of the last elements, when all elements are non-negative integers in the range . The work in [16] provides a -multiplicative approximation of this problem using bits. The amortized time complexity is and the worst case is . In contrast, we previously showed an -additive approximation with bits [4].
Sliding window counters (approximated or accurate) require asymptotically more space than plain stream counters. Such window counters are prohibitively large for networking devices which already optimize the space consumption of plain counters.
This paper explores the concept of slack, or approximated sliding window, bridging this gap. Figure 1 illustrates a “window” in this model. Here, each query may select a $\tau$-slack window whose size is between $W$ (the green elements) and $W(1+\tau)$ (the green plus yellow elements). The goal is to compute the sum with respect to this chosen window.
Slack windows were also considered in previous works [16, 34] and we call the problem of maintaining the sum over a slack window SS. Datar et al. [16] showed that constant slack reduces the required memory from to . For -slack windows they provide a -multiplicative approximation using bits.
Our Contributions
This paper studies the space and time complexity reductions that can be attained by allowing slack, that is, an error in the window size. Our results demonstrate exponentially smaller and asymptotically faster data structures than those known for various problems over exact windows. We start by deriving lower bounds for three variants of the Basic-Summing problem: computing an exact sum over a slack window, and computing a sum with an additive or a multiplicative error over a slack window. We present algorithms that are based on dividing the stream into $W\tau$-sized blocks. Our algorithms sum the elements within each block and represent each block’s sum in a cyclic array of size $\tau^{-1}$. We use multiple compression techniques during different stages to drive down the space complexity. The resulting algorithms are space optimal, substantially simpler than previous work, and reduce update time to $O(1)$.
For exact SS, we present a lower bound of bits. For multiplicative approximations we prove an $\Omega\big(\log(W/\epsilon)+\tau^{-1}\left(\log(\tau/\epsilon)+\log\log(RW)\right)\big)$ bits bound when . We show that bits are required for additive approximations.
Next, we introduce algorithms for the SS problem, which asymptotically reduce the required memory compared to the sliding window model. For the exact and additive error versions of the problem, we provide memory optimal algorithms. In the multiplicative error setting, we provide an $O\big(\tau^{-1}\left(\log\epsilon^{-1}+\log\log(RW\tau)\right)+\log(RW)\big)$ space algorithm. This is asymptotically optimal when and . It also asymptotically improves [16] when . We further provide an asymptotically optimal solution for constant , even when . All our algorithms are deterministic and operate in worst case constant time. In contrast, the algorithm of [16] works in worst case time.
To exemplify our results, consider monitoring the average bandwidth (in bytes per second) passed through a router in a hours window, i.e., seconds. Assuming we use a 100GbE fiber transceiver, our stream values are bounded by bytes. If we are willing to withstand an error of (i.e., about ), the work of [4] provides an additive approximation over the sliding window and requires about 120KB. In contrast, using a 10 minutes slack (), our algorithm for exact SS requires only 800 bytes, 99% less than approximate summing over exact sliding window. For the same slack size, the algorithm of [16] requires more space than our exact algorithm even for a large 3% error. Further, if we also allow the same additive error (), we provide an algorithm that requires only 240 bytes - a reduction of more than !
Table 1 compares our results for the important case of constant slack with [16]. As depicted, our exact algorithm is faster and more space efficient than the multiplicative approximation of [16]. Comparing our multiplicative approximation algorithm to that of [16], we present exponential space reductions in the dependencies on and , with an asymptotic reduction in as well. We also improve the update time from to .
Finally, we apply the slack window approach to multiple streaming problems, including Maximum, General-Summing, Count-Distinct and Standard-Deviation. We show that, while some of these problems cannot be approximated on an exact window in sub-linear space (e.g. maximum and general sum), we can easily do so for slack windows. In the count distinct problem, a constant slack yields an asymptotic space reduction over [11, 24].
2 Preliminaries
For , we denote . We consider a stream of data elements , where at each step a new element is added to . A -sized window contains only the last elements: . We say that is a -slack -sized window if there exists such that . For simplicity, we assume that and are integers. Unless explicitly specified, the base of all logs is .
Algorithms for the SS problem are required to support two operations:
1. Update: Process a new element .
2. Output: Return a pair such that is the slack size and is an estimation of the last elements sum, i.e., .
We consider three types of algorithms for SS:
1. Exact algorithms: an algorithm solves -Exact Summing if its Output returns that satisfies and .
2. Additive algorithms: we say that solves -Additive Summing if its Output function returns that satisfies and .
3. Multiplicative algorithms: solves -Multiplicative Summing if its Output returns satisfying and if , and otherwise.
3 Lower Bounds
In this section, we analyze the space required for solving the SS problems. Intuitively, our bounds are derived by constructing a set of inputs that any algorithm must distinguish to meet the required guarantees. There are two tricks that we frequently use in these lower bounds. The first is setting the input such that the slack consists only of zeros, and thus the algorithm must return the desired approximation of the remaining window. The next is using a “cycle argument” – consider two inputs and for . If both lead to the same memory configuration, so do such for any . Thus, if there is a such that no single answer approximates and well, then and had to lead to separate memory configurations in the first place.
3.1 -Exact Summing
We start by proving lower bounds on the memory required for exact SS.
Lemma 3.1.
Any deterministic algorithm
We now use Lemma 3.1, whose proof is deferred to Appendix A, to show the following lower bound on -Exact Summing algorithms:
Theorem 3.2.
Any deterministic algorithm
Proof 3.3.
Lemma 3.1 shows a bound. We proceed with showing a lower bound
bits. Consider the following languages:
[TABLE]
Notice that since each of the words in has a distinct sum of literals, and each number in is the sum of a word. We show that each input in must be mapped into a distinct memory configuration. Let , be two distinct inputs in such that . Denote – the last place in which differs from ; also, denote . Consider the sequences and . Notice that the last elements windows for are and respectively, and that the preceding elements of both are all zeros. An illustration of the setting appears in Figure 2.
By our choice of , we have that the sum of the last elements of and is different, and since the slack is all zeros, no answer is correct on both. Finally, note that this implies that had to reach different configurations, as otherwise
3.2 -Additive Summing
Next, Theorem 3.4 shows a lower bound for additive approximations of SS. Due to lack of space, the proof is deferred to Appendix B.
Theorem 3.4.
For , any deterministic algorithm
3.3 -Multiplicative Summing
In this section, we show lower bounds for multiplicative approximations of SS. We start with Lemma 3.5, whose proof appears in Appendix C.
Lemma 3.5.
For , any deterministic algorithm
To extend our multiplicative lower bound, we use the following fact:
Fact 1.
For any , the sequence , defined as
can be represented using a closed form as .
Next, let and , such that , , ; consider the integer sequence
[TABLE]
Using the fact above, we show the following lemma:
Lemma 3.6.
For every integer we have .
Proof 3.7.
To apply Fact 1, we define an upper bounding sequence as follows:
[TABLE]
Thus, we can rewrite the ’th element of the sequence as:
[TABLE]
We can now use this representation to derive an upper bound of :
[TABLE]
Finally, since for any , we conclude that .
We now define the integer set as , and proceed to bound .
Lemma 3.8.
For any we have .
Proof 3.9.
Clearly, the cardinality of is the largest for which . According to Lemma 3.6, we have that , and thus:
[TABLE]
We proceed with a stronger lower bound for non-constant values.
Lemma 3.10.
For , any deterministic algorithm
Proof 3.11.
We use to denote a sequence in that has a sum of . For an integer set , we denote . We now choose the value of to be ; notice that as required. Next, consider:
[TABLE]
That is, every word in the language consists of a concatenation of words , such that every starts with zeros followed by a string representing an integer in , which is defined above. According to Lemma 3.8 we have that
[TABLE]
Next, we show that every two words in must reach different memory configurations, thereby implying a bits lower bound. Let such that , , and . We next assume by contradiction that and leads
Finally, we combine Lemma 3.5 and Lemma 3.10 to obtain the following lower bound:
Theorem 3.12.
For , any deterministic algorithm for the -Multiplicative Summing problem requires at least $\Omega\big(\log(W/\epsilon)+\tau^{-1}\left(\log(\tau/\epsilon)+\log\log(RW)\right)\big)$ bits.
4 Upper Bounds
In this section, we introduce solutions for the SS problems. In general, all our algorithms have a structure that consists of a subset of the following, where “compression” has a different meaning for the exact, additive and multiplicative variants:
- Compress the arriving item.
- Add the item into a counter and compress the counter.
- If a -sized block ends, store it as a compressed representation of . Sometimes we propagate the compression error to the following block; otherwise, we zero .
- Use the block values and to construct an estimation for the sum.
Our double rounding technique, described below, asymptotically improves over running separate plain stream (insertion only) algorithm instances.
4.1 -Exact Summing
We divide the stream into -sized blocks and sum the number of arriving elements in each block with a bits counter. We maintain the sum of the current block in a variable called , maintains the number of elements within the current block, and is the current block number. The variable is a cyclic buffer of blocks. Every steps, we assign the value of to the oldest block () and increment . Intuitively, we “forget” when its block is no longer part of the window. To satisfy queries in constant time, we also maintain the sum of all active counters in a -bits variable named . Algorithm 1 provides pseudocode for the described algorithm.
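As a rough illustration of the block-based scheme described above, the following Python sketch keeps one exact counter per block in a cyclic buffer, plus a running total. The class and attribute names (`SlackExactSum`, `blocks`, `total`, `current`, `offset`) are ours and are not taken from Algorithm 1. Here `window` stands for $W$ and `block` for the block size, which is assumed to divide the window size. Before a full window has arrived, the missing prefix is treated as zeros.

```python
class SlackExactSum:
    """Exact sum over a slack window, one counter per block (illustrative sketch)."""

    def __init__(self, window, block):
        if window % block != 0:
            raise ValueError("block size must divide the window size")
        self.block = block
        self.blocks = [0] * (window // block)  # cyclic buffer of finished block sums
        self.oldest = 0                        # index of the oldest block in the buffer
        self.total = 0                         # sum of all entries of self.blocks
        self.current = 0                       # sum of the block being filled
        self.offset = 0                        # number of elements in the current block

    def update(self, x):
        self.current += x
        self.offset += 1
        if self.offset == self.block:
            # The current block is complete: it overwrites the oldest block.
            self.total += self.current - self.blocks[self.oldest]
            self.blocks[self.oldest] = self.current
            self.oldest = (self.oldest + 1) % len(self.blocks)
            self.current = 0
            self.offset = 0

    def output(self):
        # Sum of the last (window + offset) elements, together with the slack used.
        return self.total + self.current, self.offset
```

Both operations touch a constant number of variables, which is consistent with the constant-time updates and queries claimed for this layout.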
We now analyze the memory consumption of Algorithm 1.
Theorem 4.1.
Algorithm 1 uses bits.
Proof 4.2.
takes bits; requires ; adds bits, while needs bits. Finally, is a -sized array of counters, each allocated with bits. Overall, it uses bits.
We conclude that Algorithm 1 is asymptotically optimal.
Theorem 4.3.
Let be the -Exact Summing lower bound of Theorem 3.2. Algorithm 1 uses at most memory bits.
Theorem 4.3 shows that Algorithm 1 is only x larger than the lower bound. In Appendix D we show that in some cases we can get considerably closer to the lower bound. Finally, in Appendix E we show that Algorithm 1 is correct.
4.2 -Additive Summing
We now show that additional memory savings can be obtained by combining slackness with an additive error. First, we consider the case where . In [4], we proposed an algorithm that sums over (exact) elements window using the optimal bits, with an additive error of . Next, notice that if an algorithm solves -Additive Summing, it also solves -Additive Summing; hence, we can apply Theorem 3.4 to conclude that it requires . Thus, we can run the algorithm from [4] and remain asymptotically memory optimal with no slack at all!
Henceforth, we assume that ; we present an algorithm for the problem using a two-stage rounding technique. When a new item arrives, we scale it by and then round the result to bits. As in Section 4.1, we break the stream into non-overlapping blocks of size and compute the sum of each block separately. However, we now sum the rounded values rather than the exact input, with a -bits counter denoted . Once the block is completed, we round its sum such that it is represented with bits. Note that this second rounding is done for the entire block’s sum while we still have the “exact” sum of rounded fractions. Thus, we propagate the second rounding error to the following block. An illustration of our algorithm appears in Figure 3. Here, refers to rounding a fractional number into the closest number such that . Algorithm 2 provides pseudocode for the algorithm, which uses the following variables:
- a fixed point variable that uses bits to store its integral part and additional bits for storing the fractional part.
- a cyclic array that contains elements, each of which takes bits.
- keeps the sum of elements in and is represented using bits.
- the index variable used for tracking the oldest block in .
- a variable that keeps the offset within the sized block.
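The two rounding stages and the carry-over of the second rounding error can be sketched as follows. This is a simplified illustration rather than Algorithm 2 itself: the granularities `item_unit` and `block_unit` are constants we chose so that the total error stays below $RW\epsilon$, items are rounded down to integer multiples instead of being scaled to fixed-point fractions, and all names are hypothetical. Here `max_value` stands for $R$, `window` for $W$, and `block` for the block size.

```python
class SlackAdditiveSum:
    """Two-stage rounding with error carry-over (illustrative constants)."""

    def __init__(self, window, block, max_value, eps):
        if window % block != 0:
            raise ValueError("block size must divide the window size")
        self.block = block
        # Stage 1: every item is rounded down to a multiple of item_unit.
        self.item_unit = max(1, int(max_value * eps / 4))
        # Stage 2: every finished block is stored in multiples of block_unit.
        self.block_unit = max(1, int(max_value * window * eps / 2))
        self.blocks = [0] * (window // block)  # coarse block sums (small integers)
        self.oldest = 0
        self.total = 0                         # sum of the entries of self.blocks
        self.current = 0                       # rounded items plus carried remainder
        self.offset = 0

    def update(self, x):
        self.current += x - x % self.item_unit            # first rounding
        self.offset += 1
        if self.offset == self.block:
            stored, carry = divmod(self.current, self.block_unit)  # second rounding
            self.total += stored - self.blocks[self.oldest]
            self.blocks[self.oldest] = stored
            self.oldest = (self.oldest + 1) % len(self.blocks)
            self.current = carry      # the rounding error moves to the next block
            self.offset = 0

    def output(self):
        return self.total * self.block_unit + self.current, self.offset
```

Because the remainder of each block is carried forward instead of dropped, the block-level error does not accumulate: at query time it is bounded by a single `block_unit`, while the item-level rounding contributes less than `item_unit` per element in the window.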
We now analyze the memory consumption of Algorithm 2.
Theorem 4.4.
Algorithm 2 uses bits.
Proof 4.5.
requires bits; requires another ; takes additional bits; adds bits, while and is represented with bits. Overall, the space requirement is bits.
Corollary 4.6.
Let be the -Additive Summing space lower bound of Theorem 3.4, then Algorithm 2 uses bits.
Finally, Theorem 4.7 shows that Algorithm 2 is correct. The proof is deferred to Appendix F.
Theorem 4.7.
Algorithm 2 solves the -Additive Summing problem.
4.3 -Multiplicative Summing
In this section, we present Algorithm 3, which provides a multiplicative approximation of the SS problem. Compared to Algorithm 1, we achieve a space reduction by representing each sum of elements using bits. Specifically, when a block ends, if its sum was , we store (we allow a value of for if ). To achieve Output, we also store an approximate window sum , which is now a fixed point fractional variable with bits for its integral part and additional bits for storing a fraction. To update ’s value for a new , we round down the value of . Specifically, for a real number , we denote , for . Our pseudocode appears in Algorithm 3. The algorithm requires $O\big(\tau^{-1}\left(\log\log(RW\tau)+\log\epsilon^{-1}\right)+\log(RW)\big)$ bits of space and is memory optimal when and . The full analysis of Algorithm 3 is deferred to Appendix G.
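The idea of storing only a logarithm of each block sum can be sketched as below. This is a simplified variant under our own assumptions, not Algorithm 3: each finished block is stored as the largest exponent `rho` with `(1 + eps) ** rho` at most the block sum (`None` for an all-zero block), and a query re-adds the decoded values, so queries take time proportional to the number of blocks rather than constant time. All names are hypothetical.

```python
import math


class SlackMultiplicativeSum:
    """Per-block logarithmic encoding (illustrative, non-constant query time)."""

    def __init__(self, window, block, eps):
        if window % block != 0:
            raise ValueError("block size must divide the window size")
        self.block = block
        self.base = 1.0 + eps
        self.exponents = [None] * (window // block)  # None encodes a zero sum
        self.oldest = 0
        self.current = 0      # exact sum of the block being filled
        self.offset = 0

    def _encode(self, s):
        if s == 0:
            return None
        rho = int(math.log(s) / math.log(self.base))
        # Correct possible floating-point drift so that base**rho <= s < base**(rho+1).
        while self.base ** (rho + 1) <= s:
            rho += 1
        while self.base ** rho > s:
            rho -= 1
        return rho

    def _decode(self, rho):
        return 0.0 if rho is None else self.base ** rho

    def update(self, x):
        self.current += x
        self.offset += 1
        if self.offset == self.block:
            self.exponents[self.oldest] = self._encode(self.current)
            self.oldest = (self.oldest + 1) % len(self.exponents)
            self.current = 0
            self.offset = 0

    def output(self):
        estimate = sum(self._decode(r) for r in self.exponents) + self.current
        return estimate, self.offset
```

Each decoded block value underestimates the true block sum by a factor smaller than `1 + eps`, so the same holds for their total; the exponent itself is small enough to fit in a number of bits that is doubly logarithmic in the block sum plus logarithmic in `1 / eps`.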
Next, we present an alternative -Multiplicative Summing algorithm that achieves optimal space consumption for , regardless of the value of .
Improved -Multiplicative Summing for
Algorithm 4 is more space efficient than Algorithm 3 but has a query time of . For , Algorithm 4 is memory optimal and supports constant time queries even if ; for this case, Algorithm 3 requires bits, which is suboptimal.
Intuitively, we shave the bits from the space requirement of Algorithm 3 using an approximate representation for our variable and by not keeping the variable that allowed time queries regardless of the value of . To avoid using bits in , we use a fixed point representation in which bits are allocated for its integral part and another for the fractional part. The goal of is still to approximate the sum of the elements within a block, but now we aim for the sum to be approximately . Whenever a block ends, we store only the integral part of in our cyclic array to save space. When queried, we compute an estimate for the sum using all of the values in , which makes our query procedure take time. To use the fixed point structure of , we use the operator that rounds a real number into . We denote and . In Appendix H, we prove the following theorem.
Theorem 4.8.
For , Algorithm 4 processes elements and answers queries in time, uses bits, and is asymptotically optimal.
4.4 The Mean of a Slack Window
For some applications there is value in knowing the mean of a slack window. For example, a load balancer may be interested in the average transmission throughput. In exact windows, the sum and the mean can be derived from each other as the window size is constant. In slack windows, the window size changes, but our algorithms also return the current slack offset. That is, by dividing the returned sum by the size of the chosen window (the base window size plus the slack offset), we get an estimation of the mean (we assume that the stream size is larger than $W$). Specifically, Algorithm 1 provides the exact mean; Algorithm 2 approximates it with additive error, while Algorithm 3 yields a multiplicative approximation.
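As a small worked example of this division, the helper below turns an output pair into a mean. The function name and parameter names are ours; `estimate` is the returned sum, `slack` the returned offset, and `window` the base window size.

```python
def slack_mean(estimate, slack, window):
    """Mean of the chosen slack window, whose size is window + slack."""
    return estimate / (window + slack)


# A sum of 30 over a window of 8 elements extended by a slack of 2 elements.
print(slack_mean(30, 2, 8))  # 3.0
```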
5 Other Measurements over Slack Windows
We now explore the benefits of the slack model for other problems.
Maximum. While maintaining the maximum of a sliding window can be useful for applications such as anomaly detection [33, 26], tracking it over an exact window is often infeasible. Specifically, any algorithm for a maximum over an (exact) window must use bits [16]. The following theorem, proved in Appendix I, shows that we can get a much more efficient algorithm for slack windows. Observe that the following bounds match for values that are not too small ().
Theorem 5.1.
Tracking the maximum over a slack window deterministically requires and bits.
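One natural way to exploit slack for the maximum, sketched here under our own assumptions and not necessarily the construction of Appendix I, is to keep only the maximum of each block in a cyclic buffer. The names are hypothetical; `window` and `block` play the same roles as in the summing algorithms, and a query scans all block maxima.

```python
import math


class SlackMax:
    """Maximum over a slack window using one value per block (illustrative)."""

    def __init__(self, window, block):
        if window % block != 0:
            raise ValueError("block size must divide the window size")
        self.block = block
        self.maxima = [-math.inf] * (window // block)  # maxima of finished blocks
        self.oldest = 0
        self.current = -math.inf                       # maximum of the current block
        self.offset = 0

    def update(self, x):
        self.current = max(self.current, x)
        self.offset += 1
        if self.offset == self.block:
            self.maxima[self.oldest] = self.current
            self.oldest = (self.oldest + 1) % len(self.maxima)
            self.current = -math.inf
            self.offset = 0

    def output(self):
        # Maximum of the last (window + offset) elements, plus the slack used.
        return max(max(self.maxima), self.current), self.offset
```

The exact window needs to remember which element becomes the maximum once the current one expires; with slack, whole blocks expire at once, so one value per block suffices.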
Standard-Deviation. Building on the ability of our summing algorithms to provide the size of the slack window that they approximate, we can compute standard deviations over slack windows. Intuitively, the standard deviation of the window can be expressed as
[TABLE]
where is the slack window and is its mean. We can then use two slack summing instances to track and . This gives us an algorithm that computes the exact standard deviation over slack windows using space. Similarly, by using approximate rather than exact summing solutions we can compute a multiplicative approximation for the standard deviation using $O\big(\tau^{-1}\left(\log\epsilon^{-1}+\log\log(RW\tau)\right)+\log W\big)$ bits, or an -additive approximation using space. We expand on this further in Appendix J.
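The two-instance idea can be sketched as follows, assuming that the population standard deviation is meant and that the two tracked quantities are the sum of the elements and the sum of their squares (the displayed formula did not survive extraction, so this is our reading). For brevity, both sums share one cyclic buffer layout; all names are hypothetical, and the result is meaningful once at least a full window of elements has arrived.

```python
import math


class SlackStdDev:
    """Exact standard deviation over a slack window from two block-wise sums."""

    def __init__(self, window, block):
        if window % block != 0:
            raise ValueError("block size must divide the window size")
        self.window, self.block = window, block
        n = window // block
        self.sums = [0] * n       # per-block sums of x
        self.squares = [0] * n    # per-block sums of x * x
        self.oldest = 0
        self.total = self.total_sq = 0
        self.cur = self.cur_sq = 0
        self.offset = 0

    def update(self, x):
        self.cur += x
        self.cur_sq += x * x
        self.offset += 1
        if self.offset == self.block:
            j = self.oldest
            self.total += self.cur - self.sums[j]
            self.total_sq += self.cur_sq - self.squares[j]
            self.sums[j], self.squares[j] = self.cur, self.cur_sq
            self.oldest = (j + 1) % len(self.sums)
            self.cur = self.cur_sq = 0
            self.offset = 0

    def output(self):
        n = self.window + self.offset                # size of the chosen window
        mean = (self.total + self.cur) / n
        variance = (self.total_sq + self.cur_sq) / n - mean * mean
        return math.sqrt(max(variance, 0.0)), self.offset
```

Both sums use the same offset, so they always describe the same slack window, which is what allows them to be combined into a single statistic.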
General-Summing. General-Summing is similar to Basic-Summing, except that the integers can be in the range . That is, we now allow for negative elements as well. Datar et al. [16] proved that General-Summing requires bits, even for and constant factor approximation. In contrast, our exact summing algorithm from Section 4.1 trivially generalizes to General-Summing and allows an exact solution over slack windows.
Count-Distinct. Estimating the number of distinct elements in a stream is another useful metric. In networking, the packet header is used to identify different flows, and it is useful to know how many distinct flows are currently active. A sudden spike in the number of active flows is often an indication of a threat to the network. It may indicate the propagation of a worm or virus, port scans that are used to detect vulnerabilities in the system and even Distributed Denial of Service (DDoS) attacks [13, 21, 25].
Here, we have studied the memory reduction that can be obtained by following a similar flow to our summing algorithms: we break the stream into sized blocks and run the state of the art approximation algorithm on each block separately. Luckily, count distinct algorithms are mergeable [1]. That is, we can merge the summaries for each block to obtain an estimation of the number of distinct items in the union of the blocks. In Appendix K we show that this approach yields an algorithm with superior space and query time compared to the state of the art algorithms for counting distinct elements over sliding windows [11, 24]. Formally, we prove the following theorem.
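The block-and-merge structure can be sketched as below. To keep the example short and exact, each block summary is a plain Python `set`, which is mergeable through union but uses far more space than a real count-distinct sketch; in practice a compact mergeable estimator would take its place. All names are hypothetical.

```python
class SlackDistinct:
    """Distinct count over a slack window by merging per-block summaries.

    Plain sets stand in for a compact mergeable count-distinct sketch.
    """

    def __init__(self, window, block):
        if window % block != 0:
            raise ValueError("block size must divide the window size")
        self.block = block
        self.summaries = [set() for _ in range(window // block)]
        self.oldest = 0
        self.current = set()      # summary of the block being filled
        self.offset = 0

    def update(self, x):
        self.current.add(x)
        self.offset += 1
        if self.offset == self.block:
            self.summaries[self.oldest] = self.current
            self.oldest = (self.oldest + 1) % len(self.summaries)
            self.current = set()
            self.offset = 0

    def output(self):
        # Merge all block summaries with the current one, then count.
        merged = set().union(self.current, *self.summaries)
        return len(merged), self.offset
```

Updates only touch the current block's summary, while a query merges one summary per block, which mirrors the constant-time update and slower query in the theorem that follows.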
Theorem 5.2.
For and any fixed , there exists an algorithm that uses space, performs updates in constant time and answers queries in time , such that the result approximates a window whose size is in ; the resulting estimation is asymptotically unbiased and has a standard deviation of . State of the art approaches for exact windows [11, 24] require space and time per query for a similar standard deviation.
6 Discussion
In this work we have explored the slack window model for multiple streaming problems. We have shown that it enables asymptotic space and time improvements. Particularly, introducing slack enables logarithmic space exact algorithms for certain problems such as Maximum and General-Summing. In contrast, these problems do not admit sub-linear space approximations in the exact window model. Even in problems that do have sub-linear space approximations such as Standard-Deviation and Count-Distinct, adding slack asymptotically improves the space requirement and allows for constant time updates.
Much of our work has focused on the classic Basic-Summing problem. Based on our findings, we argue that allowing a slack in the window size is an attractive approximation axis as it enables greater space reductions compared to an error in the sum. As an example, for a fixed value, computing a -multiplicative approximation requires space [16]. Conversely, a multiplicative error in the window size, for a constant , allows summing using bits – same as in summing elements without sliding windows! Given that for exact windows randomized algorithms have the same asymptotic complexity as deterministic ones [4, 16], we expect randomization to have limited benefits for slack windows as well.
Appendix A Proof of Lemma 3.1
Proof A.1.
Consider the following language
[TABLE]
That is, contains a word with consecutive zeros and the rest of the words in are composed of these components in this order:
- zeros for some .
- a nonzero symbol .
- repetitions of the maximal symbol ().
- zeros for some .
Our lower bound stems from the observation that every word in must lead to a different state. The language size is: Therefore, the number of required bits is at least: . Further, this number is an integer and therefore at least bits are required.
First, notice that the word composed of zeros requires a unique configuration as must return [math] after processing that word. In contrast, it must not return [math] after processing any other word as there is at least a single within the last elements.
Let be two different words that are not all-zeros. We need to show that and require different memory configurations.
By definition of , and . Observe that the last elements of are and respectively and that both are preceded with at least zeros. If or , then and thus cannot return the same count for both, regardless of the slack, as it is all zeros in both and .
Next, assume that , and that without loss of generality . This means that both and have the same count.
Since , is a strict prefix of , i.e., . Assume by contradiction that after processing
Appendix B Proof of Theorem 3.4
Before we prove Theorem 3.4, we start with a simpler lower bound.
Lemma B.1.
Let . Any deterministic algorithm that solves the -Additive Summing problem must use at least bits.
Proof B.2.
Denote by a sequence in whose sum is . Next, consider the following languages:
[TABLE]
First, notice that and that all words in have length of at most . This means that .
We now show that every word in must have a dedicated memory configuration, thereby implying a bits bound. Let and be two distinct words in such that and . If , then their most recent elements differ by more than and there is no output that is correct for both. Note that the slack of both and is all zeros. Hence, and require different memory configurations.
Assume that and that by contradiction both and reached the same memory configuration. Since and , then and without loss of generality . This implies that is a prefix of so that . Thus, enters the shared configuration after reading and revisits it after reading . is a deterministic algorithm and therefore it reaches the same configuration also for the following word: . In that word, the last elements are all zeros while the sum of the last elements in is at least . Hence, there is no return value that is correct for both and .
We are now ready to prove Theorem 3.4. The theorem says that for , any deterministic algorithm that solves the -Additive Summing problem requires bits.
Proof B.3.
Lemma B.1 shows that
Appendix C Proof of Lemma 3.5
Before we prove Lemma 3.5 we first give a bound on an integer set that will serve us in the main lemma.
Lemma.
Consider the integer set , where the integers are taken from the following sequence: . The cardinality of satisfies .
Proof C.1.
We first show an upper bound on .
- Basis: for , we have .
- Hypothesis: .
- Step: For , we bound as follows:
[TABLE]
Next, notice that this implies that . Finally, we get a lower bound of , where the last inequality follows from the Taylor expansion of .
We are now ready to prove Lemma 3.5 using the lemma above.
Lemma.
For , any deterministic algorithm
Proof C.2.
We show a language for which every two words must reach a unique memory configuration, thus implying a bits lower bound. We denote by a sequence in that has a sum of . We define as follows:
[TABLE]
Notice that and according to the lemma above we have
[TABLE]
Consider two words and in , such that . Notice that every two distinct numbers satisfy . Since are preceded with a sequence of zeros, no answer correctly satisfies the requirements for both of them. Thus, if ,
Appendix D Tighter Analysis of the -Exact Summing Algorithm
Theorem D.1.
Consider a stream where and . There exists a -Exact Summing algorithm that uses bits, where is the lower bound.
Proof D.2.
Our method here is similar to Algorithm 1, but the constant value allows us to compute in without tracking it in . Thus, our algorithm only requires bits, while Theorem 3.2 gives a lower bound of .
Appendix E Correctness Proof of Algorithm 1
Theorem E.1.
Algorithm 1 solves the -Exact Summing problem.
Proof E.2**.**
First, notice that is always in the range and thus the slack size is as needed. Next, assume that the algorithm input was ; since the stream is broken into blocks of size , where is the offset within the current block, we have that . Further, the last elements are and the preceding elements are . Finally, the algorithm always keeps the sum of the elements before the current block in , and the sum of the current block in ; thus, by returning we get exactly the sum of the last elements.
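The block structure used in this argument can be illustrated with a minimal Python sketch. It is written in the spirit of Algorithm 1 rather than copied from it: the class name, the field names, and the choice to return the covered window length alongside the sum are assumptions made for illustration. The sketch keeps one sum per completed block in a cyclic buffer, a running total of that buffer, and the sum of the current partial block.

```python
class SlackExactSum:
    """Exact sum over a window whose length varies between
    `window` and `window + block - 1` (illustrative names)."""

    def __init__(self, window, block):
        assert window % block == 0, "sketch assumes block divides window"
        self.block = block
        self.n_blocks = window // block
        self.buf = [0] * self.n_blocks  # sums of the last completed blocks
        self.head = 0                   # slot holding the oldest block sum
        self.total = 0                  # sum of all entries in buf
        self.cur = 0                    # sum of the current partial block
        self.offset = 0                 # items seen in the current block

    def update(self, x):
        self.cur += x
        self.offset += 1
        if self.offset == self.block:
            # The block is complete: it replaces the oldest stored block.
            self.total += self.cur - self.buf[self.head]
            self.buf[self.head] = self.cur
            self.head = (self.head + 1) % self.n_blocks
            self.cur = 0
            self.offset = 0

    def query(self):
        # Returns (sum, number of most recent items that the sum covers).
        covered = self.n_blocks * self.block + self.offset
        return self.total + self.cur, covered
```

For example, with `window=4` and `block=2`, feeding the integers 1 through 11 yields the sum of the last five items, since one item of the current block has been seen.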
Appendix F Correctness Proof of Algorithm 2
Theorem**.**
Algorithm 2 solves the -Additive Summing problem.
Proof F.1**.**
First, observe that at all times as needed. Denote the stream by , such that represents the number of elements within the current block. Our goal is to show that Algorithm 2 provides a approximation to the sum of the last elements (). That is, the quantity we approximate is
*For any , we use to denote the value of the variable after the ’th item was added. Note that within a block, simply sums the rounded scaled inputs; whenever a block ends, we reduce the value of by , but make up for it by setting . Further, when processing , we replace all of the values of that were determined before the last elements, and none of the set value leaves the window by time . That is, reaches every value in exactly once throughout the last updates. This gives us the following equality:
Thus, we can express the algorithm’s estimate of the sum value as:*
[TABLE]
Next, notice that since , we have and thus:
[TABLE]
Also, since we assumed that was the last of a -sized block, we know that the value of is bounded, and specifically:
[TABLE]
Plugging (2) and (3) into (1), we get a bound on the error:
[TABLE]
*Thus, since , , we get the desired error bound and conclude that Algorithm 2 solves -Additive Summing. *
Appendix G Analysis of Algorithm 3
We now analyze the memory requirement of our algorithm.
Theorem G.1**.**
Algorithm 3 requires bits.
Proof G.2**.**
*Since , it can be represented using
bits. Next, notice that this satisfies:*
[TABLE]
Each counter in now is assigned a value and thus the overall space consumption of is bits. Our variable sums the values rather than the exact sum of each block and is a fixed point variable. We use bits for its integral part and another for its fractional part. Thus, the total number of bits required by the algorithm is
[TABLE]
We thus get that Algorithm 3 is optimal under some conditions; notice that this includes constant values.
Theorem G.3**.**
Let $\mathcal{B}=\Omega\big(\log(W/\epsilon)+\tau^{-1}\left(\log(\tau/\epsilon)+\log\log(RW)\right)\big)$ be the -Multiplicative Summing lower bound shown in Theorem 3.12. Then for and , Algorithm 3 uses bits.
Next, we prove that Algorithm 3 solves the problem. Recall that for a real number , we denote , for .
Lemma G.4**.**
Let ; for every such that the following inequality holds:
[TABLE]
Proof G.5**.**
Observe that we have
[TABLE]
Finally, since and , we get that .
Theorem G.6**.**
Algorithm 3 solves the -Multiplicative Summing problem.
Proof G.7**.**
First, observe that at all times as needed. Denote the stream by , such that represents the number of elements within the current block. We will show that Algorithm 3 provides a approximation to the sum of the last elements (). That is, the quantity we approximate is
[TABLE]
Next, since we reset in every block (Line 9), we have that at the time of the query, Further, since every block is summed individually and then rounded in Line 6. We assume without loss of generality that at the time of the query, and we denote – the sum of the elements in block . Therefore we have:
[TABLE]
That is each of the blocks that precede the last elements is summed exactly (Line 3) and then we store its -based log in . Next, the variable stores the approximated values rounded down (Line 7), i.e.,
[TABLE]
Now, Lemma G.4 implies that
[TABLE]
We then get
[TABLE]
Also, note that if some , then we defined as and as [math]. This means that if , then and thus . Hereafter, assume that . Next, we plug equations (4),(5),(6) into (7) to get:
[TABLE]
Similarly, we bound from below as follows:
[TABLE]
*We showed that in all cases where we have if , thereby proving the theorem. *
Appendix H Analysis of Algorithm 4
In order to analyze Algorithm 4, we first note that by rounding down a real number to use bits, as in the operator, we introduce a rounding error of at most .
Observation H.1**.**
For any .
Our approach in the analysis of Algorithm 4 is as follows:
1. We start with Lemma H.2 that shows that is a multiplicative approximation to the block’s sum.
2. Next, Lemma H.4 shows that we do not lose much by taking into our cyclic buffer (rather than itself). This allows us to reduce the memory requirement at the expense of slightly increasing the error. Specifically, we show that is a multiplicative approximation of the sum.
3. Then we proceed with Lemma H.6 that shows a bound on . This allows us to bound the number of bits needed for the representation of its integral part.
4. Lemma H.8 analyzes the overall space requirement of Algorithm 4.
5. Next, Theorem H.10 shows we indeed solve -Multiplicative Summing.
6. Finally, Corollary H.12 concludes the optimality for constant .
Lemma H.2**.**
Let be the elements of a block summed in , then and otherwise:
[TABLE]
Proof H.3**.**
We prove the lemma by showing that , after summing we have that if then and otherwise:
[TABLE]
The proof is done by induction where we denote by the value of after summing .
- Basis: .
Here we simply have and the claim holds.
- Induction hypothesis: let then if then and otherwise:
[TABLE]
- Induction step: let .
We first consider the case where . In this case we have according to the induction hypothesis. Thus, we get
[TABLE]
Next, consider the case where but . This also gives us and thus:
[TABLE]
and
[TABLE]
Finally, consider the case where , and according to the induction hypothesis:
[TABLE]
Thus, we bound as follows:
[TABLE]
Similarly, we can bound it from below:
[TABLE]
Lemma H.4**.**
Let be the elements of a block summed in , then and otherwise
[TABLE]
Proof H.5**.**
First, notice that Lemma H.2 implies that if then and thus . If the sum is non-zero, the lemma implies:
[TABLE]
Thus, we have that:
[TABLE]
From below, we can bound it as follows:
[TABLE]
where the last inequality holds as .
We now bound the value of in order to compute the memory requirements of Algorithm 4.
Lemma H.6**.**
For any , let be elements of a block summed in (Line 3), then .
Proof H.7**.**
According to Lemma H.2 we have that . Thus we can get an upper bound on ’s value:
[TABLE]
We are now ready to compute the space requirement of Algorithm 4.
Lemma H.8**.**
Algorithm 4 uses space.
Proof H.9**.**
The algorithm uses four variables:
- – a fixed point variable with bits for its fractional part. The integral part of , according to Lemma H.6, can be represented using .
- – a cyclic array with entries, each of which can be represented using bits per Line 6.
- – tracks the index of the current block and thus has possible values and can be represented using bits.
- – the offset within the current block; it is represented using bits.
Overall, we get that the memory requirement is as stated.
We now prove the correctness of Algorithm 4.
Theorem H.10**.**
Algorithm 4 processes elements in constant time, answers queries in , uses space and solves the -Multiplicative Summing problem.
Proof H.11**.**
Let be the elements we are trying to approximate. Observe that , the elements are summed together (Line 3) and then stored in some (Line 6). This means that, according to Lemma H.4, approximates their sum up to a multiplicative error of . Thus, we have that:
[TABLE]
Finally, according to Lemma H.2, we have that approximates the last elements and specifically:
[TABLE]
This allows us to conclude that:
Corollary H.12**.**
For , Algorithm 4 answers queries in time, uses bits, and is asymptotically optimal.
Appendix I Proof of Theorem 5.1
Theorem**.**
Tracking the maximum over a slack window deterministically requires and bits.
Proof I.1**.**
The algorithm we propose is quite simple – compute the maximum over each -sized block and keep a cyclic buffer of the last blocks’ maxima. Then, we can compute the maximum of the cyclic buffer at query time to get the maximal value in the slack window. For a lower bound, consider the following language:
[TABLE]
We now claim that each pair of distinct words must lead the algorithm into a distinct memory configuration. Denote , and let . Without loss of generality assume that . If and lead the algorithm into the same memory configuration, then it (being deterministic) must reach the same configuration again and provide the same output for and . But as the maximum in must be (regardless of the chosen slack), and , no single answer is correct for both. Thus, the algorithm must reach a distinct memory configuration for each input, implying a lower bound of bits.
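The upper-bound construction in this proof can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's pseudocode: the class and field names are hypothetical, and including the maximum of the current partial block in the query is our reading of how the slack window is covered.

```python
class SlackMax:
    """Maximum over a slack window, kept as one maximum per block
    in a cyclic buffer (illustrative names)."""

    def __init__(self, window, block):
        assert window % block == 0, "sketch assumes block divides window"
        self.block = block
        self.n_blocks = window // block
        self.buf = [None] * self.n_blocks  # maxima of completed blocks
        self.head = 0                      # slot of the oldest block
        self.cur = None                    # maximum of the partial block
        self.offset = 0                    # items seen in the current block

    def update(self, x):
        self.cur = x if self.cur is None else max(self.cur, x)
        self.offset += 1
        if self.offset == self.block:
            # Overwrite the oldest block maximum with the new one.
            self.buf[self.head] = self.cur
            self.head = (self.head + 1) % self.n_blocks
            self.cur = None
            self.offset = 0

    def query(self):
        # Scans the buffer, so query time is linear in the number of blocks.
        candidates = [m for m in self.buf if m is not None]
        if self.cur is not None:
            candidates.append(self.cur)
        return max(candidates)  # raises ValueError on an empty stream
```

Updates take constant time, while a query scans one entry per block; the memory is one value per block rather than one per window item.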
Appendix J Standard deviation over sliding windows
Here, we present our algorithm for the standard deviation over a window. While algorithms for the sum of a sliding window are known, to the best of our knowledge, no previous solution computes the standard deviation over an (approximate) sliding window.
We denote by the set of items included in the window. The algorithm uses a window summing algorithm as a black box. Here, can be Algorithm 1, Algorithm 2, or Algorithm 3. We assume that supports two operations:
- I. Update – process a new element .
- II. Output – return a tuple such that is an estimation of the sum of the last elements.
The mean of the window is estimated as .
We employ two separate window summing instances (as explained above). The first one simply processes the input and is used to compute the mean. The second one computes the sum of squared values over a sliding window. This is illustrated in Figure 4. We use the following identity for the window standard deviation :
[TABLE]
This allows us to compute from the sum of squares and the mean of the window elements. Pseudocode for this method appears in Algorithm 5.
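The two-instance construction can be sketched as follows. The sketch assumes the black box exposes `update(x)` and `output()`, with `output()` returning a pair of the sum estimate and the number of items it covers; that tuple layout, the class names, and the exact deque-based stand-in `ExactWindowSum` are assumptions for illustration and can be replaced by any window summing algorithm with the same interface.

```python
import math
from collections import deque


class ExactWindowSum:
    """Stand-in black box: exact sum over the last `window` items."""

    def __init__(self, window):
        self.items = deque(maxlen=window)  # old items fall out automatically

    def update(self, x):
        self.items.append(x)

    def output(self):
        return sum(self.items), len(self.items)


class WindowStdDev:
    """Standard deviation from two window-summing instances."""

    def __init__(self, make_summer):
        self.sums = make_summer()     # sums the raw values
        self.squares = make_summer()  # sums the squared values

    def update(self, x):
        self.sums.update(x)
        self.squares.update(x * x)

    def query(self):
        s, n = self.sums.output()
        sq, _ = self.squares.output()
        mean = s / n  # raises ZeroDivisionError on an empty window
        # Mean of squares minus squared mean; clamp tiny negative
        # values that floating-point rounding may produce.
        variance = max(sq / n - mean * mean, 0.0)
        return math.sqrt(variance)
```

A typical use is `WindowStdDev(lambda: ExactWindowSum(8))`; with an approximate summer plugged in, the accuracy of the result depends on that summer's error guarantee.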
J.1 Accuracy of standard deviation algorithms
We now discuss the accuracy that Algorithm 5 provides, for each specific implementation of the underlying black box . First, if computes the exact sum over the slack window, then all quantities are computed without error.
Next, consider a multiplicative error , with a slack window (Algorithm 4). In this case, the algorithm computes a multiplicative approximation of the standard deviation (over the window considered by ). Specifically, if provides a multiplicative approximation of the sum then the standard deviation is estimated within a multiplicative error of .
Finally, consider an additive error algorithm on a slack window (Algorithm 2). In this case, Algorithm 5 provides an additive approximation of the window’s standard deviation.
Appendix K Analysis of allowing slack in count distinct algorithms
K.1 Background
Accurately counting distinct elements requires linear space **[22]**. Intuitively, one needs to maintain a list of all previously encountered identifiers. Therefore, accurate measurement does not scale to large streams and approximate solutions are very popular. Specifically, count distinct algorithms often use randomized estimators **[3, 14, 17, 23]**. Randomized algorithms typically use a hash function that maps ids to infinite bit strings. When a maximal cardinality bound is known, finite strings are used and typically 32 bit integers suffice to reach estimations of over **[22]**. We assume that the hashed values are distributed uniformly at random, i.e., .
Count distinct algorithms look for certain observables in the hashes. For example, some algorithms **[3, 28]** look at the minimal observed hash value as a real number in and exploit that , where is the number of the distinct items in the multi-set . Another possibility is to look for patterns of the form **[17, 22]**. When such a pattern is first encountered, it is likely that there were at least unique elements.
The state of the art count distinct algorithm is HyperLogLog (HLL) **[22]**, which is being used by Google **[29]**. HLL requires bytes and its standard deviation is ; it was extended to exact windows by **[11, 24]**. That extension is used to detect port scans in networked systems **[12]**. Exact Window HLL (W-HLL) requires bytes and its standard deviation is . In this work, we present -Slack HLL () that requires bytes and has a standard deviation of . When is fixed, Slack HLL requires words. When it requires asymptotically less space than W-HLL. For completeness, we provide an overview of HLL in Appendix L.
K.2 -Slack HyperLogLog
We now present the algorithm. We logically divide the stream into fixed-size blocks. Our algorithm maintains a cyclic buffer of HLL instances. Each instance has a buffer index in the range and the symbol refers to the HLL instance at index , and denotes its ’th register for . We use two counters: Current Block (CB) holds values in and Place in Block (PB) counts how many items are included in the current block. Initially, CB and PB are set to 0 and new items are always added to . Each additional item increments and once reaches the value , we set and increment . We also reset the HLL instance at by setting all its registers to . Doing so enables us to forget information that is guaranteed not to be in the window. To query Slack HLL, we use the maximum of each of the registers to generate and continue as in HLL. Pseudocode for Slack HLL is found in Algorithm 6 and an example of the algorithm’s setup is illustrated in Figure 5. Note that is a range correction constant that depends on .
The next theorem, whose proof is deferred to Appendix M, shows that is correct.
Theorem K.1**.**
Let be the query result of Slack HLL, then is the query result of for a stream containing the last events (for streams longer than ).
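The buffer management and register-wise merge described in K.2 can be sketched in Python. Several details are assumptions rather than facts from the paper: the number of instances is taken to be one per completed block in the window plus one for the current block, the hash is the first 64 bits of SHA-256, ranks count from 1 as in the standard HLL formulation, and the query uses the raw HLL estimator without range corrections. All names are hypothetical.

```python
import hashlib


def hash_to_register(item, p):
    """Map an item to (register index, rank) using a 64-bit hash."""
    digest = hashlib.sha256(str(item).encode()).digest()
    h = int.from_bytes(digest[:8], "big")
    idx = h >> (64 - p)                   # first p bits pick the register
    rest = h & ((1 << (64 - p)) - 1)      # remaining 64 - p bits
    rank = (64 - p) - rest.bit_length() + 1  # leftmost 1-bit, from 1
    return idx, rank


class SlackHLL:
    def __init__(self, window, block, p=8):
        assert window % block == 0, "sketch assumes block divides window"
        self.block = block
        self.k = window // block + 1      # assumed number of instances
        self.p = p
        self.m = 1 << p
        self.instances = [[0] * self.m for _ in range(self.k)]
        self.cb = 0                       # current block index
        self.pb = 0                       # place in block

    def update(self, item):
        idx, rank = hash_to_register(item, self.p)
        regs = self.instances[self.cb]
        regs[idx] = max(regs[idx], rank)
        self.pb += 1
        if self.pb == self.block:
            self.pb = 0
            self.cb = (self.cb + 1) % self.k
            # Forget the oldest block by zeroing its registers.
            self.instances[self.cb] = [0] * self.m

    def merged_registers(self):
        # Register-wise maximum across all instances.
        return [max(col) for col in zip(*self.instances)]

    def query(self):
        # Raw estimator; this alpha approximation is meant for m >= 128.
        alpha = 0.7213 / (1 + 1.079 / self.m)
        z = sum(2.0 ** -r for r in self.merged_registers())
        return alpha * self.m * self.m / z
```

Because the merge is a register-wise maximum, the merged registers coincide with those of a single HLL instance fed only the items still held in the buffer, which is the property the theorem above relies on.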
Appendix L Hyper LogLog (HLL) Overview
We provide a brief overview of the HLL algorithm **[22]**. HLL uses a hash function that maps each identifier to an infinite string of [math],* bits; , where is the identifier domain. Given , the operator returns the position of the leftmost -bit, e.g., (counting from 0). Intuitively, a string with leading zeros is expected to appear after events. That is, stores the largest previously encountered value in a counter. The space complexity is , which in practice means that a single byte suffices for most scenarios.*
To augment precision, different counters are used. For simplicity, we assume that for some positive . Stochastic averaging is performed by using the first bits of each hashed value to determine the instance to be updated.
To satisfy a query, we read all estimations, calculate their harmonic average and normalize the result. The technical details of how to best interpret the result can be found in **[22, 29]**. Algorithm 7 provides pseudo code of the HLL algorithm. As can be observed, stochastic averaging is performed in Line 5. The notation describes the bit composition of . The query algorithm uses the harmonic average of the different experiments and the result is then normalized with a constant that depends on (specifically, ).
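The overview above corresponds roughly to the following Python sketch of plain HLL. It is a simplified illustration, not Algorithm 7: the 64-bit truncation of SHA-256 stands in for the idealized infinite hash, ranks count from 1 as in the standard HLL formulation (which may differ from the indexing used in the text), the normalization constant uses the common closed-form approximation for at least 128 registers, and small- and large-range corrections are omitted. The class and parameter names are hypothetical.

```python
import hashlib


class HyperLogLog:
    def __init__(self, p=10):
        self.p = p                     # number of index bits
        self.m = 1 << p                # number of registers
        self.registers = [0] * self.m
        # Approximate normalization constant, intended for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def _hash(self, item):
        digest = hashlib.sha256(str(item).encode()).digest()
        return int.from_bytes(digest[:8], "big")  # 64-bit hash value

    def update(self, item):
        h = self._hash(item)
        # Stochastic averaging: the first p bits select the register.
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        # Position of the leftmost 1-bit in the remaining bits, from 1.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def query(self):
        # Normalized harmonic mean of the per-register estimates.
        z = sum(2.0 ** -r for r in self.registers)
        return self.alpha * self.m * self.m / z
```

Duplicates do not change the state, since the same item always hashes to the same register and rank; only the number of distinct items affects the estimate.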
Appendix M Proof of Theorem K.1
Before we prove Theorem K.1, we need to prove an auxiliary lemma.
Lemma M.1**.**
The last $\min(|\mathcal{M}|, W+PB)$ updates are performed on all instances.
Proof M.2**.**
Initially, when , every instance that is initialized is already empty. Thus, since every Slack HLL updates an instance (in Line 3), the number of update operations is . When , is initialized at first and we lose the oldest events. At that point, the number of events is and . In each subsequent update, both and the number of events are increased by until again. At that point, is set back to [math] and the number of events drops to the last .
We are now ready to prove Theorem K.1.
Proof M.3**.**
Note that and therefore Lemma M.1 guarantees that the last events are summarized in Slack HLL. Therefore, used to generate in Slack HLL has the highest value of all the last elements. Consequently, the value that determined in Slack HLL also determines in an HLL that summarizes the last elements.
- [1] Pankaj K. Agarwal, Graham Cormode, Zengfeng Huang, Jeff Phillips, Zhewei Wei, and Ke Yi. Mergeable summaries. In ACM PODS, 2012.
- [2] Mohammad Alizadeh, Tom Edsall, Sarang Dharmapurikar, Ramanan Vaidyanathan, Kevin Chu, Andy Fingerhut, Vinh The Lam, Francis Matus, Rong Pan, Navindra Yadav, and George Varghese. Conga: Distributed congestion-aware load balancing for datacenters. In ACM SIGCOMM, 2014.
- [3] Ziv Bar-Yossef, T. S. Jayram, Ravi Kumar, D. Sivakumar, and Luca Trevisan. Counting distinct elements in a data stream. In RANDOM, 2002.
- [4] Ran Ben Basat, Gil Einziger, Roy Friedman, and Yaron Kassner. Efficient Summing over Sliding Windows. In SWAT, 2016.
- [5] Ran Ben Basat, Gil Einziger, Roy Friedman, Marcelo Caggiani Luizelli, and Erez Waisbard. Constant time updates in hierarchical heavy hitters. In ACM SIGCOMM, 2017.
- [6] Ran Ben-Basat, Gil Einziger, Roy Friedman, and Yaron Kassner. Heavy hitters in streams and sliding windows. In IEEE INFOCOM, 2016.
- [7] Ran Ben-Basat, Gil Einziger, Roy Friedman, and Yaron Kassner. Optimal elephant flow detection. In IEEE INFOCOM, 2017.
- [8] Ran Ben-Basat, Gil Einziger, Roy Friedman, and Yaron Kassner. Randomized admission policy for efficient top-k and frequency estimation. In IEEE INFOCOM, 2017.
