Heavy Hitters and Bernoulli Convolutions
Alexander Kushkuley

TL;DR
The paper introduces a simple, event-sensitive frequency approximation algorithm that models event distributions as biased Bernoulli convolutions, enabling analysis of their moments and self-similarity properties.
Contribution
It presents a novel event frequency algorithm that links to biased Bernoulli convolutions, providing new insights into their moments and self-similarity.
Findings
Algorithm effectively models event distributions as Bernoulli convolutions.
Estimation of moments for biased Bernoulli convolutions is demonstrated.
Self-similarity properties are identified under certain conditions.
Abstract
A very simple event frequency approximation algorithm that is sensitive to event timeliness is suggested. The algorithm iteratively updates a categorical click distribution, producing (a path of) a random walk on a standard $n$-dimensional simplex. Under certain conditions, this random walk is self-similar and corresponds to a biased Bernoulli convolution. Algorithm evaluation naturally leads to estimation of moments of biased (finite and infinite) Bernoulli convolutions.
(Salesforce/Demandware, [email protected])
1 Introduction
To quote [2], "there is a need to estimate the count of a given item (or event or combination thereof) during some period of time … Typically, items with highest counts, commonly known as heavy hitters, are of most interest".
This note is an attempt to redefine the event counting problem (cf. [1], [2], [3]). In many cases, the most important factor is an event's recent "popularity rank" (cf. e.g. [3]) and not its long-run frequency. Hence, instead of item-event counters, consider a time-dependent discrete probability distribution $x = (x_0, \dots, x_n)$ as an estimate of the relative frequencies (ranks) of the items involved. An occurrence of an event with index $i$ can be represented by a delta-function distribution $\delta_i$ on the set $\{0, 1, \dots, n\}$, triggering an update of the estimated probability distribution by an application of the convex mixture rule $x \to \lambda x + (1-\lambda)\delta_i$, $0 < \lambda < 1$. In other words, the arrival of an event reduces the ranks of all other events while tilting the estimated rank distribution towards the event item in the simplest way possible. Thus we arrive at the following heavy hitters approximation algorithm.
Algorithm 1
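Fix a number $\lambda$, $0 < \lambda < 1$, that is close to $1$. If an item $i$, $0 \le i \le n$, was clicked (event number $i$ did occur), set $x_i \leftarrow \lambda x_i + (1-\lambda)$ and set $x_j \leftarrow \lambda x_j$ for all $j \neq i$.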
One practical problem with the above is that all frequencies (probabilities) are updated simultaneously. There are, however, some advantages:
- (1) decreasing $\lambda$ gives higher priority to recent events and, vice versa, increasing $\lambda$ biases the ranking towards "idling" event items
- (2) therefore, the sensitivity of this ranking scheme to new events can be easily controlled (even at runtime) by adjusting just one parameter
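The update step of Algorithm 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `update_ranks`, `top_items` and `lam` are hypothetical, ranks are kept in a plain list, and the decay parameter is taken to be the weight given to the old estimate (close to, but less than, 1).

```python
def update_ranks(ranks, clicked, lam=0.99):
    """Apply one step of the convex mixture rule to a list of ranks.

    `ranks` is a list of non-negative floats summing to one, `clicked` is the
    index of the item that was just clicked, and `lam` is the decay parameter
    (a hypothetical name; it should be close to, but less than, 1).
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    # Every rank decays by the factor lam ...
    updated = [lam * r for r in ranks]
    # ... and the clicked item receives the freed-up mass 1 - lam.
    updated[clicked] += 1.0 - lam
    return updated


def top_items(ranks, count):
    """Return the indices of the `count` largest ranks, largest first."""
    return sorted(range(len(ranks)), key=lambda i: ranks[i], reverse=True)[:count]


# Usage: three items, uniform start, a short made-up click stream.
ranks = [1 / 3, 1 / 3, 1 / 3]
for clicked in [0, 0, 2, 0, 1, 0]:
    ranks = update_ranks(ranks, clicked, lam=0.9)
```

Each update touches every rank, which is the practical cost mentioned above; the ranks keep summing to one after every step, so no renormalization is needed.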
Remark 1
Suppose that it is desirable that an item should lose half of its rank if it stays idle while the list it belongs to is updated $m$ times. Since the rank of an idle item is multiplied by $\lambda$ on every update, this can be achieved by setting the parameter $\lambda$ to $2^{-1/m}$. For example, if then (cf. [10])
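As a small numeric illustration of this remark, the sketch below computes the decay parameter from a desired "half-life" measured in list updates. The function name `decay_for_half_life` and the value 1000 are hypothetical choices made here, not values taken from the paper.

```python
def decay_for_half_life(updates):
    """Decay parameter that halves an idle item's rank after `updates` updates."""
    if updates <= 0:
        raise ValueError("updates must be positive")
    return 2.0 ** (-1.0 / updates)


# Hypothetical setting: an idle item should lose half its rank in 1000 updates.
lam = decay_for_half_life(1000)

# An idle item starting at rank 0.2 is multiplied by lam on each update.
idle_rank = 0.2 * lam ** 1000
```

With this choice `idle_rank` ends up at half of the starting rank, up to floating-point rounding.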
The close relationship between Algorithm 1 and Bernoulli convolutions (cf. [4]) is the subject of the rest of this paper.
2 Bernoulli convolutions
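Suppose that incoming event frequencies follow a fixed discrete distribution $p = (p_0, \dots, p_n)$ and let $x^{(k)} = (x^{(k)}_0, \dots, x^{(k)}_n)$ be the probability distribution vector ($\sum_i x^{(k)}_i = 1$ for all $k$) of our (relative) frequency estimates at times $k = 0, 1, 2, \dots$. Essentially, Algorithm 1 computes a path of a random walk on the standard $n$-dimensional simplex defined by the iterative rule
$$x^{(k+1)} = \lambda x^{(k)} + (1-\lambda)\,e_i \quad \text{with probability } p_i, \tag{1}$$
where $e_i$ is the $i$-th vertex of the simplex or, in other words, the $i$-th unit vector in standard Euclidean coordinates in $\mathbb{R}^{n+1}$. The update rule for the $i$-th coordinate on iteration $k$ is
$$x^{(k+1)}_i = \begin{cases} \lambda x^{(k)}_i + (1-\lambda) & \text{with probability } p_i, \\ \lambda x^{(k)}_i & \text{with probability } 1-p_i. \end{cases} \tag{2}$$
Let's fix a coordinate $i$ for a while, omitting the index $i$ and writing $x_k$ for $x^{(k)}_i$ and $p$ for $p_i$. Let $b_0, b_1, b_2, \dots$ be independent biased Bernoulli random variables such that $P(b_j = 1) = p$ and $P(b_j = 0) = q = 1-p$. It is well known (see e.g. [4]) that on step $k$ the one-dimensional random walk (2) has the same distribution as the random variable
$$\lambda^k x_0 + (1-\lambda)\sum_{j=0}^{k-1}\lambda^j b_j, \tag{3}$$
which, up to the mostly irrelevant free term $\lambda^k x_0$, is a convolution of biased Bernoulli variables. The infinite biased Bernoulli convolution (cf. e.g. [4]) is obtained from (3) by setting $k = \infty$ or, similarly, by driving the random process (2) for an infinite number of steps.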
Remark 2
It is well known (see e.g. [5] for a precise statement) that the Bernoulli convolution is absolutely continuous (with respect to the Lebesgue measure on the line) for almost all sufficiently large values of the parameter $\lambda$. For these values of $\lambda$ the weak limit of the sequence of random variables $x_k$ does exist, and only this case will be considered in this paper.
Lemma 1
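$$E(x_k) = \lambda^k x_0 + (1-\lambda^k)\,p \tag{4}$$
Indeed, by definition (2)
$$E(x_{k+1}) = \lambda E(x_k) + (1-\lambda)\,p \tag{5}$$
and hence by induction
$$E(x_k) = \lambda^k x_0 + (1-\lambda)\,p\sum_{j=0}^{k-1}\lambda^j = \lambda^k x_0 + (1-\lambda^k)\,p,$$
which is the same as (4).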
Lemma 2
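$$\mathrm{Var}(x_k) = (1-\lambda^{2k})\,\frac{1-\lambda}{1+\lambda}\,p(1-p) \tag{6}$$
Proof. It follows from the definition (2) that
$$E(x_{k+1}^2) = \lambda^2 E(x_k^2) + 2\lambda(1-\lambda)\,p\,E(x_k) + (1-\lambda)^2 p$$
and therefore by (5)
$$\mathrm{Var}(x_{k+1}) = E(x_{k+1}^2) - E(x_{k+1})^2 = \lambda^2\,\mathrm{Var}(x_k) + (1-\lambda)^2 p(1-p). \tag{7}$$
From here, by the same inductive argument as in Lemma 1 (the starting point $x_0$ is fixed, so $\mathrm{Var}(x_0) = 0$), we get
$$\mathrm{Var}(x_k) = (1-\lambda)^2 p(1-p)\sum_{j=0}^{k-1}\lambda^{2j} = (1-\lambda^{2k})\,\frac{1-\lambda}{1+\lambda}\,p(1-p).$$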
As an obvious consequence of lemmas 1 and 2 (cf. Remark 2) we have
Corollary 1
The infinite Bernoulli convolution defined by (2) has expectation $p$ and variance $\frac{1-\lambda}{1+\lambda}\,p(1-p)$.
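The mean and variance formulas for the one-dimensional walk can be checked exactly for a small number of steps by enumerating every click pattern. The sketch below assumes the update $x \to \lambda x + (1-\lambda) b$ with $P(b=1)=p$ and a fixed starting point; the function name and the particular values of `lam`, `p`, `x0` and `steps` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def exact_mean_and_variance(lam, p, x0, steps):
    """Enumerate all 2**steps click patterns of the one-dimensional walk."""
    mean = Fraction(0)
    second = Fraction(0)
    for bits in product((0, 1), repeat=steps):
        x = x0
        weight = Fraction(1)
        for b in bits:
            x = lam * x + (1 - lam) * b      # one step of the walk
            weight *= p if b else 1 - p      # probability of this pattern
        mean += weight * x
        second += weight * x * x
    return mean, second - mean * mean


lam, p, x0, steps = Fraction(9, 10), Fraction(1, 4), Fraction(1, 2), 8
mean, var = exact_mean_and_variance(lam, p, x0, steps)

# Closed forms for the mean and variance after `steps` iterations.
closed_mean = lam**steps * x0 + (1 - lam**steps) * p
closed_var = (1 - lam**(2 * steps)) * (1 - lam) / (1 + lam) * p * (1 - p)
```

Because rational arithmetic is used, the enumerated values agree with the closed forms exactly rather than approximately; as `steps` grows they approach $p$ and $\frac{1-\lambda}{1+\lambda}p(1-p)$.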
Remark 3
Under the assumption that the sought-for limits exist (Remark 2), Corollary 1 can be established by passing to the limit in the recurrence relations (5), (7) and then solving for the expectation and variance respectively.
Here is an example, demonstrating that passing to a limit as suggested in Corollary 1 is not always possible.
Example 1
Assuming that starting point of the random walk (2) is non-zero, we have
[TABLE]
Passing here to the limit as yields
[TABLE]
which is obviously wrong if and therefore, the condition is necessary for the existence of continuous limit . If the condition follows from the well known necessary condition for non-singularity of Bernoulli convolution (cf. e.g. [5]). For (8) to be true, however, we need non-singularity of the inverse of Bernoulli convolution. Essentially a question one can ask is this. For what values of (if any) satisfying (8) exists.
3 Random walk on a simplex
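We will compute variances of random vectors generated by (1) and some other similar random walks. As before, it is assumed that the continuous limit does exist. It follows from (4)-(5) and Corollary 1 that
$$E(x^{(k+1)}) = \lambda E(x^{(k)}) + (1-\lambda)\,p, \qquad \lim_{k\to\infty} E(x^{(k)}) = p. \tag{9}$$
In what follows, all vectors are assumed to be column vectors, so that for vectors $u, v$ their outer product is $uv^T$, where $v^T$ is the row vector obtained by transposing $v$. A diagonal matrix with the elements of a vector $v$ on its main diagonal will be denoted by $D_v$.
Using the rule (1) we get
$$E\big(x^{(k+1)}(x^{(k+1)})^T\big) = \lambda^2 E\big(x^{(k)}(x^{(k)})^T\big) + \lambda(1-\lambda)\big(E(x^{(k)})\,p^T + p\,E(x^{(k)})^T\big) + (1-\lambda)^2 D_p. \tag{10}$$
In the same way, using (9) we compute
$$E(x^{(k+1)})\,E(x^{(k+1)})^T = \lambda^2 E(x^{(k)})\,E(x^{(k)})^T + \lambda(1-\lambda)\big(E(x^{(k)})\,p^T + p\,E(x^{(k)})^T\big) + (1-\lambda)^2 pp^T$$
and subtracting this from (10) we obtain, for the covariance matrix $\Sigma_k$ of $x^{(k)}$, a recurrence relation
$$\Sigma_{k+1} = \lambda^2\,\Sigma_k + (1-\lambda)^2\,(D_p - pp^T),$$
which is perfectly similar to (7). Hence, in accordance with Lemma 2, we have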
Theorem 1
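The covariance matrix of the finite $n$-dimensional Bernoulli convolution obtained after $k$ iterations of (1) is
$$\Sigma_k = (1-\lambda^{2k})\,\frac{1-\lambda}{1+\lambda}\,(D_p - pp^T).$$
The covariance matrix of the corresponding infinite $n$-dimensional Bernoulli convolution is
$$\Sigma = \frac{1-\lambda}{1+\lambda}\,(D_p - pp^T).$$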
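The covariance formula of Theorem 1 can be verified exactly on a tiny instance by enumerating all click sequences. The sketch below is an illustration under assumptions: three items, a fixed starting vector, the closed form $(1-\lambda^{2k})\frac{1-\lambda}{1+\lambda}(D_p - pp^T)$, and hypothetical function names and parameter values.

```python
from fractions import Fraction
from itertools import product


def exact_covariance(lam, probs, start, steps):
    """Exact covariance matrix of the simplex walk after `steps` clicks."""
    size = len(probs)
    mean = [Fraction(0)] * size
    second = [[Fraction(0)] * size for _ in range(size)]
    for clicks in product(range(size), repeat=steps):
        x = list(start)
        weight = Fraction(1)
        for i in clicks:
            x = [lam * v for v in x]   # decay every coordinate
            x[i] += 1 - lam            # reward the clicked item
            weight *= probs[i]         # probability of this click sequence
        for r in range(size):
            mean[r] += weight * x[r]
            for c in range(size):
                second[r][c] += weight * x[r] * x[c]
    return [[second[r][c] - mean[r] * mean[c] for c in range(size)]
            for r in range(size)]


def closed_form_covariance(lam, probs, steps):
    """(1 - lam^(2k)) (1 - lam)/(1 + lam) (D_p - p p^T) with k = steps."""
    scale = (1 - lam ** (2 * steps)) * (1 - lam) / (1 + lam)
    size = len(probs)
    return [[scale * ((probs[r] if r == c else 0) - probs[r] * probs[c])
             for c in range(size)] for r in range(size)]


lam = Fraction(4, 5)
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
start = [Fraction(1, 3)] * 3
cov = exact_covariance(lam, probs, start, steps=5)
```

The enumeration visits $3^5 = 243$ click sequences, so it is only practical for very small cases; its purpose is to confirm the formula, including the fact that every row of the covariance matrix sums to zero.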
Let $\mathbf{1}$ be the vector with all its coordinates equal to one. It is easy to check that $(D_p - pp^T)\,\mathbf{1} = 0$. This is not surprising, since the coordinates of $x^{(k)}$ sum up to one. The matrix $D_p - pp^T$ is a symmetric rank-one perturbation of a diagonal matrix, and the spectral structure of such matrices is well studied. We just mention
Corollary 2
If bias probabilities are pairwise distinct then all the non-zero eigenvalues of the covariance matrix of -dimensional Bernoulli convolution (1) are distinct roots of the equation
[TABLE]
On the other hand, we have
Example 2
The only eigenvalues of the covariance matrix of unbiased () -dimensional Bernoulli convolution are [math] and
As a slight generalization of (1), fix points (vectors) in and discrete probability distribution . Define a random walk by a rule
[TABLE]
Let be an matrix that has coordinates of as its columns. For random vectors defined by (11), the equation (9) turns into
[TABLE]
Let . From the proof of Theorem 1 we have
Corollary 3
[TABLE]
and in one-dimensional case
Corollary 4
[TABLE]
Note that setting here we not-surprisingly recover equations (4) and (6).
Moreover, consider a case when all points belong to the complex plane. Then is a sequence of complex random variables and again from the proof of Theorem 1 we have
Theorem 2
Let . Then for the sequence of complex random variables defined by (11) we have
[TABLE]
The proof is similar to the proof of Theorem 1. By definition
[TABLE]
and as in the proof of Theorem 1
[TABLE]
On the other hand
[TABLE]
and it follows from (12) that
[TABLE]
Substituting this into previous equation and subtracting from (13) we obtain a recurrent relation
[TABLE]
The rest of the proof is the same as in Lemma 2.
Corollary 5
If all points belong to a unit circle then
[TABLE]
where are pairwise angles between unit vectors .
Indeed, since in this case , we have
[TABLE]
and on the other hand
[TABLE]
Example 3
If then two-dimensional random walk (1) can be viewed as a random walk on an equilateral triangle whose vertices are three distinct cubic roots of unity . All three angles between and are equal to and by Corollary 5 the (complex) variance of the corresponding complex random variable at iteration is
[TABLE]
4 Properties of approximation
The results of Section 2 can be used to evaluate the heavy hitters approximation produced by Algorithm 1.
To evaluate the algorithm's ability to "overweight" recent event frequencies, let's assume that the number of iterations $m$ corresponds to a "relevancy" time window. For example, if last week's heavy hitters are of highest importance, let $m$ be a "weekful of clicks". Measuring time by a click counter, suppose that the estimated click distribution at the start of the time period was $x^{(0)}$ and that for time $k$ the incoming click distribution $p$ did not change. Suppose also that at time $k$ the incoming distribution switched to $p'$ and did not change for the remaining time $m-k$. Then by Lemma 1, the expected convex mixture approximation at the end of the time period will be
$$E(x^{(m)}) = \lambda^m x^{(0)} + \lambda^{m-k}(1-\lambda^k)\,p + (1-\lambda^{m-k})\,p'.$$
To see how our approximation is affected by recent events, let's estimate the ratio of the coefficients at $p'$ and $p$ in the expression above. Since $1-\lambda$ is supposed to be small, we have
$$\frac{1-\lambda^{m-k}}{\lambda^{m-k}(1-\lambda^k)} \approx \frac{m-k}{k}\,\lambda^{-(m-k)}. \tag{14}$$
In the case of plain event counting this ratio should be $\frac{m-k}{k}$. On the other hand, from (14) we have
Corollary 6
Algorithm 1 introduces approximately $1/\lambda$ times per iteration "velocity boost" for recent heavy hitters.
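To get a feel for the size of this boost, the sketch below compares the expected weights that the mixture rule gives to an old and a new click distribution with the ratio produced by plain counting. It assumes the weights that follow from applying Lemma 1 twice (once before and once after the switch); `recency_ratio` and the numbers 0.999, 100 and 60 are hypothetical.

```python
def recency_ratio(lam, total, switch):
    """Ratio of the expected weights of the new and old click distributions.

    After `switch` clicks from the old distribution and `total - switch`
    clicks from the new one, applying Lemma 1 to each phase gives the old
    distribution the weight lam**(total - switch) * (1 - lam**switch) and
    the new one the weight 1 - lam**(total - switch).
    """
    recent = total - switch
    old_weight = lam ** recent * (1 - lam ** switch)
    new_weight = 1 - lam ** recent
    return new_weight / old_weight


# Hypothetical window: 100 clicks, distribution switches after click 60.
lam, total, switch = 0.999, 100, 60
mixture = recency_ratio(lam, total, switch)
counting = (total - switch) / switch   # what plain event counting would give
boost = mixture / counting             # roughly lam ** -(total - switch)
```

In this setting the mixture ratio exceeds the counting ratio, and the excess factor is close to $\lambda^{-(m-k)}$, that is, about $1/\lambda$ per recent iteration.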
As we saw above, Algorithm 1 will approximate the mean of a fixed incoming click distribution in the long run. Lemmas 1, 2 and a straightforward application of Chebyshev inequality (cf. e.g. [9] for a vector version) give a reasonable estimate for a quality of this approximation.
Corollary 7
The following estimates hold for random variables and for random vector
[TABLE]
In particular,
[TABLE]
and
[TABLE]
Remark 4
It follows from (16) that for any and large enough , about -th of the limit distribution belongs to the narrow interval
Example 4
For and for sufficiently large the value of will belong to the interval with about probability
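A Chebyshev-type estimate of this kind is easy to evaluate numerically. The sketch below is a minimal illustration, assuming the limit variance $\frac{1-\lambda}{1+\lambda}p(1-p)$ of the one-dimensional walk; the function name and the values 0.999, 0.1 and 0.02 are hypothetical and are not the numbers used in the examples of this section.

```python
def chebyshev_tail_bound(lam, p, eps):
    """Upper bound on P(|x - p| >= eps) for the limit distribution.

    Uses the limit variance (1 - lam)/(1 + lam) * p * (1 - p) and the
    Chebyshev inequality; the result is capped at 1.
    """
    variance = (1 - lam) / (1 + lam) * p * (1 - p)
    return min(1.0, variance / eps ** 2)


# Hypothetical numbers: decay 0.999, click probability 0.1, tolerance 0.02.
bound = chebyshev_tail_bound(lam=0.999, p=0.1, eps=0.02)
```

Here the bound is a little above 0.11, so under these assumptions the estimate stays within 0.02 of the true click probability with probability of at least roughly 0.89.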
It is obvious, that the estimator (15) works better for large values of , i.e. for above-mentioned heavy hitters. More precisely, setting in (15) we get
Corollary 8
An estimate
[TABLE]
holds for
[TABLE]
Example 5
For and this boils down to
[TABLE]
In other words, for large enough number of iterations, click probabilities that are slightly above can be approximated up-to relative error with confidence.
For a finite Bernoulli convolution obtained after iterations of Algorithm 1 we get from (4) and (6)
Corollary 9
If then for any
[TABLE]
In particular
[TABLE]
and if then
[TABLE]
5 Recurrent formula for moments of biased Bernoulli convolutions
Moments of unbiased Bernoulli convolutions were studied in [6], [7], [8]. Some basic properties of moments of biased infinite Bernoulli convolutions are briefly discussed in this section.
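It makes sense to consider central moments (cf. Corollary 1). Hence, we replace the sequence $x_k$ with the sequence $x_k - p$, which from now on will be denoted by the same letter. The transformation rule (2) thus changes to
$$x_{k+1} = \begin{cases} \lambda x_k + (1-\lambda)\,q & \text{with probability } p, \\ \lambda x_k - (1-\lambda)\,p & \text{with probability } q, \end{cases} \qquad q = 1-p. \tag{17}$$
For expectations of the random variable sequence $x_k^n$ that translates into
$$E(x_{k+1}^n) = p\,E\big((\lambda x_k + (1-\lambda)q)^n\big) + q\,E\big((\lambda x_k - (1-\lambda)p)^n\big).$$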
Opening brackets and passing to the limit (that is assumed to exist) results in identity
[TABLE]
Finally, after relabeling we obtain for a recurrent relation (cf. [7])
[TABLE]
Obviously, and . It is now a simple matter to write down a few central moments of the infinite Bernoulli convolution (2):
Example 6
[TABLE]
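The first few central moments can also be computed mechanically. The sketch below is derived directly from the centered update $y \to \lambda y + (1-\lambda)(b-p)$ with $P(b=1)=p$: expanding the $n$-th power binomially, passing to the limit and solving for the $n$-th moment. It is an assumption-laden illustration and may use a different normalization from the recurrence (18); the function name and parameter values are hypothetical.

```python
from fractions import Fraction
from math import comb


def central_moments(lam, p, order):
    """Central moments m_0..m_order of the limit distribution.

    Assumes the centered rule y -> lam*y + (1 - lam)*(b - p) with
    P(b = 1) = p, and that the limit moments exist.
    """
    q = 1 - p
    moments = [Fraction(1)]                  # m_0 = 1
    for n in range(1, order + 1):
        total = Fraction(0)
        for j in range(n):
            r = n - j
            noise = p * q ** r + q * (-p) ** r   # E[(b - p)^r]
            total += comb(n, j) * lam ** j * (1 - lam) ** r * moments[j] * noise
        # m_n appears on both sides with coefficient lam**n; solve for it.
        moments.append(total / (1 - lam ** n))
    return moments


lam, p = Fraction(3, 4), Fraction(1, 3)
m = central_moments(lam, p, 4)
```

With these assumptions `m[1]` is zero, `m[2]` reproduces the variance of Corollary 1, and `m[3]` carries the factor $q - p$, so it vanishes in the unbiased case.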
Let be a measure associated with the infinite Bernoulli convolution that is generated by rule (17) and let denote a reflection . Denote also by an infinite Bernoulli convolution generated by the rule (17) with interchanged probabilities . It is probably worth mentioning
Corollary 10
.
- (i)
for any interval
- (ii)
* and therefore*
- (iii)
**
- (iv)
as polynomials of , the central moments are semi-invariant with respect to the involution , that is
[TABLE]
Indeed, statements (i) and (ii) follow from definition (17). Statement (iii) follows from (ii) or (iv) and the proof of (iv) is a straightforward induction based on (18).
Moreover, for central moments as polynomials of we have
Corollary 11
* is a polynomial of if n is even and is a polynomial of times if is odd.*
This is an easy consequence of Corollary 10. Just note, that it follows from Corollary 10 (iv) that is divisible by if is odd.
Lemma 3
If then
- (i)
all central moments are non-negative
- (ii)
* for all *
- (iii)
**
Proof. The first statement directly follows from (18). The second statement is obvious. Statement (iii) is just a recollection of a well known fact about a sequence of -norms converging to -norm .
Although random variable is not non-negative, the following still holds
Theorem 3
If then
Proof. The sequence for even numbered central moments is non-decreasing by Hölder’s inequality and converges to by Lemma 3. Hence, for any there is such that
[TABLE]
for all even such that . In particular for all odd and such that . Using this fact, we will show that an estimate similar to (19) holds for any large odd number . Indeed, it follows from (18), Lemma 3 (i) and (19) that for any odd
[TABLE]
It is easy to see, however, that the sum in (20) is equal to
[TABLE]
and therefore for any we can find large enough such that for any odd
[TABLE]
After substituting this into (20) we find that
[TABLE]
which is a desired estimate of for large enough odd .
6 Concluding remarks
As was shown above, relative heavy hitters can be approximated by iterative application of the convex mixture rule (1). The suggested algorithm essentially computes a Bernoulli convolution if, and for as long as, the incoming click distribution remains fixed. In practice, the stochastic process of incoming events is much more complicated (cf. e.g. Corollary 3). The problem of obtaining similar convex mixture approximation estimates in a general setting of varying incoming click distributions seems to be both hard and interesting.
References
- [1] Graham Cormode, Marios Hadjieleftheriou, "Time Adaptive Sketches (Ada-Sketches) for Summarizing", Proceedings of the VLDB Endowment, Volume 1, Issue 2, August 2008
- [2] Anshumali Shrivastava, Arnd Christian König, Mikhail Bilenko, "Time Adaptive Sketches (Ada-Sketches) for Summarizing Data Streams", SIGMOD'16, June 26-July 01, 2016, San Francisco, CA, USA
- [3] Chen-Yu Hsu, Piotr Indyk, Dina Katabi and Ali Vakilian, "Learning-Based Frequency Estimation Algorithms", ICLR 2019
- [4] Yuval Peres, Wilhelm Schlag, and Boris Solomyak, "Sixty years of Bernoulli convolutions", in Fractal Geometry and Stochastics II (Greifswald/Koserow, 1998), volume 46 of Progr. Probab., pages 39–65, Birkhäuser, Basel, 2000
- [5] Pablo Shmerkin, "On the Exceptional Set for Absolute Continuity of Bernoulli Convolutions", arXiv:1303.3992v2, 2003
- [6] Pawel J. Szablowski, "On Moments of Cantor and Related Distributions", arXiv:1403.0386, 2014
- [7] E. A. Timofeev, "Asymptotic Formula for the Moments of Bernoulli Convolutions", Modeling and Analysis of Information Systems, 23:2, 185-194, 2016
- [8] C. Escribano, M. A. Sastre, E. Torrano, "Moments of infinite convolutions of symmetric Bernoulli distributions", Journal of Computational and Applied Mathematics 153 (2003), 191-199
