Heavy Hitters and Bernoulli Convolutions

Alexander Kushkuley

arXiv:1905.08930·math.NA·May 29, 2019


TL;DR

The paper introduces a simple, event-sensitive frequency approximation algorithm that models event distributions as biased Bernoulli convolutions, enabling analysis of their moments and self-similarity properties.

Contribution

It presents a novel event frequency algorithm that links to biased Bernoulli convolutions, providing new insights into their moments and self-similarity.

Findings

1. Algorithm effectively models event distributions as Bernoulli convolutions.

2. Estimation of moments for biased Bernoulli convolutions is demonstrated.

3. Self-similarity properties are identified under certain conditions.

Abstract

A very simple event frequency approximation algorithm that is sensitive to event timeliness is suggested. The algorithm iteratively updates a categorical click distribution, producing (a path of) a random walk on a standard $n$-dimensional simplex. Under certain conditions, this random walk is self-similar and corresponds to a biased Bernoulli convolution. Algorithm evaluation naturally leads to estimation of moments of biased (finite and infinite) Bernoulli convolutions.

Equations

$$Y_{t+1}=\alpha Y_{t}+(1-\alpha)\delta_{i}\quad\text{with probability }q_{i},\quad i=1,2,\dots,n$$

$$y_{i,t+1}=\begin{cases}\alpha y_{i,t}&\text{with probability }1-q_{i}\\ \alpha y_{i,t}+1-\alpha&\text{with probability }q_{i}\end{cases}$$

$$y_{t}=\alpha^{t}y_{0}+(1-\alpha)\sum_{m=0}^{t-1}\xi_{m+1}\alpha^{m}$$

$$\mathbb{E}(y_{t})=\alpha^{t}y_{0}+(1-\alpha^{t})q$$

$$\mathbb{E}(y_{t})=\alpha\,\mathbb{E}(y_{t-1})+(1-\alpha)q$$

$$\mathbb{E}(y_{t})=\alpha^{t}y_{0}+(1-\alpha)(1+\alpha+\dots+\alpha^{t-1})q$$

$$\mathbb{VAR}(y_{t})=(1-\alpha^{2t})\frac{1-\alpha}{1+\alpha}(q-q^{2})$$

$$\mathbb{E}(y_{t}^{2})=\alpha^{2}\,\mathbb{E}(y_{t-1}^{2})+(1-\alpha)^{2}q+2\alpha(1-\alpha)\,\mathbb{E}(y_{t-1})\,q$$

$$\mathbb{VAR}(y_{t})=\mathbb{E}(y_{t}^{2})-\mathbb{E}(y_{t})^{2}=\alpha^{2}\,\mathbb{VAR}(y_{t-1})+(1-\alpha)^{2}(q-q^{2})$$

$$\mathbb{VAR}(y_{t})=\frac{1-\alpha^{2t}}{1-\alpha^{2}}(1-\alpha)^{2}(q-q^{2})=(1-\alpha^{2t})\frac{1-\alpha}{1+\alpha}(q-q^{2})$$

$$\mathbb{E}\left(\frac{1}{y_{t+1}}\right)=(1-q)\,\mathbb{E}\left(\frac{1}{\alpha y_{t}}\right)+q\,\mathbb{E}\left(\frac{1}{\alpha y_{t}+1-\alpha}\right)=\frac{1-q}{\alpha}\,\mathbb{E}\left(\frac{1}{y_{t}}\right)+q\,\mathbb{E}\left(\frac{1}{1-\alpha(1-y_{t})}\right)$$

$$\frac{\alpha-1+q}{\alpha}\,\mathbb{E}\left(\frac{1}{y}\right)=q\,\mathbb{E}\left(\frac{1}{1-\alpha(1-y)}\right)$$

$$\mathbb{E}(Y_{t+1})=\alpha\,\mathbb{E}(Y_{t})+(1-\alpha)Q$$

$$\mathbb{E}(Y_{t})=\alpha^{t}Y_{0}+(1-\alpha^{t})Q$$

$$\mathbb{E}(Y)=Q$$

$$\begin{aligned}\mathbb{E}(Y_{t+1}Y_{t+1}^{T})&=\alpha^{2}\,\mathbb{E}(Y_{t}Y_{t}^{T})+(1-\alpha)^{2}\sum_{i=1}^{n}q_{i}\delta_{i}\delta_{i}^{T}+\alpha(1-\alpha)\sum_{i=1}^{n}q_{i}\left(\mathbb{E}(Y_{t}\delta_{i}^{T})+\mathbb{E}(\delta_{i}Y_{t}^{T})\right)\\&=\alpha^{2}\,\mathbb{E}(Y_{t}Y_{t}^{T})+(1-\alpha)^{2}diag(Q)+\alpha(1-\alpha)\left(\mathbb{E}(Y_{t})Q^{T}+Q\,\mathbb{E}(Y_{t}^{T})\right)\end{aligned}$$

$$\mathbb{E}(Y_{t+1})\,\mathbb{E}(Y_{t+1}^{T})=\alpha^{2}\,\mathbb{E}(Y_{t})\,\mathbb{E}(Y_{t}^{T})+(1-\alpha)^{2}QQ^{T}+\alpha(1-\alpha)\left(\mathbb{E}(Y_{t})Q^{T}+Q\,\mathbb{E}(Y_{t}^{T})\right)$$

$$\mathbb{VAR}(Y_{t+1})=\alpha^{2}\,\mathbb{VAR}(Y_{t})+(1-\alpha)^{2}\left(diag(Q)-QQ^{T}\right)$$

$$\mathbb{VAR}(Y_{t})=(1-\alpha^{2t})\frac{1-\alpha}{1+\alpha}\left(diag(Q)-QQ^{T}\right)$$

$$\mathbb{VAR}(Y)=\frac{1-\alpha}{1+\alpha}\left(diag(Q)-QQ^{T}\right)$$

$$\sum_{i=1}^{n}\frac{q_{i}}{q_{i}-\lambda}=0$$

$$Y^{\prime}_{t+1}=\alpha Y^{\prime}_{t}+(1-\alpha)v_{i}\quad\text{with probability }q_{i},\quad i=1,2,\dots,m$$

$$\mathbb{E}(Y^{\prime}_{t+1})=\alpha\,\mathbb{E}(Y^{\prime}_{t})+(1-\alpha)VQ$$

$$\mathbb{E}(Y^{\prime}_{t})=\alpha^{t}Y^{\prime}_{0}+(1-\alpha^{t})VQ,\qquad\mathbb{E}(Y^{\prime})=VQ$$

$$\mathbb{VAR}(Y^{\prime}_{t})=(1-\alpha^{2t})\frac{1-\alpha}{1+\alpha}V\left(diag(Q)-QQ^{T}\right)V^{T}$$

$$\mathbb{VAR}(Y^{\prime})=\frac{1-\alpha}{1+\alpha}V\left(diag(Q)-QQ^{T}\right)V^{T}$$

$$\mathbb{E}(Y^{\prime}_{t})=\alpha^{t}Y^{\prime}_{0}+(1-\alpha^{t})\sum_{i=1}^{m}v_{i}q_{i},\qquad\mathbb{E}(Y^{\prime})=\sum_{i=1}^{m}v_{i}q_{i}$$

$$\mathbb{VAR}(Y^{\prime}_{t})=(1-\alpha^{2t})\frac{1-\alpha}{1+\alpha}\left(\sum_{i=1}^{m}v_{i}^{2}(q_{i}-q_{i}^{2})-2\sum_{i<j}v_{i}v_{j}q_{i}q_{j}\right)$$

$$\mathbb{VAR}(Y^{\prime})=\frac{1-\alpha}{1+\alpha}\left(\sum_{i=1}^{m}v_{i}^{2}(q_{i}-q_{i}^{2})-2\sum_{i<j}v_{i}v_{j}q_{i}q_{j}\right)$$

$$\mathbb{VAR}(Y^{\prime}_{t})=(1-\alpha^{2t})\frac{1-\alpha}{1+\alpha}\left(\sum_{i=1}^{m}|v_{i}|^{2}(q_{i}-q_{i}^{2})-\sum_{i<j}\left(v_{i}\overline{v_{j}}+v_{j}\overline{v_{i}}\right)q_{i}q_{j}\right)$$

$$\mathbb{VAR}(Y^{\prime})=\frac{1-\alpha}{1+\alpha}\left(\sum_{i=1}^{m}|v_{i}|^{2}(q_{i}-q_{i}^{2})-\sum_{i<j}\left(v_{i}\overline{v_{j}}+v_{j}\overline{v_{i}}\right)q_{i}q_{j}\right)$$

$$\mathbb{VAR}(Y^{\prime}_{t})=\mathbb{E}\left(Y^{\prime}_{t}\,\overline{Y^{\prime}_{t}}\right)-\mathbb{E}(Y^{\prime}_{t})\,\mathbb{E}\left(\overline{Y^{\prime}_{t}}\right)$$

Taxonomy

Topics: Cellular Automata and Applications · Fractal and DNA sequence analysis · Algorithms and Data Compression

Full text

Heavy Hitters and Bernoulli Convolutions

Alexander Kushkuley

(Salesforce/Demandware, [email protected])

1 Introduction

To quote [2], “there is a need to estimate the count of a given item $i$ (or event or combination thereof) during some period of time $t$ … Typically, items with highest counts, commonly known as heavy hitters, are of most interest”.

This note is an attempt to redefine the event counting problem (cf. [1], [2], [3]). In many cases, the most important factor is the recent “popularity rank” of an event (cf. e.g. [3]) and not its long-run frequency. Hence, instead of $n$ item-event counters, consider a time-dependent discrete probability distribution $P=(p_{1},p_{2},\cdots,p_{n})$ as an estimate for the relative frequencies (ranks) of the items involved. An occurrence of an event with index $i$ can be represented by a delta function distribution $\delta_{i}$ on the set $\{1,\cdots,n\}$, triggering an update of the estimated probability distribution $P$ by an application of the convex mixture rule $P\rightarrow\alpha P+(1-\alpha)\delta_{i}$. In other words, the arrival of an event $i$ reduces the ranks of all other events while tilting the estimated event rank-distribution towards event-item $i$ in the simplest way possible. Thus we arrive at the following heavy hitters approximation algorithm.

Algorithm 1

Fix a number $\alpha<1$ that is close to $1$. If an item $j\in\{1,2,\cdots,n\}$ was clicked (event number $j$ did occur), set $p_{i}\rightarrow\alpha p_{i},\;i=1,\cdots,n,\;i\neq j$, and set $p_{j}\rightarrow\alpha p_{j}+1-\alpha$.

One practical problem with the above is that all frequencies (probabilities) are updated simultaneously. There are, however, some advantages:

(1) decreasing $\alpha$ gives higher priority to recent events and, vice versa, increasing $\alpha$ biases the ranking towards “idling” event items;

(2) therefore, the sensitivity of this ranking scheme to new events can be easily controlled (even at runtime) by adjusting just one parameter.
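A minimal sketch of Algorithm 1 in Python may make the update rule concrete. The function name `update`, the 0-based item indices and the sample click stream are illustrative assumptions, not part of the paper.

```python
def update(p, j, alpha):
    """Apply one step of the convex mixture rule P -> alpha*P + (1 - alpha)*delta_j."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    new_p = [alpha * x for x in p]   # every rank decays by the factor alpha
    new_p[j] += 1.0 - alpha          # the clicked item receives the freed mass
    return new_p


# Hypothetical click stream over n = 3 items (0-based indices).
ranks = [1 / 3, 1 / 3, 1 / 3]
for clicked in [0, 0, 2, 0, 1, 0]:
    ranks = update(ranks, clicked, alpha=0.9)

print([round(r, 4) for r in ranks])
```

Each update touches all $n$ entries, which is the practical cost mentioned above; the estimate stays a probability vector because the rule is a convex combination of two probability vectors.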

Remark 1

Suppose that it is desirable that an item should lose half of its rank if it was idle while the list it belongs to was updated $T$ times. Since an idle item's rank is multiplied by $\alpha$ at every update, this is achieved by setting the parameter $\alpha$ to $\exp(-\log(2)/T)$. For example, if $T=10$ then $\alpha\approx 0.93$ (cf. [10]).
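The half-life rule of Remark 1 can be checked in a few lines. The helper name `alpha_for_half_life` is an assumption made for this illustration.

```python
import math


def alpha_for_half_life(T):
    """Return alpha such that an idle item's rank halves after T updates."""
    if T <= 0:
        raise ValueError("T must be positive")
    return math.exp(-math.log(2) / T)


alpha = alpha_for_half_life(10)
print(round(alpha, 4))   # close to 0.93, as in Remark 1
print(alpha ** 10)       # one half, up to floating-point rounding
```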

The close relationship between Algorithm 1 and Bernoulli convolutions (cf. [4]) is the subject of the rest of this paper.

2 Bernoulli convolutions

Suppose that incoming event frequencies follow a fixed discrete distribution $Q=(q_{1},q_{2},\cdots,q_{n}),\;\sum_{i=1}^{n}q_{i}=1$, and let $Y_{t}=(y_{1,t},\cdots,y_{n,t})$ be a probability distribution vector ($\sum_{i=1}^{n}y_{i,t}=1$ for all $t$) of our (relative) frequency estimates at times $t=0,1,\cdots$. Essentially, Algorithm 1 computes a path of a random walk on the standard $(n-1)$-dimensional simplex $\sigma^{n-1}\subset\mathbb{R}^{n}$ defined by the iterative rule

$$Y_{t+1}=\alpha Y_{t}+(1-\alpha)\delta_{i}\quad\text{with probability }q_{i},\quad i=1,2,\dots,n\qquad(1)$$

where $\delta_{i}$ is the $i$-th vertex of the simplex $\sigma^{n-1}$ or, in other words, the $i$-th unit vector in standard Euclidean coordinates in $\mathbb{R}^{n}$. The update rule for the $i$-th coordinate on iteration $t+1$ is

$$y_{i,t+1}=\begin{cases}\alpha y_{i,t}&\text{with probability }1-q_{i}\\ \alpha y_{i,t}+1-\alpha&\text{with probability }q_{i}\end{cases}\qquad(2)$$

Let's fix a coordinate for a while, omitting the index $i$. Let $\xi_{m},\;m=1,\cdots,t$ be independent biased Bernoulli random variables such that $\mathbb{P}(\xi_{m}=0)=1-q$ and $\mathbb{P}(\xi_{m}=1)=q$. It is well known (see e.g. [4]) that at step $t$ the one-dimensional random walk (2) corresponds to a random variable

$$y_{t}=\alpha^{t}y_{0}+(1-\alpha)\sum_{m=0}^{t-1}\xi_{m+1}\alpha^{m}\qquad(3)$$

which, up to a mostly irrelevant free term, is a convolution of $t$ biased Bernoulli variables. The infinite biased Bernoulli convolution (cf. e.g. [4]) is obtained from (3) by setting $t=\infty$ or, equivalently, by running the random process (2) for an infinite number of steps.

Remark 2

It is well known (see e.g. [5] for a precise statement) that the Bernoulli convolution $(1-\alpha)\sum_{0}^{\infty}\xi_{m}\alpha^{m}$ is absolutely continuous (with respect to the Lebesgue measure on the line) for almost all sufficiently large values of the parameter $\alpha$. For these values of $\alpha$ the weak limit $y$ of the sequence of random variables $y_{t}$ does exist, and only this case will be considered in this paper.

Lemma 1

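$$\mathbb{E}(y_{t})=\alpha^{t}y_{0}+(1-\alpha^{t})q\qquad(4)$$

Indeed, by definition (2)

$$\mathbb{E}(y_{t})=\alpha\,\mathbb{E}(y_{t-1})+(1-\alpha)q\qquad(5)$$

and hence by induction

$$\mathbb{E}(y_{t})=\alpha^{t}y_{0}+(1-\alpha)(1+\alpha+\dots+\alpha^{t-1})q$$

which is the same as (4).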

Lemma 2

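$$\mathbb{VAR}(y_{t})=(1-\alpha^{2t})\frac{1-\alpha}{1+\alpha}(q-q^{2})\qquad(6)$$

Proof. It follows from the definition (2) that

$$\mathbb{E}(y_{t}^{2})=\alpha^{2}\,\mathbb{E}(y_{t-1}^{2})+(1-\alpha)^{2}q+2\alpha(1-\alpha)\,\mathbb{E}(y_{t-1})\,q$$

and therefore by (5)

$$\mathbb{VAR}(y_{t})=\mathbb{E}(y_{t}^{2})-\mathbb{E}(y_{t})^{2}=\alpha^{2}\,\mathbb{VAR}(y_{t-1})+(1-\alpha)^{2}(q-q^{2})\qquad(7)$$

From here, by the same inductive argument as in Lemma 1, we get

$$\mathbb{VAR}(y_{t})=\frac{1-\alpha^{2t}}{1-\alpha^{2}}(1-\alpha)^{2}(q-q^{2})=(1-\alpha^{2t})\frac{1-\alpha}{1+\alpha}(q-q^{2})$$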

As an obvious consequence of Lemmas 1 and 2 (cf. Remark 2) we have

Corollary 1

The infinite Bernoulli convolution defined by (2) has expectation $q$ and variance $\frac{1-\alpha}{1+\alpha}(q-q^{2})$.
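As a sanity check on Lemmas 1 and 2, the one-step recurrences for the mean and the variance can be iterated numerically and compared with the closed forms. This is only a sketch: the function names and the values of $\alpha$, $q$, $y_{0}$ and $t$ are assumptions chosen for illustration.

```python
def mean_closed_form(alpha, q, y0, t):
    """Lemma 1: E(y_t) = alpha^t * y0 + (1 - alpha^t) * q."""
    return alpha**t * y0 + (1 - alpha**t) * q


def variance_closed_form(alpha, q, t):
    """Lemma 2: VAR(y_t) = (1 - alpha^(2t)) * (1 - alpha)/(1 + alpha) * (q - q^2)."""
    return (1 - alpha ** (2 * t)) * (1 - alpha) / (1 + alpha) * (q - q * q)


def iterate_recurrences(alpha, q, y0, t):
    """Run the one-step recurrences for the mean and the variance t times."""
    mean, var = y0, 0.0   # y_0 is deterministic, so VAR(y_0) = 0
    for _ in range(t):
        mean = alpha * mean + (1 - alpha) * q
        var = alpha**2 * var + (1 - alpha) ** 2 * (q - q * q)
    return mean, var


alpha, q, y0, t = 0.9, 0.3, 1.0, 25   # illustrative values
mean_t, var_t = iterate_recurrences(alpha, q, y0, t)
print(abs(mean_t - mean_closed_form(alpha, q, y0, t)) < 1e-12)
print(abs(var_t - variance_closed_form(alpha, q, t)) < 1e-12)
```

For large $t$ both quantities approach the values stated in Corollary 1, since $\alpha^{t}$ and $\alpha^{2t}$ tend to zero.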

Remark 3

Under the assumption that the sought-for limits exist (Remark 2), Corollary 1 can be established by passing to the limit in the recurrent relations (5), (7) and then solving for the expectation and the variance respectively.

Here is an example demonstrating that passing to a limit as suggested in Corollary 1 is not always possible.

Example 1

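Assuming that the starting point of the random walk (2) is non-zero, we have

$$\mathbb{E}\left(\frac{1}{y_{t+1}}\right)=(1-q)\,\mathbb{E}\left(\frac{1}{\alpha y_{t}}\right)+q\,\mathbb{E}\left(\frac{1}{\alpha y_{t}+1-\alpha}\right)=\frac{1-q}{\alpha}\,\mathbb{E}\left(\frac{1}{y_{t}}\right)+q\,\mathbb{E}\left(\frac{1}{1-\alpha(1-y_{t})}\right)$$

Passing here to the limit as $t\rightarrow\infty$ yields

$$\frac{\alpha-1+q}{\alpha}\,\mathbb{E}\left(\frac{1}{y}\right)=q\,\mathbb{E}\left(\frac{1}{1-\alpha(1-y)}\right)\qquad(8)$$

which is obviously wrong if $\alpha\leq 1-q$ (the left-hand side is then non-positive while the right-hand side is positive), and therefore the condition $\alpha>1-q$ is necessary for the existence of the continuous limit $\lim_{t\rightarrow\infty}1/y_{t}$. If $q>1-q$, the condition $\alpha>1-q$ follows from the well known necessary condition $\alpha>q^{q}(1-q)^{1-q}$ for non-singularity of the Bernoulli convolution $\lim_{t\rightarrow\infty}y_{t}$ (cf. e.g. [5]). For (8) to be true, however, we need non-singularity of the inverse of the Bernoulli convolution. Essentially, the question one can ask is this: for what values of $\alpha$ (if any) does a limit $\lim_{t\rightarrow\infty}1/y_{t}$ satisfying (8) exist?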

3 Random walk on a simplex

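We will compute variances of random vectors generated by (1) and some other similar random walks. As before, it is assumed that the continuous limit $Y=\lim_{t\rightarrow\infty}Y_{t}$ does exist. It follows from (4), (5) and Corollary 1 that

$$\mathbb{E}(Y_{t+1})=\alpha\,\mathbb{E}(Y_{t})+(1-\alpha)Q,\qquad\mathbb{E}(Y_{t})=\alpha^{t}Y_{0}+(1-\alpha^{t})Q,\qquad\mathbb{E}(Y)=Q\qquad(9)$$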

In what follows, all vectors are assumed to be column vectors so that for vectors $A,B$ their outer product is $AB^{T}$ where $B^{T}$ is a row vector transposition of $B$ . A diagonal matrix with elements of a vector $A$ on its main diagonal will be denoted by $diag(A)$ .

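Using the rule (1) we get

$$\begin{aligned}\mathbb{E}(Y_{t+1}Y_{t+1}^{T})&=\alpha^{2}\,\mathbb{E}(Y_{t}Y_{t}^{T})+(1-\alpha)^{2}\sum_{i=1}^{n}q_{i}\delta_{i}\delta_{i}^{T}+\alpha(1-\alpha)\sum_{i=1}^{n}q_{i}\left(\mathbb{E}(Y_{t}\delta_{i}^{T})+\mathbb{E}(\delta_{i}Y_{t}^{T})\right)\\&=\alpha^{2}\,\mathbb{E}(Y_{t}Y_{t}^{T})+(1-\alpha)^{2}diag(Q)+\alpha(1-\alpha)\left(\mathbb{E}(Y_{t})Q^{T}+Q\,\mathbb{E}(Y_{t}^{T})\right)\end{aligned}\qquad(10)$$

In the same way, using (9) we compute

$$\mathbb{E}(Y_{t+1})\,\mathbb{E}(Y_{t+1}^{T})=\alpha^{2}\,\mathbb{E}(Y_{t})\,\mathbb{E}(Y_{t}^{T})+(1-\alpha)^{2}QQ^{T}+\alpha(1-\alpha)\left(\mathbb{E}(Y_{t})Q^{T}+Q\,\mathbb{E}(Y_{t}^{T})\right)$$

and subtracting this from (10) we obtain a recurrent relationship

$$\mathbb{VAR}(Y_{t+1})=\alpha^{2}\,\mathbb{VAR}(Y_{t})+(1-\alpha)^{2}\left(diag(Q)-QQ^{T}\right)$$

which is perfectly similar to (7). Hence, in accordance with Lemma 2 we have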

Theorem 1

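The covariance matrix of the finite $n$-dimensional Bernoulli convolution defined by (1) is

$$\mathbb{VAR}(Y_{t})=(1-\alpha^{2t})\frac{1-\alpha}{1+\alpha}\left(diag(Q)-QQ^{T}\right)$$

The covariance matrix of the corresponding infinite $n$-dimensional Bernoulli convolution is

$$\mathbb{VAR}(Y)=\frac{1-\alpha}{1+\alpha}\left(diag(Q)-QQ^{T}\right)$$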

Let $1_{n}$ be the $n$-vector with all its coordinates equal to one. It is easy to check that $\mathbb{VAR}(Y_{t})(1_{n})=\mathbb{VAR}(Y)(1_{n})=0$. This is not surprising since the coordinates of $Y_{t}$ sum up to one. The matrix $diag(Q)-QQ^{T}$ is a symmetric rank-one perturbation of a diagonal matrix, and the spectral structure of such matrices is well studied. We just mention

Corollary 2

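If the bias probabilities $q_{i}$ are pairwise distinct then all the non-zero eigenvalues of the covariance matrix of the $n$-dimensional Bernoulli convolution (1) are distinct and, up to the common factor $\frac{1-\alpha}{1+\alpha}$ from Theorem 1, are the roots of the equation

$$\sum_{i=1}^{n}\frac{q_{i}}{q_{i}-\lambda}=0$$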
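The equation $\sum_{i}q_{i}/(q_{i}-\lambda)=0$ of Corollary 2 can be explored numerically for $n=3$. The sketch below finds its roots by bisection between consecutive values of $q_{i}$ and checks that they are eigenvalues of $diag(Q)-QQ^{T}$, that is, of the covariance matrix with the scalar factor $\frac{1-\alpha}{1+\alpha}$ left out. The distribution `Q` and all function names are illustrative assumptions.

```python
def secular(q, lam):
    """Left-hand side of the equation: sum of q_i / (q_i - lam)."""
    return sum(qi / (qi - lam) for qi in q)


def root_between(q, lo, hi, iterations=200):
    """Bisection on (lo, hi), where lo < hi are consecutive values of q.

    On such an interval the function increases from -infinity to +infinity.
    """
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if secular(q, mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def shifted_matrix(q, lam):
    """diag(Q) - Q Q^T - lam * I, as a list of rows."""
    n = len(q)
    return [[(q[i] - lam if i == j else 0.0) - q[i] * q[j] for j in range(n)]
            for i in range(n)]


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


Q = [0.2, 0.3, 0.5]   # illustrative, pairwise distinct, sorted
roots = [root_between(Q, Q[k], Q[k + 1]) for k in range(len(Q) - 1)]
for lam in [0.0] + roots:
    print(round(lam, 6), abs(det3(shifted_matrix(Q, lam))) < 1e-12)
```

The value $\lambda=0$ is included because the vector $1_{n}$ is always in the kernel; the two remaining eigenvalues interlace the values $q_{i}$.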

On the other hand, we have

Example 2

The only eigenvalues of the covariance matrix of the unbiased ($q_{i}=1/n,\;i=1,\cdots,n$) $n$-dimensional Bernoulli convolution are, up to the factor $\frac{1-\alpha}{1+\alpha}$, $0$ and $1/n$.

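As a slight generalization of (1), fix $m>1$ points (vectors) $v_{1},\cdots,v_{m}$ in $\mathbb{R}^{n}$ and a discrete probability distribution $Q=(q_{1},q_{2},\cdots,q_{m})$. Define a random walk by the rule

$$Y^{\prime}_{t+1}=\alpha Y^{\prime}_{t}+(1-\alpha)v_{i}\quad\text{with probability }q_{i},\quad i=1,2,\dots,m\qquad(11)$$

Let $V$ be the $n\times m$ matrix that has the coordinates of $v_{1},\cdots,v_{m}$ as its columns. For random vectors defined by (11), the equation (9) turns into

$$\mathbb{E}(Y^{\prime}_{t+1})=\alpha\,\mathbb{E}(Y^{\prime}_{t})+(1-\alpha)VQ$$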

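Let $Y^{\prime}=\lim_{t\rightarrow\infty}Y^{\prime}_{t}$. From the proof of Theorem 1 we have

Corollary 3

$$\mathbb{E}(Y^{\prime}_{t})=\alpha^{t}Y^{\prime}_{0}+(1-\alpha^{t})VQ,\qquad\mathbb{E}(Y^{\prime})=VQ$$

$$\mathbb{VAR}(Y^{\prime}_{t})=(1-\alpha^{2t})\frac{1-\alpha}{1+\alpha}V\left(diag(Q)-QQ^{T}\right)V^{T}$$

$$\mathbb{VAR}(Y^{\prime})=\frac{1-\alpha}{1+\alpha}V\left(diag(Q)-QQ^{T}\right)V^{T}$$

and in the one-dimensional case

Corollary 4

$$\mathbb{E}(Y^{\prime}_{t})=\alpha^{t}Y^{\prime}_{0}+(1-\alpha^{t})\sum_{i=1}^{m}v_{i}q_{i},\qquad\mathbb{E}(Y^{\prime})=\sum_{i=1}^{m}v_{i}q_{i}$$

$$\mathbb{VAR}(Y^{\prime}_{t})=(1-\alpha^{2t})\frac{1-\alpha}{1+\alpha}\left(\sum_{i=1}^{m}v_{i}^{2}(q_{i}-q_{i}^{2})-2\sum_{i<j}v_{i}v_{j}q_{i}q_{j}\right)$$

$$\mathbb{VAR}(Y^{\prime})=\frac{1-\alpha}{1+\alpha}\left(\sum_{i=1}^{m}v_{i}^{2}(q_{i}-q_{i}^{2})-2\sum_{i<j}v_{i}v_{j}q_{i}q_{j}\right)$$

Note that setting here $m=2,\;v_{1}=0,\;v_{2}=1$ we unsurprisingly recover equations (4) and (6).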

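Moreover, consider the case when all points $v_{1},\cdots,v_{m}$ belong to the complex plane. Then $Y^{\prime}_{t},\;t=1,2,\cdots$ is a sequence of complex random variables, and again from the proof of Theorem 1 we have

Theorem 2

Let $v_{1},\cdots,v_{m}\in\mathbb{C}^{1},\;m>1$. Then for the sequence of complex random variables $Y^{\prime}_{t},\;t=1,\cdots$ defined by (11) we have

$$\mathbb{E}(Y^{\prime}_{t})=\alpha^{t}Y^{\prime}_{0}+(1-\alpha^{t})\sum_{i=1}^{m}v_{i}q_{i},\qquad\mathbb{E}(Y^{\prime})=\sum_{i=1}^{m}v_{i}q_{i}$$

$$\mathbb{VAR}(Y^{\prime}_{t})=(1-\alpha^{2t})\frac{1-\alpha}{1+\alpha}\left(\sum_{i=1}^{m}|v_{i}|^{2}(q_{i}-q_{i}^{2})-\sum_{i<j}\left(v_{i}\overline{v_{j}}+v_{j}\overline{v_{i}}\right)q_{i}q_{j}\right)$$

$$\mathbb{VAR}(Y^{\prime})=\frac{1-\alpha}{1+\alpha}\left(\sum_{i=1}^{m}|v_{i}|^{2}(q_{i}-q_{i}^{2})-\sum_{i<j}\left(v_{i}\overline{v_{j}}+v_{j}\overline{v_{i}}\right)q_{i}q_{j}\right)$$

The proof is similar to the proof of Theorem 1. By definition

$$\mathbb{VAR}(Y^{\prime}_{t})=\mathbb{E}\left(Y^{\prime}_{t}\,\overline{Y^{\prime}_{t}}\right)-\mathbb{E}(Y^{\prime}_{t})\,\mathbb{E}\left(\overline{Y^{\prime}_{t}}\right)$$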

and as in the proof of Theorem 1

[TABLE]

On the other hand

[TABLE]

and it follows from (12) that

[TABLE]

Substituting this into previous equation and subtracting from (13) we obtain a recurrent relation

[TABLE]

The rest of the proof is the same as in Lemma 2.

Corollary 5

If all points $v_{i}\in\mathbb{C},\;i=1,2,\cdots,m,\;m>1$ belong to a unit circle then

[TABLE]

where $\phi_{i,j},\;i<j$ are pairwise angles between unit vectors $v_{i},v_{j}$ .

Indeed, since in this case $|v_{i}|=1,\;i=1,\cdots m$ , we have

[TABLE]

and on the other hand

[TABLE]

Example 3

If $m=n=3$ then two-dimensional random walk (1) can be viewed as a random walk on an equilateral triangle $\sigma^{2}$ whose vertices are three distinct cubic roots of unity $v_{1}=1,v_{2}=e^{2\pi i/3},v_{3}=e^{4\pi i/3}$ . All three angles between $v_{i}$ and $v_{j},\;i,j=1,2,3,\;i\neq j$ are equal to $2\pi/3$ and by Corollary 5 the (complex) variance of the corresponding complex random variable at iteration $t$ is

[TABLE]
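For the cube roots of unity of Example 3, the bracket in the variance formula of Theorem 2 can be evaluated directly. The sketch below compares it with the variance $\mathbb{E}|Z|^{2}-|\mathbb{E}Z|^{2}$ of a single click $Z$ that takes the value $v_{i}$ with probability $q_{i}$; the two agree algebraically. The function names and the distribution `Q` are assumptions made for illustration.

```python
import cmath


def theorem2_bracket(v, q):
    """Sum |v_i|^2 (q_i - q_i^2) minus the cross terms over pairs i < j."""
    m = len(v)
    diagonal = sum(abs(v[i]) ** 2 * (q[i] - q[i] ** 2) for i in range(m))
    cross = sum(
        (v[i] * v[j].conjugate() + v[j] * v[i].conjugate()).real * q[i] * q[j]
        for i in range(m) for j in range(i + 1, m)
    )
    return diagonal - cross


def click_variance(v, q):
    """E|Z|^2 - |E Z|^2 for a single click Z with P(Z = v_i) = q_i."""
    mean = sum(vi * qi for vi, qi in zip(v, q))
    second = sum(abs(vi) ** 2 * qi for vi, qi in zip(v, q))
    return second - abs(mean) ** 2


def variance_at(t, alpha, v, q):
    """Variance after t iterations, following the formula of Theorem 2."""
    return (1 - alpha ** (2 * t)) * (1 - alpha) / (1 + alpha) * theorem2_bracket(v, q)


roots_of_unity = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]
Q = [0.5, 0.3, 0.2]   # illustrative bias probabilities
print(theorem2_bracket(roots_of_unity, Q), click_variance(roots_of_unity, Q))
print(variance_at(50, 0.99, roots_of_unity, Q))
```

In the unbiased case $q_{i}=1/3$ the mean click is the centre of the triangle and the bracket equals $1$.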

4 Properties of approximation

The results of Section 2 can be used to evaluate the heavy hitters approximation produced by Algorithm 1.

To evaluate the algorithm's ability to “overweight” recent event frequencies, let's assume that the number of iterations $t$ corresponds to a “relevancy” time window. For example, if last week's heavy hitters are of highest importance, let $t$ be a “week's worth of clicks”. Measuring time by a click counter, suppose that the estimated click distribution at the start of the time period was $X$ and that for time $t_{1}$ the incoming click distribution $P_{1}$ did not change. Suppose also that at time $t_{1}$ the incoming distribution switched to $P_{2}$ and did not change for the remaining time $t_{2}=t-t_{1}$. Then, by Lemma 1, the expected convex mixture approximation at the end of the time period will be

[TABLE]

To see how our approximation is affected by recent events let’s estimate the ratio of coefficients at $P_{2}$ and $P_{1}$ in the expression above. Since $\beta=1-\alpha$ is supposed to be small, we have

[TABLE]

In case of plain event counting this ratio should be $\approx t_{2}/t_{1}$ . On the other hand, from (14) we have

Corollary 6

Algorithm 1 gives recent heavy hitters a “velocity boost” of approximately a factor of $\alpha^{-1}$ per iteration.
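The two-phase scenario above can be sketched by applying Lemma 1 twice, coordinate by coordinate. The weights returned by `two_phase_weights` are derived here from Lemma 1 and are an assumption about the form of the expression, not a transcription of the paper's display; the distributions `X`, `P1`, `P2` and the window lengths are illustrative.

```python
def expected_after(start, incoming, alpha, steps):
    """Lemma 1, coordinate-wise: expected estimate after `steps` clicks."""
    w = alpha**steps
    return [w * s + (1 - w) * p for s, p in zip(start, incoming)]


def two_phase_weights(alpha, t1, t2):
    """Weights of X, P1 and P2 in the expected estimate after t1 + t2 clicks."""
    return alpha ** (t1 + t2), alpha**t2 * (1 - alpha**t1), 1 - alpha**t2


alpha, t1, t2 = 0.99, 50, 50
X, P1, P2 = [1.0, 0.0], [0.8, 0.2], [0.2, 0.8]

final = expected_after(expected_after(X, P1, alpha, t1), P2, alpha, t2)
w_x, w_p1, w_p2 = two_phase_weights(alpha, t1, t2)
print(final)
print(w_p2 / w_p1, t2 / t1)   # weight ratio versus the plain-counting ratio
```

With equal window lengths the weight ratio equals $\alpha^{-t_{2}}$, which is one way to read the per-iteration factor $\alpha^{-1}$ in Corollary 6.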

As we saw above, Algorithm 1 will approximate the mean of a fixed incoming click distribution in the long run. Lemmas 1, 2 and a straightforward application of Chebyshev inequality (cf. e.g. [9] for a vector version) give a reasonable estimate for a quality of this approximation.

Corollary 7

The following estimates hold for random variables $y_{i}=\lim_{t\rightarrow\infty}y_{i,t}$ and for random vector $Y=\lim_{t\rightarrow\infty}Y_{t}$

[TABLE]

In particular,

[TABLE]

and

[TABLE]

Remark 4

It follows from (16) that for any $q_{i}$ and large enough $\alpha$, at least about $7/8$ of the mass of the limit distribution of $y_{i}$ lies in the narrow interval $[q_{i}-\sqrt{1-\alpha},\;q_{i}+\sqrt{1-\alpha}]$.

Example 4

For $\alpha=0.99,\;q=1/2$ and for sufficiently large $t$, the value of $y_{t}$ will belong to the interval $[0.4,\;0.6]$ with probability of at least about $87\%$.
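The arithmetic behind this kind of estimate can be sketched with Chebyshev's inequality and the limit variance from Corollary 1. This is a generic Chebyshev bound under the assumption that the interval is centred at $q$ with half-width $\sqrt{1-\alpha}$; the helper names are illustrative.

```python
import math


def limit_variance(alpha, q):
    """Variance of the infinite Bernoulli convolution (Corollary 1)."""
    return (1 - alpha) / (1 + alpha) * q * (1 - q)


def chebyshev_coverage(alpha, q, eps):
    """Lower bound on P(|y - q| < eps) from Chebyshev's inequality."""
    return max(0.0, 1.0 - limit_variance(alpha, q) / eps**2)


alpha, q = 0.99, 0.5
eps = math.sqrt(1 - alpha)   # half-width of the interval around q
coverage = chebyshev_coverage(alpha, q, eps)
print(round(eps, 3), round(coverage, 3))
```

Because $q(1-q)\leq 1/4$, the bound with this choice of `eps` is at least $1-\frac{1}{4(1+\alpha)}$, which approaches $7/8$ as $\alpha$ tends to $1$.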

It is obvious that the estimator (15) works better for large values of $q$, i.e. for the above-mentioned heavy hitters. More precisely, setting $\epsilon\leftarrow\epsilon q_{i}$ in (15) we get

Corollary 8

An estimate

[TABLE]

holds for

[TABLE]

Example 5

For $\epsilon=1/10$ and $\alpha=1-\epsilon^{3}=.999$ this boils down to

[TABLE]

In other words, for a large enough number of iterations, click probabilities that are slightly above $1/3$ can be approximated up to $10\%$ relative error with $90\%$ confidence.
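The threshold near $1/3$ can be reproduced with a short computation, assuming the relative-error estimate is the Chebyshev bound for the limit variance of Corollary 1 with $\epsilon$ replaced by $\epsilon q$. The function name is an assumption made for this sketch.

```python
def relative_error_tail(alpha, q, eps):
    """Chebyshev upper bound on P(|y - q| >= eps * q) for the limit variable y."""
    variance = (1 - alpha) / (1 + alpha) * q * (1 - q)
    return variance / (eps * q) ** 2


eps = 0.1
alpha = 1 - eps**3   # 0.999, as in Example 5

for q in (0.2, 1 / 3, 0.34, 0.5):
    print(round(q, 3), round(relative_error_tail(alpha, q, eps), 4))
```

The bound falls below $0.1$, that is, $90\%$ confidence, once $q$ is slightly larger than $1/3$, and it keeps improving for heavier hitters.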

For a finite Bernoulli convolution obtained after $t$ iterations of Algorithm 1 we get from (4) and (6)

Corollary 9

If $y_{i,0}=q_{i},\;i=1,\cdots n$ then for any $t=1,2,\cdots$

[TABLE]

In particular

[TABLE]

and if $Y_{0}=Q$ then

[TABLE]

5 Recurrent formula for moments of biased Bernoulli convolutions

Moments of unbiased Bernoulli convolutions were studied in [6], [7], [8]. Some basic properties of moments of biased infinite Bernoulli convolutions are briefly discussed in this section.

It makes sense to consider central moments, $\mathbb{E}(y-q)^{n}$ (cf. Corollary 1). Hence, we replace the sequence $y_{t}$ with the sequence $y_{t}-q$ which from now on will be denoted by the same letter. The transformation rule (2) thus changes to

[TABLE]

For expectations of the random variable sequence $y_{m}^{n},\;m=1,2,\cdots$ that translates into

[TABLE]

Opening brackets and passing to the limit (that is assumed to exist) results in identity

[TABLE]

Finally, after relabeling $M_{k}=\mathbb{E}(y^{k})$ we obtain for $n\geq 2$ a recurrent relation (cf. [7])

[TABLE]

Obviously, $M_{0}=1$ and $M_{1}=0$ . It is now a simple matter to write down a few central moments of the infinite Bernoulli convolution (2):

Example 6

[TABLE]
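Central moments of the infinite convolution can also be computed exactly with rational arithmetic. The recurrence used below is derived here by expanding the centred update $y\rightarrow\alpha y+(1-\alpha)(\xi-q)$ with the binomial theorem and solving for $M_{n}$; it is an assumption about the shape of the relation and may differ in form from the paper's (18). All function names are illustrative.

```python
from fractions import Fraction
from math import comb


def click_central_moment(q, j):
    """E[(xi - q)^j] for a Bernoulli(q) variable xi."""
    return q * (1 - q) ** j + (1 - q) * (-q) ** j


def central_moments(alpha, q, n_max):
    """M_0, ..., M_n_max for the centred limit y = alpha*y + (1 - alpha)*(xi - q)."""
    M = [Fraction(1)]
    for n in range(1, n_max + 1):
        total = sum(
            comb(n, k) * alpha**k * (1 - alpha) ** (n - k)
            * click_central_moment(q, n - k) * M[k]
            for k in range(n)
        )
        M.append(total / (1 - alpha**n))   # move the alpha^n * M_n term to the left
    return M


alpha, q = Fraction(9, 10), Fraction(3, 10)   # illustrative values
moments = central_moments(alpha, q, 4)
print(moments)
print(moments[2] == (1 - alpha) / (1 + alpha) * q * (1 - q))   # variance of Corollary 1
```

Using `Fraction` keeps every moment an exact rational number, so identities such as $M_{1}=0$ can be tested with equality rather than a tolerance.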

Let $\mu_{y}$ be a measure associated with the infinite Bernoulli convolution $y$ that is generated by rule (17) and let $(.)^{*}$ denote a reflection $x\rightarrow 1-x$ . Denote also by $y^{*}$ an infinite Bernoulli convolution generated by the rule (17) with interchanged probabilities $q\rightarrow 1-q$ . It is probably worth mentioning

Corollary 10

(i) for any interval $[a,b]$, $\mu_{y^{*}}([a,b]^{*})=\mu_{y}([a,b])$

(ii) $y^{*}=-y$ and therefore

(iii) $\mathbb{E}(y^{n})=(-1)^{n}\;\mathbb{E}(y^{*n}),\;\;n=0,1,2,\cdots$

(iv) as polynomials of $q$, the central moments $M_{n}(q)\equiv\;\mathbb{E}(y^{n})(q)$ are semi-invariant with respect to the involution $\tau:q\rightarrow 1-q$, that is

[TABLE]

Indeed, statements (i) and (ii) follow from definition (17). Statement (iii) follows from (ii) or (iv) and the proof of (iv) is a straightforward induction based on (18).

Moreover, for central moments $M_{n}\equiv M_{n}(q)$ as polynomials of $q$ we have

Corollary 11

$M_{n}(q)$ is a polynomial of $q(1-q)$ if $n$ is even and is a polynomial of $q(1-q)$ times $1-2q$ if $n$ is odd.

This is an easy consequence of Corollary 10. Just note, that it follows from Corollary 10 (iv) that $M_{n}(q)$ is divisible by $q-\frac{1}{2}$ if $n$ is odd.

Lemma 3

If $q\leq 1-q$ then

(i) all central moments $M_{n}$ are non-negative

(ii) $M_{n}\leq 1-q$ for all $n=0,1,\cdots$

(iii) $\lim_{n\rightarrow\infty}M_{2n}^{1/(2n)}=1-q$

Proof. The first statement directly follows from (18). The second statement is obvious. Statement (iii) is just a recollection of a well known fact about a sequence of $n$ -norms $\left(\int_{-q}^{1-q}|y(x)|^{n}d\mu_{y}(x)\right)^{1/n}$ converging to $\infty$ -norm $\max\{|y|\}=1-q$ .

Although random variable $y$ is not non-negative, the following still holds

Theorem 3

If $q\leq 1-q$ then $\lim_{n\rightarrow\infty}M_{n}^{1/n}=1-q$

Proof. The sequence $M_{n}^{1/n},\;n=0,2,\cdots$ for even numbered central moments is non-decreasing by Hölder’s inequality and converges to $1-q$ by Lemma 3. Hence, for any $\epsilon_{1}>0$ there is $k=k_{0}$ such that

[TABLE]

for all even $n$ such that $n\geq k_{0}$ . In particular $M_{n-k}\geq(1-q-\epsilon_{1})^{n-k}$ for all odd $k$ and $n$ such that $k\leq n-k_{0}$ . Using this fact, we will show that an estimate similar to (19) holds for any large odd number $n$ . Indeed, it follows from (18), Lemma 3 (i) and (19) that for any odd $n>k_{0}+2$

[TABLE]

It is easy to see, however, that the sum in (20) is equal to

[TABLE]

and therefore for any $\epsilon_{2}>0$ we can find large enough $n_{0}$ such that for any odd $n>n_{0}$

[TABLE]

After substituting this into (20) we find that

[TABLE]

which is a desired estimate of $M_{n}$ for large enough odd $n$ .

6 Concluding remarks

As was shown above, relative heavy hitters can be approximated by iterative application of the convex mixture rule (1). The suggested algorithm essentially computes a Bernoulli convolution if and while the incoming click distribution remains fixed. In practice, the stochastic process of incoming events is much more complicated (cf. e.g. Corollary 3). The problem of obtaining similar convex mixture approximation estimates in a general setting of varying incoming click distributions seems to be both hard and interesting.

Bibliography

[1] Graham Cormode, Marios Hadjieleftheriou, “Finding Frequent Items in Data Streams”, Proceedings of the VLDB Endowment, Volume 1, Issue 2, August 2008
[2] Anshumali Shrivastava, Arnd Christian König, Mikhail Bilenko, “Time Adaptive Sketches (Ada-Sketches) for Summarizing Data Streams”, SIGMOD'16, June 26-July 01, 2016, San Francisco, CA, USA
[3] Chen-Yu Hsu, Piotr Indyk, Dina Katabi, Ali Vakilian, “Learning-Based Frequency Estimation Algorithms”, ICLR 2019
[4] Yuval Peres, Wilhelm Schlag, Boris Solomyak, “Sixty years of Bernoulli convolutions”, in Fractal Geometry and Stochastics II (Greifswald/Koserow, 1998), volume 46 of Progr. Probab., pages 39–65, Birkhäuser, Basel, 2000
[5] Pablo Shmerkin, “On the Exceptional Set for Absolute Continuity of Bernoulli Convolutions”, arXiv:1303.3992v2, 2013
[6] Pawel J. Szablowski, “On Moments of Cantor and Related Distributions”, arXiv:1403.0386, 2014
[7] E. A. Timofeev, “Asymptotic Formula for the Moments of Bernoulli Convolutions”, Modeling and Analysis of Information Systems, 23:2, 185-194, 2016
[8] C. Escribano, M. A. Sastre, E. Torrano, “Moments of infinite convolutions of symmetric Bernoulli distributions”, Journal of Computational and Applied Mathematics 153 (2003), 191–199