Minimax Risk for Missing Mass Estimation

Nikhilesh Rajaraman; Andrew Thangaraj; Ananda Theertha Suresh

arXiv:1705.05006·cs.IT·May 16, 2017


TL;DR

This paper analyzes the minimax risk in missing mass estimation, providing bounds for the worst-case risk of the Good-Turing estimator and establishing a lower bound for the minimax risk, with implications for practical and theoretical applications.

Contribution

It presents the first known bounds on the minimax risk for missing mass estimation, including the worst-case risk of the Good-Turing estimator and a new lower bound.

Findings

01 Worst-case risk of the Good-Turing estimator is between 0.6080/n and 0.6179/n

02 Minimax risk is lower bounded by 0.25/n

03 First published minimax risk bounds for missing mass estimation

Abstract

The problem of estimating the missing mass or total probability of unseen elements in a sequence of $n$ random samples is considered under the squared error loss function. The worst-case risk of the popular Good-Turing estimator is shown to be between $0.6080/n$ and $0.6179/n$. The minimax risk is shown to be lower bounded by $0.25/n$. This appears to be the first such published result on minimax risk for estimation of missing mass, which has several practical and theoretical applications.

Equations

$$M_{0}(X^{n})\triangleq\sum_{u\in\mathcal{X}}p(u)\,\mathbb{I}(N_{u}(X^{n})=0),$$

$$R_{n}(\hat{M}_{0},p)\triangleq\mathbb{E}_{X^{n}\sim p}\left[\left(\hat{M}_{0}(X^{n})-M_{0}(X^{n})\right)^{2}\right],$$

$$R_{n}(\hat{M}_{0})\triangleq\max_{p}R_{n}(\hat{M}_{0},p),$$

$$R_{n}^{*}=\min_{\hat{M}_{0}}R_{n}(\hat{M}_{0}).$$

$$\Phi_{i}(X^{n})\triangleq\sum_{u\in\mathcal{X}}\mathbb{I}(N_{u}(X^{n})=i)$$

$$M^{\textrm{GT}}(X^{n})\triangleq\frac{\Phi_{1}(X^{n})}{n}.$$

$$\mathbb{E}\left[M^{\textrm{GT}}(X^{n})-M_{0}(X^{n})\right]\leq\frac{1}{n}.$$

$$M^{\textrm{GT}}(X^{n})-M_{0}(X^{n})\leq\frac{2}{n}+\frac{2\ln(3/\delta)}{n}\left(1+2\ln(3n/\delta)\right).$$

$$R_{n}(M^{\textrm{GT}},p)$$

$$\frac{0.6080}{n}+o\left(\frac{1}{n}\right)\leq R_{n}(M^{\textrm{GT}})\leq\frac{0.6179}{n}+o\left(\frac{1}{n}\right).$$

$$R_{n}^{*}\geq\frac{4}{27n}.$$

$$R_{n}^{*}\geq\frac{1}{4n}+o\left(\frac{1}{n}\right)$$

$$\frac{0.25}{n}+o\left(\frac{1}{n}\right)\leq R_{n}^{*}\leq\frac{0.6179}{n}+o\left(\frac{1}{n}\right),$$

$$\begin{aligned}
\left(M^{\textrm{GT}}(X^{n})-M_{0}(X^{n})\right)^{2}
&=\left(\sum_{u\in\mathcal{X}}\frac{1}{n}\mathbb{I}(N_{u}=1)-p(u)\mathbb{I}(N_{u}=0)\right)\left(\sum_{v\in\mathcal{X}}\frac{1}{n}\mathbb{I}(N_{v}=1)-p(v)\mathbb{I}(N_{v}=0)\right)\\
&=\frac{1}{n^{2}}\sum_{u,v\in\mathcal{X}}\bigg(\mathbb{I}(N_{u}=1)\mathbb{I}(N_{v}=1)-2np(u)\mathbb{I}(N_{u}=0)\mathbb{I}(N_{v}=1)\\
&\qquad\qquad\qquad+n^{2}p(u)p(v)\mathbb{I}(N_{u}=0)\mathbb{I}(N_{v}=0)\bigg)
\end{aligned}$$

$$R_{n}(M^{\textrm{GT}},p)=\frac{1}{n^{2}}\sum_{u,v\in\mathcal{X}}\bigg(P_{n}(1,1)-2np(u)P_{n}(0,1)+n^{2}p(u)p(v)P_{n}(0,0)\bigg).$$

$$P_{n}(i,j)=\begin{cases}\binom{n}{i\ j}p(u)^{i}p(v)^{j}(1-p(u)-p(v))^{n-i-j},&u\neq v,\\ \binom{n}{i}p(u)^{i}(1-p(u))^{n-i},&u=v,\ i=j,\\ 0,&u=v,\ i\neq j,\end{cases}$$

$$\begin{aligned}
p(u)p(v)P_{n}(0,0)&=P(u,v)\,(1-p(u)-p(v))^{2},\\
p(u)P_{n}(0,1)&=n\,P(u,v)\,(1-p(u)-p(v)),\\
P_{n}(1,1)&=n(n-1)\,P(u,v).
\end{aligned}$$

$$\begin{aligned}
R_{n}(M^{\textrm{GT}},p)=&\frac{1}{n}\sum_{\substack{u,v\in\mathcal{X}\\ v\neq u}}P(u,v)\bigg[n\big(p(u)+p(v)\big)^{2}-1\bigg]\\
&+\frac{1}{n}\sum_{u\in\mathcal{X}}\bigg[p(u)(1-p(u))^{n-1}+np(u)^{2}(1-p(u))^{n}\bigg].
\end{aligned}$$

$$\sum_{\substack{u,v\in\mathcal{X}\\ u\neq v}}p(u)^{i}p(v)^{j}(1-p(u)-p(v))^{n}\leq\frac{(i-1)!\,(j-1)!\,n!}{(n+i+j-2)!}.$$

$$T(u,v)=\binom{n+i+j-2}{i-1\ \ j-1}p(u)^{i-1}p(v)^{j-1}(1-p(u)-p(v))^{n}.$$

$$\begin{aligned}
\mathbb{E}[T(X,Y)]&=\sum_{\substack{u,v\in\mathcal{X}\\ u\neq v}}p(u)p(v)T(u,v)\\
&=\sum_{\substack{u,v\in\mathcal{X}\\ u\neq v}}\binom{n+i+j-2}{i-1\ \ j-1}p(u)^{i}p(v)^{j}(1-p(u)-p(v))^{n}\leq 1,
\end{aligned}$$

$$\sum_{u\in\mathcal{X}}p(u)^{i}(1-p(u))^{n}\leq\frac{(i-1)!\,n!}{(n+i-1)!}.$$

$$\sum_{\substack{u,v\in\mathcal{X}\\ u\neq v}}P(u,v)\big(p(u)+p(v)\big)^{2}=o(1/n).$$

$$\begin{aligned}
R_{n}(M^{\textrm{GT}},p)=\frac{1}{n}\bigg[&\sum_{u\in\mathcal{X}}p(u)(1-p(u))^{n-1}-\sum_{\substack{u,v\in\mathcal{X}\\ u\neq v}}P(u,v)\\
&+\sum_{u\in\mathcal{X}}np(u)^{2}(1-p(u))^{n}\bigg]+o(1/n).
\end{aligned}$$

$$\sum_{u\in\mathcal{X}}p(u)(1-p(u))^{n-1}$$


Full text

Minimax Risk for Missing Mass Estimation

Nikhilesh Rajaraman, Andrew Thangaraj

Department of Electrical Engineering

Indian Institute of Technology Madras

Chennai 600036, India


Ananda Theertha Suresh

Google Research

New York, USA



I Introduction

Given independent samples from an unknown distribution, missing mass estimation asks for the total probability of the elements that do not appear in the sample. Missing mass estimation is a basic problem in statistics and has applications in fields ranging from language modeling [1, 2] to ecology [3]. Perhaps the most widely used missing mass estimator is the Good-Turing estimator, which was proposed in a seminal 1953 paper by I. J. Good, who credited the idea to Alan Turing [4]. The Good-Turing estimator is used in support estimators [3], entropy estimators [5] and unseen species estimators [6]. To describe the estimator and the results, we need a modicum of nomenclature.

Let $p$ be an underlying unknown distribution over an unknown domain $\mathcal{X}$. Let $X^{n}\triangleq(X_{1},X_{2},\ldots,X_{n})$ be $n$ independent samples from $p$. For $x\in\mathcal{X}$, let $N_{x}(X^{n})$ be the number of appearances of $x$ in $X^{n}$. Upon observing $X^{n}$, our goal is to estimate the missing mass

$$M_{0}(X^{n})\triangleq\sum_{u\in\mathcal{X}}p(u)\,\mathbb{I}(N_{u}(X^{n})=0),$$

where $\mathbb{I}(\cdot)$ denotes the indicator function. For example, if $\mathcal{X}=\{a,b,c,d\}$ and $X^{3}=b\,c\,b$, then $M_{0}(X^{3})=p(a)+p(d)$. The above sampling model for estimation is termed the multinomial model. We note that $1-M_{0}(X^{n})$ is often referred to as the sample coverage in the literature [7].
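The definition can be made concrete with a few lines of Python. This is only an illustrative sketch: the function name `missing_mass` and the probabilities assigned to $a,b,c,d$ are assumptions made for the example, not values from the paper.

```python
from collections import Counter

def missing_mass(sample, p):
    """Total probability of the symbols of the domain that never occur in `sample`.

    `p` maps every symbol of the domain to its probability.
    """
    counts = Counter(sample)
    return sum(prob for symbol, prob in p.items() if counts[symbol] == 0)

# Hypothetical distribution on {a, b, c, d}, chosen only for illustration.
p = {"a": 0.1, "b": 0.4, "c": 0.3, "d": 0.2}
m0 = missing_mass("bcb", p)  # equals p["a"] + p["d"], as in the example above
```

Note that computing $M_{0}$ requires knowing $p$, which is exactly what an estimator does not have access to.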

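An estimator for missing mass $\hat{M}_{0}(X^{n})$ is a mapping $\mathcal{X}^{n}\to[0,1]$. For a distribution $p$, the $\ell^{2}_{2}$ risk of the estimator $\hat{M}_{0}(X^{n})$ is

$$R_{n}(\hat{M}_{0},p)\triangleq\mathbb{E}_{X^{n}\sim p}\left[\left(\hat{M}_{0}(X^{n})-M_{0}(X^{n})\right)^{2}\right],$$

the worst-case risk over all distributions is

$$R_{n}(\hat{M}_{0})\triangleq\max_{p}R_{n}(\hat{M}_{0},p),$$

and the minimax mean squared loss, or minimax risk, is

$$R_{n}^{*}=\min_{\hat{M}_{0}}R_{n}(\hat{M}_{0}).$$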

The goal of this paper is to characterize $R^{*}_{n}$ .
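For very small alphabets and sample sizes, the risk of a given estimator can be computed exactly by enumerating all $|\mathcal{X}|^{n}$ sequences. The sketch below does this; the names `exact_risk` and `worst_case_risk` are hypothetical, and the maximum is taken over a finite list of candidate distributions, which only stands in for the maximum over all $p$ in the definition.

```python
from itertools import product
from math import prod

def exact_risk(estimator, p, n):
    """Exact squared-error risk of `estimator` under `p` (dict: symbol -> probability)."""
    symbols = list(p)
    risk = 0.0
    for sample in product(symbols, repeat=n):
        prob = prod(p[x] for x in sample)           # probability of this sequence
        seen = set(sample)
        m0 = sum(p[u] for u in symbols if u not in seen)  # true missing mass
        risk += prob * (estimator(sample) - m0) ** 2
    return risk

def worst_case_risk(estimator, candidates, n):
    """Largest risk over a finite list of candidate distributions (an approximation)."""
    return max(exact_risk(estimator, p, n) for p in candidates)

# Example: the trivial estimator that always outputs 0.
zero = lambda sample: 0.0
candidates = [{"a": q, "b": 1 - q} for q in (0.1, 0.3, 0.5)]
worst = worst_case_risk(zero, candidates, n=3)
```

The minimax risk additionally minimizes over all estimators, which is not amenable to this kind of enumeration and is what the rest of the paper bounds analytically.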

I-A Good-Turing estimator and previous results

Let

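$$\Phi_{i}(X^{n})\triangleq\sum_{u\in\mathcal{X}}\mathbb{I}(N_{u}(X^{n})=i)$$

denote the number of symbols that have appeared $i$ times in $X^{n}$, $1\leq i\leq n$. For example, if $X^{3}=a,b,c$, then $\Phi_{1}=3$ and $\Phi_{i}=0$ for all $i>1$. The Good-Turing estimator [4] for the missing mass is

$$M^{\textrm{GT}}(X^{n})\triangleq\frac{\Phi_{1}(X^{n})}{n}.$$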
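As a minimal sketch, the Good-Turing estimate of the missing mass is the fraction of the sample made up of symbols seen exactly once. The helper names `prevalences` and `good_turing_missing_mass` are illustrative, not from the paper, and a non-empty sample is assumed.

```python
from collections import Counter

def prevalences(sample):
    """Return a Counter mapping i to Phi_i, the number of distinct symbols seen exactly i times."""
    return Counter(Counter(sample).values())

def good_turing_missing_mass(sample):
    """Good-Turing estimate Phi_1 / n of the missing mass; requires len(sample) >= 1."""
    n = len(sample)
    return prevalences(sample)[1] / n

estimate = good_turing_missing_mass("abc")  # every symbol is a singleton, so Phi_1 = 3
```

Unlike the missing mass itself, this quantity depends only on the observed sample and not on $p$.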

One of the first theoretical analyses of the Good-Turing estimator was in [8], where it was shown that

$$\mathbb{E}\left[M^{\textrm{GT}}(X^{n})-M_{0}(X^{n})\right]\leq\frac{1}{n}.$$

This shows that the bias of the Good-Turing estimator falls as $1/n$. They further showed that, with probability at least $1-\delta$,

[TABLE]

Various properties of the Good-Turing estimator and several variations of it have been analyzed for distribution estimation and compression [9, 10, 11, 12, 13, 14, 15]. Several concentration results on missing mass estimation are also known [16, 17]. Despite all this work, the risk of the Good-Turing estimator and the minimax risk of missing mass estimation have still not been conclusively established.
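The $1/n$ bias bound quoted above can be checked directly from the definitions. The following short derivation is a sketch written for this summary rather than the argument of [8]; it uses only $\mathbb{P}(N_{u}=1)=np(u)(1-p(u))^{n-1}$ and $\mathbb{P}(N_{u}=0)=(1-p(u))^{n}$.

```latex
\begin{aligned}
\mathbb{E}[M^{\textrm{GT}}(X^{n})] &= \frac{1}{n}\sum_{u\in\mathcal{X}}\mathbb{P}(N_{u}=1)
  = \sum_{u\in\mathcal{X}} p(u)(1-p(u))^{n-1},\\
\mathbb{E}[M_{0}(X^{n})] &= \sum_{u\in\mathcal{X}} p(u)(1-p(u))^{n},\\
\mathbb{E}[M^{\textrm{GT}}(X^{n})-M_{0}(X^{n})] &= \sum_{u\in\mathcal{X}} p(u)\cdot p(u)(1-p(u))^{n-1}
  \le \max_{x\in[0,1]} x(1-x)^{n-1} \le \frac{1}{n}.
\end{aligned}
```

The maximum of $x(1-x)^{n-1}$ is attained at $x=1/n$, where it equals $\frac{1}{n}\left(1-\frac{1}{n}\right)^{n-1}\leq\frac{1}{n}$; the difference is also non-negative, so the Good-Turing estimator slightly overestimates the missing mass on average.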

I-B New results

Unlike a parameter of a distribution, the missing mass is itself a function of the observed sample, which makes finding the exact minimax risk difficult.

We first analyze the risk of the Good-Turing estimator and show that for any distribution $p$ ,

[TABLE]

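where $\Phi_{i}$ is abbreviated notation for $\Phi_{i}(X^{n})$. By maximizing the RHS in the first equation above over all distributions, in Theorem 4, we show that

$$\frac{0.6080}{n}+o\left(\frac{1}{n}\right)\leq R_{n}(M^{\textrm{GT}})\leq\frac{0.6179}{n}+o\left(\frac{1}{n}\right).$$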

We note that under the multinomial model, the numbers of occurrences of symbols are correlated, and this makes finding the worst case distribution for the Good-Turing estimator difficult.

We then prove estimator-independent information-theoretic lower bounds on $R^{*}_{n}$ using two approaches. We first compute a lower bound via a Dirichlet prior approach [18]. In Lemma 7, we show that

$$R_{n}^{*}\geq\frac{4}{27n}.$$

We then improve the constant by reducing the problem of missing mass estimation to that of distribution estimation. In particular, in Theorem 11, we show that

$$R_{n}^{*}\geq\frac{1}{4n}+o\left(\frac{1}{n}\right).$$

Combining the lower and upper bounds, we get

$$\frac{0.25}{n}+o\left(\frac{1}{n}\right)\leq R_{n}^{*}\leq\frac{0.6179}{n}+o\left(\frac{1}{n}\right).$$

Finding the exact minimax risk for the missing mass estimation problem remains an open question.

The rest of the paper is organized as follows. In Section II, we analyze the Good-Turing estimator. In Section III-A, we use a Dirichlet prior approach to obtain lower bounds, and in Section III-B we obtain lower bounds via a reduction to distribution estimation.

II Risk of Good-Turing Estimator

The analysis of [8] can be extended to characterize the risk of the Good-Turing estimator for missing mass. The squared error of the Good-Turing estimator $M^{\textrm{GT}}(X^{n})$ can be written down as follows:

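$$\begin{aligned}
\left(M^{\textrm{GT}}(X^{n})-M_{0}(X^{n})\right)^{2}
&=\left(\sum_{u\in\mathcal{X}}\frac{1}{n}\mathbb{I}(N_{u}=1)-p(u)\mathbb{I}(N_{u}=0)\right)\left(\sum_{v\in\mathcal{X}}\frac{1}{n}\mathbb{I}(N_{v}=1)-p(v)\mathbb{I}(N_{v}=0)\right)\\
&=\frac{1}{n^{2}}\sum_{u,v\in\mathcal{X}}\bigg(\mathbb{I}(N_{u}=1)\mathbb{I}(N_{v}=1)-2np(u)\mathbb{I}(N_{u}=0)\mathbb{I}(N_{v}=1)\\
&\qquad\qquad\qquad+n^{2}p(u)p(v)\mathbb{I}(N_{u}=0)\mathbb{I}(N_{v}=0)\bigg)
\end{aligned}$$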

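For $u,v\in\mathcal{X}$, $\mathbb{E}[\mathbb{I}(N_{u}=i)\mathbb{I}(N_{v}=j)]=\mathbb{P}(N_{u}=i,N_{v}=j)$. Using the notation $P_{n}(i,j)=\mathbb{P}(N_{u}(X^{n})=i,N_{v}(X^{n})=j)$, we get

$$R_{n}(M^{\textrm{GT}},p)=\frac{1}{n^{2}}\sum_{u,v\in\mathcal{X}}\bigg(P_{n}(1,1)-2np(u)P_{n}(0,1)+n^{2}p(u)p(v)P_{n}(0,0)\bigg).\tag{4}$$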

The probability $P_{n}(i,j)$ can be written down as

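$$P_{n}(i,j)=\begin{cases}\binom{n}{i\ j}p(u)^{i}p(v)^{j}(1-p(u)-p(v))^{n-i-j},&u\neq v,\\ \binom{n}{i}p(u)^{i}(1-p(u))^{n-i},&u=v,\ i=j,\\ 0,&u=v,\ i\neq j,\end{cases}$$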

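where $\binom{n}{i\ j}=\frac{n!}{i!j!(n-i-j)!}$ and $\binom{n}{i}=\frac{n!}{i!(n-i)!}$. The summation in (4) is first split into two cases: $u\neq v$ and $u=v$. Denoting $P(u,v)=p(u)p(v)(1-p(u)-p(v))^{n-2}$, we have, for $u\neq v$,

$$\begin{aligned}
p(u)p(v)P_{n}(0,0)&=P(u,v)\,(1-p(u)-p(v))^{2},\\
p(u)P_{n}(0,1)&=n\,P(u,v)\,(1-p(u)-p(v)),\\
P_{n}(1,1)&=n(n-1)\,P(u,v).
\end{aligned}$$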

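For $u=v$, observe that $P_{n}(0,1)=0$. Using the above observations, the summation in (4) simplifies to

$$\begin{aligned}
R_{n}(M^{\textrm{GT}},p)=&\frac{1}{n}\sum_{\substack{u,v\in\mathcal{X}\\ v\neq u}}P(u,v)\bigg[n\big(p(u)+p(v)\big)^{2}-1\bigg]\\
&+\frac{1}{n}\sum_{u\in\mathcal{X}}\bigg[p(u)(1-p(u))^{n-1}+np(u)^{2}(1-p(u))^{n}\bigg].
\end{aligned}$$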
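The simplified expression is exact for every finite $n\geq 2$, so it can be evaluated numerically. The sketch below sums $P(u,v)\left[n(p(u)+p(v))^{2}-1\right]$ over ordered pairs $u\neq v$ and adds the single-symbol terms $p(u)(1-p(u))^{n-1}+np(u)^{2}(1-p(u))^{n}$; the function name `good_turing_risk` and the round-off guard are choices made for this illustration.

```python
def good_turing_risk(p, n):
    """Closed-form risk R_n(M^GT, p) for a list of probabilities `p` and n >= 2."""
    if n < 2:
        raise ValueError("P(u, v) uses the exponent n - 2, so n >= 2 is required")
    pair_sum = 0.0
    for i, pu in enumerate(p):
        for j, pv in enumerate(p):
            if i == j:
                continue
            rest = max(0.0, 1.0 - pu - pv)  # guard against tiny negative round-off
            p_uv = pu * pv * rest ** (n - 2)  # P(u, v) as defined in the text
            pair_sum += p_uv * (n * (pu + pv) ** 2 - 1.0)
    single_sum = sum(
        pu * (1.0 - pu) ** (n - 1) + n * pu ** 2 * (1.0 - pu) ** n for pu in p
    )
    return (pair_sum + single_sum) / n

risk = good_turing_risk([0.5, 0.5], 2)
```

For the uniform distribution on two symbols and $n=2$, a hand computation gives the same value: the sequences $aa$ and $bb$ (total probability $1/2$) have squared error $1/4$, and $ab$, $ba$ (total probability $1/2$) have squared error $1$, so the risk is $0.625$.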

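The following lemma is useful in bounding certain terms in the first summation above as a function of $n$, independent of the unknowns $\mathcal{X}$ and $p$.

Lemma 1.

For $i\geq 1$, $j\geq 1$,

$$\sum_{\substack{u,v\in\mathcal{X}\\ u\neq v}}p(u)^{i}p(v)^{j}(1-p(u)-p(v))^{n}\leq\frac{(i-1)!\,(j-1)!\,n!}{(n+i+j-2)!}.$$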

Proof:

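Let $X$ and $Y$ be a pair of independent and identically distributed random variables with marginal distribution $p$. Define a random variable $T(X,Y)$, whose value $T(u,v)=0$ for $u=v$ and, for $u\neq v$,

$$T(u,v)=\binom{n+i+j-2}{i-1\ \ j-1}p(u)^{i-1}p(v)^{j-1}(1-p(u)-p(v))^{n}.$$

For $u\neq v$, $T(u,v)$ is a multinomial probability: it is the probability that, in $n+i+j-2$ independent draws from $p$, the symbol $u$ appears $i-1$ times and $v$ appears $j-1$ times. Hence $T(X,Y)$ takes values in $[0,1]$ in all cases. Therefore, its expectation

$$\begin{aligned}
\mathbb{E}[T(X,Y)]&=\sum_{\substack{u,v\in\mathcal{X}\\ u\neq v}}p(u)p(v)T(u,v)\\
&=\sum_{\substack{u,v\in\mathcal{X}\\ u\neq v}}\binom{n+i+j-2}{i-1\ \ j-1}p(u)^{i}p(v)^{j}(1-p(u)-p(v))^{n}\leq 1,
\end{aligned}$$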

which concludes the proof. ∎

A useful univariate version of Lemma 1 is the following.

Lemma 2.

For $i\geq 1$,

$$\sum_{u\in\mathcal{X}}p(u)^{i}(1-p(u))^{n}\leq\frac{(i-1)!\,n!}{(n+i-1)!}.$$

Proof:

For $X\sim p$ , define $T(X)=\binom{n+i-1}{i-1}p(X)^{i-1}(1-p(X))^{n}$ and follow the proof of Lemma 1. ∎
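Both lemmas bound sums over an unknown alphabet by quantities that depend only on $n$, $i$ and $j$. As a sanity check, the sketch below evaluates the two sides of each inequality, with right-hand sides $\frac{(i-1)!(j-1)!n!}{(n+i+j-2)!}$ and $\frac{(i-1)!n!}{(n+i-1)!}$, for an arbitrary randomly generated distribution. The function names, the seed and the alphabet size are assumptions made for the example; a numerical check of this kind illustrates the lemmas but does not prove them.

```python
from math import factorial
import random

def lemma1_sides(p, n, i, j):
    """Return (lhs, rhs) of Lemma 1 for a list of probabilities `p`."""
    lhs = sum(
        pu ** i * pv ** j * max(0.0, 1.0 - pu - pv) ** n
        for a, pu in enumerate(p)
        for b, pv in enumerate(p)
        if a != b
    )
    rhs = factorial(i - 1) * factorial(j - 1) * factorial(n) / factorial(n + i + j - 2)
    return lhs, rhs

def lemma2_sides(p, n, i):
    """Return (lhs, rhs) of Lemma 2 for a list of probabilities `p`."""
    lhs = sum(pu ** i * (1.0 - pu) ** n for pu in p)
    rhs = factorial(i - 1) * factorial(n) / factorial(n + i - 1)
    return lhs, rhs

rng = random.Random(0)  # fixed seed; the distribution itself is arbitrary
weights = [rng.random() for _ in range(50)]
p = [w / sum(weights) for w in weights]
lhs, rhs = lemma1_sides(p, n=20, i=2, j=3)
```

For $i=2$, Lemma 2 reads $\sum_{u}p(u)^{2}(1-p(u))^{n}\leq\frac{1}{n+1}$, which is the form needed to control terms of order $1/n$ in the risk.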

Using Lemma 1, observe that

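$$\sum_{\substack{u,v\in\mathcal{X}\\ u\neq v}}P(u,v)\big(p(u)+p(v)\big)^{2}=o(1/n).$$

Therefore, the risk can be written as

$$\begin{aligned}
R_{n}(M^{\textrm{GT}},p)=\frac{1}{n}\bigg[&\sum_{u\in\mathcal{X}}p(u)(1-p(u))^{n-1}-\sum_{\substack{u,v\in\mathcal{X}\\ u\neq v}}P(u,v)\\
&+\sum_{u\in\mathcal{X}}np(u)^{2}(1-p(u))^{n}\bigg]+o(1/n).
\end{aligned}\tag{8}$$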

The summation terms above can be rewritten as follows:

[TABLE]

where $(a)$ follows using Lemma 2.

[TABLE]

Using the above expressions in (8), we get the following characterization of the risk.

Theorem 3.

The risk of the Good-Turing estimator under squared error loss satisfies

[TABLE]

II-A Upper bound on risk

To obtain a tight upper bound on the risk, we start with the following upper bound on one of the terms in (8):

[TABLE]

where the first step follows because $1-x\leq e^{-x}$ for a fraction $x$ , and the second step follows because $te^{-t}\leq e^{-1}$ for $t\geq 0$ . Using (9), (10) and (13) in (8), an upper bound for the risk of the Good-Turing estimator is

[TABLE]

where the last step follows because $x(1-x)\leq 0.25$ for a fraction $x$. The above constant $e^{-1}+0.25\approx 0.6179$ is not the best possible and could be marginally improved by a more careful analysis. However, a lower bound on $R_{n}(M^{\textrm{GT}})=\max_{p}R_{n}(M^{\textrm{GT}},p)$, obtained by picking $p$ to be a suitable uniform distribution, shows that the possible improvement is not significant.
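The two inequalities named at the start of this subsection, $1-x\leq e^{-x}$ and $te^{-t}\leq e^{-1}$, suggest how the $e^{-1}$ part of the constant arises. The chain below is a sketch reconstructed from those two stated steps, assuming they are applied to the term $\sum_{u}np(u)^{2}(1-p(u))^{n}$ with $t=np(u)$; it is not copied from the paper's equations.

```latex
\sum_{u\in\mathcal{X}} n\,p(u)^{2}\,(1-p(u))^{n}
  \;\le\; \sum_{u\in\mathcal{X}} p(u)\,\big(n\,p(u)\,e^{-n p(u)}\big)
  \;\le\; e^{-1}\sum_{u\in\mathcal{X}} p(u) \;=\; e^{-1}\approx 0.3679,
\qquad e^{-1}+0.25\approx 0.6179 .
```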

II-B Lower bound on the Good-Turing worst-case risk

A lower bound on the worst-case risk of the Good-Turing estimator can be obtained by evaluating the risk for the uniform distribution $p_{U}$ on $\mathcal{X}$. Let $\left|\mathcal{X}\right|=cn$ and $p_{U}\left(x\right)=\frac{1}{cn}$ for all $x\in\mathcal{X}$, where $c$ is a positive constant. Using (8), we get

[TABLE]

where the reasoning for the steps is as follows:

a) replacing $\left(1-\frac{1}{cn}\right)^{n-1}$ with $\left(1-\frac{1}{cn}\right)^{n}\left(1+o(1)\right)$.

b) using the fact that $\left(1-\frac{1}{cn}\right)^{n}=e^{-1/c}\left(1+o(1)\right)$.

The coefficient of $\frac{1}{n}$ in (15) can be maximized numerically to obtain a maximum value of $0.6080$ at $c\approx 1.1729$ . Hence, from (14) and (15), we have:

Theorem 4.

The worst-case risk of the Good-Turing estimator satisfies the following bounds:

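$$\frac{0.6080}{n}+o\left(\frac{1}{n}\right)\leq R_{n}(M^{\textrm{GT}})\leq\frac{0.6179}{n}+o\left(\frac{1}{n}\right).$$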

Therefore, the constant in (14) is fairly tight.
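The numerical maximization behind the constant $0.6080$ can be reproduced with a short search. The coefficient used below, $e^{-1/c}\left(1+\frac{1}{c}\right)-e^{-2/c}$, is an assumption: it is what one obtains by substituting $p_{U}(x)=\frac{1}{cn}$ into the sums $\sum_{u}p(u)(1-p(u))^{n-1}$, $\sum_{u}np(u)^{2}(1-p(u))^{n}$ and $\sum_{u\neq v}P(u,v)$ and letting $n$ grow, and it is not copied from the paper's equation (15). The function names are likewise illustrative.

```python
from math import exp

def uniform_coefficient(c):
    """Limiting value of n * R_n(M^GT, p_U) for the uniform distribution on c*n symbols.

    Assumption: derived here from the risk expression; not taken verbatim from the paper.
    """
    t = 1.0 / c
    return exp(-t) * (1.0 + t) - exp(-2.0 * t)

def maximize(f, lo, hi, iters=200):
    """Golden-section search for the maximizer of a unimodal function on [lo, hi]."""
    inv_phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        x1 = b - inv_phi * (b - a)
        x2 = a + inv_phi * (b - a)
        if f(x1) < f(x2):
            a = x1  # the maximum lies to the right of x1
        else:
            b = x2  # the maximum lies to the left of x2
    return (a + b) / 2

c_star = maximize(uniform_coefficient, 0.1, 10.0)
best = uniform_coefficient(c_star)
```

Under this assumed form, the maximizer satisfies $\frac{1}{c}e^{1/c}=2$, and the search returns values that agree with the $c\approx 1.1729$ and $0.6080$ reported above.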

III Lower Bounds on the Minimax Risk

In this section, we consider lower bounds on the squared error risk of an arbitrary estimator of missing mass. The main result is that the minimax risk is lower-bounded by $c/n$ for a constant $c$. Two methods are described for finding lower bounds: the first is a Dirichlet prior approach, and the second is a reduction of the missing mass problem to a distribution estimation problem.

Both approaches provide the same order of $1/n$ for the lower bound, but the second reduction approach provides a better constant. However, the Dirichlet prior approach has significant potential for further optimization for better constants, and is an interesting extension of the standard prior method to the case of estimation of random variables such as missing mass, which depend on both the distribution $p$ and the sample $X^{n}$ .

III-A Lower Bounds via Prior Distributions

The first approach is to bound the minimax risk by the average risk obtained by averaging over a family of distributions with a prior. Let $P$ be a random variable over a family of distributions $\mathcal{P}$, having an alphabet $\mathcal{X}=\left\{0,1,2,\ldots,k-1\right\}$. In the rest of this section, the missing mass will be denoted as $M_{0}\left(X^{n},p\right)$ to explicitly show the dependence on the distribution $p$.

Lemma 5.

For any missing mass estimator $\hat{M}_{0}(X^{n})$ and a random variable $P$ over a family of distributions $\mathcal{P}$ ,

[TABLE]

Proof:

[TABLE]

where (a) follows from the law of total expectation and (b) follows from the fact that (a) is minimized when $\hat{M}_{0}\left(X^{n}\right)=\mathbb{E}_{P|X^{n}}\left(\left.M_{0}\left(X^{n},P\right)\right|X^{n}\right)$ . ∎

Lemma 5 gives us a family of bounds depending on the distribution of the prior $P$ . The RHS in Lemma 5 can be computed exactly for a Dirichlet prior with some analysis.

Lemma 6.

Suppose $P$ has a Dirichlet distribution $\mbox{Dir}\left(k,\boldsymbol{\alpha}\right)$ , where $\boldsymbol{\alpha}=\left(\alpha_{0},\alpha_{1},\ldots,\alpha_{k-1}\right)$ . Then, we have

[TABLE]

where $B\left(\cdot,\cdot\right)$ is the Beta function and $a=\sum_{u\in\mathcal{X}}\alpha_{u}$ .

We skip the details for want of space.

Let $\boldsymbol{\alpha}=\left(\frac{1}{n},\frac{1}{n},\ldots,\frac{1}{n}\right)$ and $k=cn^{2}$ . For this choice of parameters, the expression in Lemma 6 can be bounded as

[TABLE]

where, once again, we skip the details. The coefficient of $\frac{1}{n}$ attains a maximum value of $\frac{4}{27}$ when $c=\frac{1}{2}$ , which results in the following bound on the minimax risk:

Lemma 7.

$$R_{n}^{*}\geq\frac{4}{27n}.$$

The bound is worse than the $\frac{1}{4n}$ bound obtained from distribution estimation in the next section, but it can possibly be improved by better selection of the prior.

III-B Lower bounds via Distribution Estimation

To bound the minimax risk for missing mass estimation, one approach is to reduce the problem to that of estimating a distribution. Let $\mathcal{P}$ be the set of distributions over the set $\mathcal{X}=\left\{0,1\right\}$ such that for all $p\in\mathcal{P}$, $p\left(0\right)\geq\frac{1}{2}$. A known result (see [19, 20], for instance) states that the minimax $\ell^{2}$ loss in estimating $p(0)$ is $\frac{1}{4n}$. More precisely, let $\hat{p}(X^{n})$ be an estimator for $p(0)$ from a random sample $X^{n}$ distributed according to $p$. Then, we have

Lemma 8.

[TABLE]

For an arbitrary positive integer $k$, let $\mathcal{P}_{c}$ be the set of distributions over the set $\mathcal{X}=\left\{0,1,2,\ldots,k-1\right\}$, such that for any $p_{c}\in\mathcal{P}_{c}$, we have $p_{c}\left(0\right)\geq\frac{1}{2}$ and $p_{c}\left(i\right)=\frac{1-p_{c}\left(0\right)}{k}$ for all $i\geq 1$. We can use Lemma 8 to obtain minimax bounds in estimating $p_{c}\left(0\right)$ for this family of distributions as well. Let $\hat{p}_{c}(X^{n})$ be an estimator for $p_{c}$ from a random sample $X^{n}$ distributed according to $p_{c}$. Let $\hat{p}_{c}(X^{n},i)$ be the probability $\hat{p}_{c}$ assigns to the symbol $i$.

Lemma 9.

[TABLE]

Proof:

Suppose we want to estimate an unknown distribution $p\in\mathcal{P}$ and we have an estimator $\hat{p}_{c}$ for distributions in $\mathcal{P}_{c}$. Then we can use $\hat{p}_{c}$ to estimate $p$ as follows. Take each observed sample distributed according to $p$; if it is 0, keep it as it is, and if it is 1, replace it with a uniformly sampled random variable over $\left\{1,2,\ldots,k\right\}$. The transformed samples are distributed according to a distribution $p_{c}$ in $\mathcal{P}_{c}$ with $p_{c}\left(0\right)=p\left(0\right)$. Thus, any estimator for distributions in $\mathcal{P}_{c}$ can be reduced to an estimator for distributions in $\mathcal{P}$ and

[TABLE]

and the proof follows from Lemma 8. ∎
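The sampling step in this reduction is simple to write down. The following sketch keeps every 0 and replaces every 1 by a uniform draw from $\{1,2,\ldots,k\}$, as described in the proof; the function names `lift_sample` and `reduced_estimator` are hypothetical, and the empirical-frequency estimator in the usage lines is only a placeholder for an arbitrary estimator of $p_{c}(0)$.

```python
import random

def lift_sample(binary_sample, k, rng):
    """Map samples from p on {0, 1} to samples from p_c: keep 0, spread 1 uniformly over 1..k."""
    return [0 if x == 0 else rng.randint(1, k) for x in binary_sample]

def reduced_estimator(estimate_pc0, k, rng):
    """Turn an estimator of p_c(0) into an estimator of p(0) for the binary problem."""
    def estimate_p0(binary_sample):
        return estimate_pc0(lift_sample(binary_sample, k, rng))
    return estimate_p0

# Placeholder estimator: empirical frequency of the symbol 0.
empirical = lambda sample: sample.count(0) / len(sample)
estimate_p0 = reduced_estimator(empirical, k=5, rng=random.Random(0))
value = estimate_p0([0, 1, 0, 0])
```

Because the lifted sample has the same number of zeros as the original, any risk guarantee for estimating $p_{c}(0)$ transfers to estimating $p(0)$, which is why the lower bound of Lemma 8 carries over.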

Lemma 10.

Let $k=e^{n}$ . With probability at least $1-1/2^{n}$ , the missing mass $M_{0}\left(X^{n}\right)$ satisfies

[TABLE]

Proof:

The probability of symbol $0$ appearing at least once in $X^{n}$ is $1-(1-p(0))^{n}\geq 1-1/2^{n}$. Furthermore, at most $n$ distinct symbols from $1,2,\ldots,k-1$ can appear in $X^{n}$. Hence, with probability at least $1-1/2^{n}$, the observed mass $1-M_{0}\left(X^{n}\right)$ satisfies

[TABLE]

and hence follows the lemma. ∎

From Lemmas 9 and 10, we can obtain a lower bound of $1/(4n)$ on the minimax risk of missing mass estimation. Combining the lower bound with the upper bound on the risk of the Good-Turing estimator from Theorem 4, we have the following:

Theorem 11.

The minimax risk of missing mass estimation, denoted $R_{n}^{*}$ , satisfies the following bounds:

$$\frac{0.25}{n}+o\left(\frac{1}{n}\right)\leq R_{n}^{*}\leq\frac{0.6179}{n}+o\left(\frac{1}{n}\right).$$

IV Summary and Future Directions

We studied the problem of missing mass estimation and showed that the minimax risk lies between $0.25/n$ and $0.6179/n$, up to $o(1/n)$ terms. We further showed that the worst-case risk of the Good-Turing estimator lies between $0.6080/n$ and $0.6179/n$.

Our results pose several interesting questions for future work. Two natural questions are: (1) are there priors which yield better lower bounds on the minimax risk of missing mass? and (2) are there estimators that have better risk than the Good-Turing estimator?

We finally remark that it might be interesting to see if the minimax risk results imply better concentration results for the missing mass and the Good-Turing estimator.

V Acknowledgements

The authors thank Alon Orlitsky for helpful discussions. Ananda Theertha Suresh thanks Jayadev Acharya for helpful comments.

Bibliography


[1] W. A. Gale and G. Sampson, “Good-Turing frequency estimation without tears,” Journal of Quantitative Linguistics, vol. 2, no. 3, pp. 217–237, 1995.
[2] S. F. Chen and J. Goodman, “An empirical study of smoothing techniques for language modeling,” in Proceedings of the 34th Annual Meeting on Association for Computational Linguistics, ser. ACL ’96, 1996, pp. 310–318.
[3] A. Chao and S.-M. Lee, “Estimating the number of classes via sample coverage,” Journal of the American Statistical Association, vol. 87, no. 417, pp. 210–217, 1992.
[4] I. J. Good, “The population frequencies of species and the estimation of population parameters,” Biometrika, vol. 40, no. 3-4, pp. 237–264, 1953.
[5] V. Q. Vu, B. Yu, and R. E. Kass, “Coverage-adjusted entropy estimation,” Statistics in Medicine, vol. 26, no. 21, pp. 4039–4060, 2007.
[6] T.-J. Shen, A. Chao, and C.-F. Lin, “Predicting the number of new species in further taxonomic sampling,” Ecology, vol. 84, no. 3, pp. 798–804, 2003.
[7] R. K. Colwell, A. Chao, N. J. Gotelli, S.-Y. Lin, C. X. Mao, R. L. Chazdon, and J. T. Longino, “Models and estimators linking individual-based and sample-based rarefaction, extrapolation and comparison of assemblages,” Journal of Plant Ecology, vol. 5, no. 1, pp. 3–21, 2012.
[8] D. A. McAllester and R. E. Schapire, “On the convergence rate of Good-Turing estimators,” in Proceedings of the Thirteenth Annual Conference on Computational Learning Theory, ser. COLT ’00. Morgan Kaufmann Publishers Inc., 2000, pp. 1–6.
