On the occupancy problem for a regime switching model
Michael Grabchak, Mark Kelbert, and Quentin Paris

TL;DR
This paper investigates occupancy probabilities in a regime switching Markov chain model, providing finite sample bounds and asymptotic analysis, especially for regularly varying distributions, revealing rate optimal bounds.
Contribution
It introduces finite sample bounds and asymptotic results for occupancy probabilities in a regime switching Markov chain, extending beyond iid assumptions.
Findings
Finite sample bounds are rate optimal.
Asymptotic behavior characterized for regularly varying distributions.
Bounds decay at the same rate as asymptotics.
Abstract
This article studies the expected occupancy probabilities on an alphabet. Unlike the standard situation, where observations are assumed to be independent and identically distributed (iid), we assume that they follow a regime switching Markov chain. For this model, we 1) give finite sample bounds on the occupancy probabilities, and 2) provide detailed asymptotics in the case where the underlying distribution is regularly varying. We find that, in the regularly varying case, the finite sample bounds are rate optimal and have, up to a constant, the same rate of decay as the asymptotic result.
Michael Grabchak: University of North Carolina Charlotte, Department of Mathematics and Statistics, Charlotte, NC, USA.
Mark Kelbert: National Research University Higher School of Economics (HSE), Faculty of Economics, Department of Statistics and Data Analysis, Moscow, Russia.
Quentin Paris: National Research University Higher School of Economics (HSE), Faculty of Computer Science, School of Data Analysis and Artificial Intelligence & HDI LAB, Moscow, Russia.
Keywords: occupancy problem, regime switching, Markov chain, regular variation
1 Introduction
Let $\mathcal{A}$ be a finite or countably infinite set and let $(X_n)_{n\ge 1}$ be a discrete time $\mathcal{A}$-valued stochastic process defined on some probability space $(\Omega,\mathcal{F},\mathbb{P})$. We refer to the set $\mathcal{A}$ as the alphabet and to elements of $\mathcal{A}$ as letters. These letters may represent different things in the context of different applications. For instance, in linguistics they may represent words in some language, while in ecology they may represent species in an ecosystem. From a general point of view, the occupancy problem (or urn scheme) is to describe the repartition of the process $(X_n)_{n\ge 1}$ over the set $\mathcal{A}$. In this context, two quantities of interest are
$$L_n = \sum_{k=1}^{n} \mathbf{1}_{\{X_k = X_{n+1}\}} \quad\text{and}\quad M_{n,r} = \mathbb{P}\left(L_n = r \mid X_1,\dots,X_n\right), \quad r \ge 0.$$
These quantities are related by the fact that
$$\mathbb{E}[M_{n,r}] = \mathbb{P}(L_n = r).$$
In words, $L_n$ is the number of times that the letter observed at time $n+1$ had previously been observed and $M_{n,r}$ is the probability that, given the observations up to time $n$, the letter observed at time $n+1$ will have already been seen $r$ times. We refer to the quantities $M_{n,r}$ as the occupancy probabilities. The quantity $M_{n,0}$ is also sometimes called the missing mass. It corresponds to the probability of seeing a new letter at time $n+1$. In certain ecological contexts, it represents the probability of discovering a new species. While properties of $L_n$ and $M_{n,r}$ have been thoroughly studied in the context where $X_1, X_2, \dots$ are independent and identically distributed (iid) random variables, we have seen no work in the literature relating to the case where they follow a more general stochastic process. In this paper, we give such results for a class of Markov chains, which form a regime switching model. This model expands the scope of potential applications. Moreover, it is our hope that this paper will stimulate interest in studying this problem in the context of other, more general, processes.
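To make the two quantities described above concrete, here is a minimal Python sketch on a toy example. It assumes iid letters with a known distribution; the dictionary `p` and the sample `history` are hypothetical illustration values, not taken from the paper. In that special case, the occupancy probability of order r is simply the total mass of the letters that have been seen exactly r times so far.

```python
from collections import Counter

def previous_occurrences(seq):
    """Number of times the last letter of `seq` appears among the earlier letters."""
    *history, last = seq
    return sum(1 for x in history if x == last)

def occupancy_probability(history, p, r):
    """P(next letter has been seen exactly r times | history), for iid letters with law p."""
    counts = Counter(history)  # letters never seen have count 0
    return sum(prob for letter, prob in p.items() if counts[letter] == r)

p = {"a": 0.5, "b": 0.3, "c": 0.2}   # hypothetical letter distribution (assumption)
history = ["a", "b", "a"]            # observations up to time n = 3

print(previous_occurrences(history + ["a"]))   # 2
print(occupancy_probability(history, p, 0))    # missing mass: 0.2
```

With r = 0 the second function returns the missing mass, here the probability of the single unseen letter "c".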
1.1 Related Work
In the iid setting, the literature on the behavior of , , and related quantities is vast, see, for instance, the classic textbook Johnson and Kotz (1977), the survey Gnedin et al. (2007), or recent contributions by Ben-Hamou et al. (2017) and Decrouez et al. (2018). Applications include fields such as Ecology (Good, 1953; Good and Toulmin, 1956; Chao, 1981; Gandolfi and Sastri, 2004), Genomics (Mao and Lindsay, 2002), Language Processing (Chen and Goodman, 1999), Authorship Attribution (Efron and Thisted, 1976; Thisted and Efron, 1987; Zhang and Huang, 2007), Information Theory (Orlitsky et al., 2004; Ben-Hamou et al., 2016), Computer Science (Zhang, 2005), and Machine Learning (Bubeck et al., 2013; Grabchak and Zhang, 2017).
We now briefly sketch several key results for the case where the random variables are iid with common distribution on . In this case, it is readily shown that
[TABLE]
This expression allows for a precise asymptotic analysis. Following Karlin (1967), it is understood that the main ingredients for this analysis are given by the counting measure and the counting function . These are defined, respectively, by
[TABLE]
and
[TABLE]
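As a small illustration of these iid ingredients, the sketch below evaluates a counting function and the expected occupancy probability for a finite toy distribution. Two assumptions are made here and are not quoted from the paper: the counting function is taken to be the number of letters with probability at least x, and the iid expectation is taken in its classical binomial form, namely C(n, r) times the sum over letters of p^(r+1) (1 - p)^(n - r). The list `p` is a hypothetical example.

```python
from math import comb

def counting_function(p, x):
    """Number of letters whose probability is at least x (assumed form of the counting function)."""
    return sum(1 for prob in p if prob >= x)

def expected_occupancy_iid(p, n, r):
    """Classical iid formula: C(n, r) * sum_a p_a^(r+1) * (1 - p_a)^(n - r)."""
    if r > n:
        return 0.0  # a letter cannot have been seen more than n times
    return comb(n, r) * sum(q ** (r + 1) * (1 - q) ** (n - r) for q in p)

p = [0.5, 0.3, 0.2]  # hypothetical letter probabilities (assumption)

print(counting_function(p, 0.25))        # 2 letters have probability >= 0.25
print(expected_occupancy_iid(p, 10, 0))  # expected missing mass after 10 draws
```

Summing `expected_occupancy_iid(p, n, r)` over r = 0, ..., n gives 1, which is a quick consistency check on the formula.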
Next, recall that a function $\ell : (0,\infty) \to (0,\infty)$ is said to be slowly varying at $\infty$ if for any $t > 0$
$$\lim_{x\to\infty} \frac{\ell(tx)}{\ell(x)} = 1.$$
In this case, we write . With this notation, if for some and some , then for ,
[TABLE]
This result is discussed, in greater detail, in the Appendix below. Non-asymptotic results are given in Decrouez et al. (2018). The main result of that paper is as follows.
Lemma 1.1 (Theorem 2.1 in Decrouez et al., 2018).
Let be a probability measure on with counting function . For all , all , and all ,
[TABLE]
where
[TABLE]
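The iid asymptotics recalled earlier in this subsection can be sanity-checked numerically. The sketch below is written under stated assumptions rather than taken from the paper: it uses the classical Karlin-type form, in which the expected occupancy probability of order r behaves like alpha * Gamma(r + 1 - alpha) / r! * n^(alpha - 1) * l(n) when the counting function behaves like eps^(-alpha) * l(1/eps) with 0 < alpha < 1. The power-law distribution, the truncation level `SIZE`, and the sample size `N` are hypothetical choices. For p_a = c * a^(-1/alpha), the counting function is about c^alpha * eps^(-alpha), so the slowly varying part is the constant c^alpha.

```python
from math import comb, factorial, gamma

ALPHA = 0.5      # assumed index of regular variation
SIZE = 200_000   # truncation level for the (in principle infinite) alphabet
N = 10_000       # sample size used in the comparison

# Hypothetical power-law distribution p_a = c * a^(-1/ALPHA), a = 1, ..., SIZE.
weights = [a ** (-1.0 / ALPHA) for a in range(1, SIZE + 1)]
c = 1.0 / sum(weights)
p = [c * w for w in weights]

def exact(n, r):
    """Exact iid expectation: C(n, r) * sum_a p_a^(r+1) * (1 - p_a)^(n - r)."""
    return comb(n, r) * sum(q ** (r + 1) * (1.0 - q) ** (n - r) for q in p)

def asymptotic(n, r):
    """Assumed classical asymptotic form, with slowly varying part equal to c^ALPHA."""
    return ALPHA * gamma(r + 1 - ALPHA) / factorial(r) * c ** ALPHA * n ** (ALPHA - 1.0)

for r in (0, 1):
    print(r, exact(N, r) / asymptotic(N, r))  # ratios should be close to 1
```

The truncation only removes letters whose total mass is of order c / SIZE, which is negligible at this sample size.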
1.2 Regime Switching Model
A natural extension of the iid case is to a class of regime switching Markov chains or regime switching models. In this context the elements in no longer represent letters, but entire alphabets. Each represents an alphabet, which we denote by , where . This alphabet has its own distribution , and we assume that observations from each alphabet are iid with distribution . However, we randomly perform transitions between alphabets following a Markov chain with transition operator . Formally, we consider a Markov chain on the product space with transition operator defined by
[TABLE]
In the interest of generality, we sometimes consider the case where transitions between alphabets do not follow a Markov chain, but a more general process. Nevertheless, our motivation comes from the case where the transitions are Markovian. Such models can be used to describe a variety of situations, such as:
1. (Classics) A researcher reads documents in an antique library. The documents are written in a variety of languages (e.g. Latin, Greek, Hebrew, etc.). Assume that transitions between documents written in different languages follow a Markov chain. Here the regime switching Markov chain represents the sequence of ordered pairs comprised of the word that the researcher is currently reading and the language that the current document is written in. In this context, the missing mass represents the probability that the next word that the researcher encounters will be one that this researcher has not previously seen and will thus need to look up.
2. (Ecology) An ecologist is observing the animals that are found in a certain plot of forest. However, the forest has several states (e.g. time of day, weather, etc.) with transitions between these following a Markov chain. To understand the difference in the distribution of species found under different states, the ecologist keeps track of both the species of the observed animal and the state of the forest.
3. (Computer Science) A server periodically enters a state where there is a serious hacking attempt. Assume that transitions into and out of this state follow a Markov chain. To understand the effect of a serious hacking attempt on the number of packets that arrive, a researcher keeps track of the number of packets that arrive in increments of, say, five minutes along with the state of the server in that time period.
4. (Economics) An economy can be in one of several states, e.g. growth, recession, inflation, etc. One can model transitions between these states using a Markov chain. To understand the effect of the state of the economy on some economic indicator (e.g. the number of bank failures in a week) an economist keeps track of both the indicator and the state of the economy.
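The regime switching dynamics described in this subsection can be simulated in a few lines. The following is a minimal sketch under stated assumptions: regimes follow a Markov chain with initial distribution `mu` and transition probabilities `P`, and, given the current regime s, the letter is drawn independently from `p[s]`. The function name, the two "languages", and all numerical values are hypothetical and only loosely mirror the Classics example.

```python
import random

def simulate_regime_switching(P, p, mu, n, seed=0):
    """Return [(s_1, a_1), ..., (s_n, a_n)]: regimes follow the chain (mu, P),
    and given regime s the letter a is drawn from the distribution p[s]."""
    rng = random.Random(seed)
    states = list(P)
    s = rng.choices(states, weights=[mu[t] for t in states])[0]
    path = []
    for _ in range(n):
        letters = list(p[s])
        a = rng.choices(letters, weights=[p[s][x] for x in letters])[0]
        path.append((s, a))
        # move to the next regime according to the transition probabilities
        s = rng.choices(states, weights=[P[s][t] for t in states])[0]
    return path

# Hypothetical two-language example (all numbers are illustration values).
P = {"latin": {"latin": 0.8, "greek": 0.2}, "greek": {"latin": 0.3, "greek": 0.7}}
mu = {"latin": 0.5, "greek": 0.5}
p = {"latin": {"et": 0.6, "in": 0.4}, "greek": {"kai": 0.7, "en": 0.3}}

path = simulate_regime_switching(P, p, mu, n=10, seed=1)
new_pair = path[-1] not in path[:-1]  # was the last (regime, letter) pair unseen before?
print(path[:3], new_pair)
```

Each observation is recorded as a (regime, letter) pair, so the same word appearing in two different regimes counts as two distinct letters of the product space.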
1.3 Organization
The main goal of this paper is to extend the results given in Equation (1.4) and Lemma 1.1 from the iid case to the regime switching model. We begin by giving results for a simple class of Markov chains, which will drive this model. Toward this end, we introduce a useful technical result in Section 2, and then, in Section 3, we consider the case of an ergodic Markov chain on a finite state space. In Section 4, we formally define the regime switching model and give extensions of Lemma 1.1. In the interest of generality, most results in this section do not assume that transitions between alphabets are Markovian. However, this assumption is needed for the more detailed results. Then, in Section 5, we extend (1.4) to the case of the regime switching model. Proofs are postponed to Section 6. A brief review of basic properties of regularly varying distributions on an alphabet is given in the Appendix.
1.4 Notation
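Before proceeding we set up some notation. We write $\mathbf{1}_A$ to denote the indicator function of event $A$. For a set $A$, we write $|A|$ to denote the cardinality of $A$. For real numbers $a, b$, we write $a \vee b$ or $\max\{a,b\}$ to denote the maximum of $a$ and $b$ and we write $a \wedge b$ or $\min\{a,b\}$ to denote the minimum of $a$ and $b$. For two sequences $(a_n)$ and $(b_n)$ we write $a_n \sim b_n$ to mean $a_n/b_n \to 1$ as $n \to \infty$. We write $\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\,dt$ for $x > 0$ to denote the gamma function.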
2 Preliminaries
In this section we introduce a technical result, which will be useful in the sequel. Toward this end, fix a finite or countably infinite set , a Markov transition operator on , and a probability measure on . Let be an -valued random process defined on some probability space such that is a -Markov chain with initial distribution . We write to denote the expectation under . We write to denote the -step transition operator of the Markov chain. For all integers and , we set
[TABLE]
to be the local time of Markov chain in state , and we set
[TABLE]
to denote the number of times that the state visited at time had been visited up to time .
We now give a result, which connects the distribution of with that of the local times of the reversed chain. We assume that the -Markov chain is irreducible, aperiodic, positive recurrent, and has stationary distribution . We denote by the associated reversed chain, i.e. an -valued Markov chain with transition operator defined by
[TABLE]
It is easy to check that is also the stationary distribution of and that the -step transition operator of the reversed chain is given by
[TABLE]
We say that the chain is reversible when . In this case and the chains and have the same distribution given an initial distribution. We write to denote the local time of the reversed chain at , i.e.
[TABLE]
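The reversed chain can be computed explicitly for a finite state space. The sketch below assumes the standard time-reversal formula, in which the reversed transition probability from x to y equals pi(y) P(y, x) / pi(x); the function names and the example matrices are hypothetical. The stationary distribution is approximated by power iteration, which is adequate for small ergodic chains.

```python
def stationary_distribution(P, iterations=10_000):
    """Approximate the stationary distribution of an ergodic transition matrix by power iteration."""
    k = len(P)
    pi = [1.0 / k] * k
    for _ in range(iterations):
        pi = [sum(pi[x] * P[x][y] for x in range(k)) for y in range(k)]
    return pi

def reversed_chain(P, pi):
    """Time reversal (standard formula, assumed here): P_hat[x][y] = pi[y] * P[y][x] / pi[x]."""
    k = len(P)
    return [[pi[y] * P[y][x] / pi[x] for y in range(k)] for x in range(k)]

# Hypothetical 3-state ergodic chain.
P = [[0.1, 0.6, 0.3],
     [0.4, 0.2, 0.4],
     [0.5, 0.3, 0.2]]
pi = stationary_distribution(P)
P_hat = reversed_chain(P, pi)

print([round(sum(row), 12) for row in P_hat])  # each row sums to 1
```

For a reversible chain, for example any ergodic two-state chain, `reversed_chain` returns the original matrix up to rounding.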
Lemma 2.1.
Let be a finite or countably infinite set. Suppose is an irreducible, aperiodic, and positive recurrent Markov chain on with stationary distribution and reversed chain . Let and be arbitrary distributions on . Then, for any positive measurable function and all integers ,
[TABLE]
where, on the right-hand side, it is understood that is taken as the initial distribution of the reversed chain, i.e. it is the distribution of .
Remark 2.1.
Note, in particular, that taking in the above formula, and supposing the chain to be reversible, we get that for any positive measurable ,
[TABLE]
so that, under , has the same distribution as .
3 Finite Markov Chains
In this section we provide a bound on in the context of an ergodic Markov chain on a finite state space. This result is interesting in itself, and it will be important in the sequel because such models will drive our regime switching model. Let be an irreducible and aperiodic Markov chain with finite state space , transition matrix , and stationary distribution . This implies that there exists an integer such that
[TABLE]
From (2.3), it follows that for all . Let
[TABLE]
where is the cardinality of . Note that and that, for each ,
[TABLE]
where is the uniform distribution on . By Theorem 8 in Roberts and Rosenthal (2004), this implies that for every
[TABLE]
This result continues to hold if is replaced by . In this context, Theorem 2 of Glynn and Ormoneit (2002) gives the following concentration inequality for .
Lemma 3.1.
If and are such that (3.3) holds, then for any , any , and any initial distribution we have
[TABLE]
for .
Clearly, the above holds for both the chain and the reversed chain . Similar concentration inequalities can be obtained by applying Corollary 2.10 and Remark 2.11 in Paulin (2015). Combining this with Lemma 2.1 gives the following.
Proposition 3.1.
For and any initial distribution , we have
[TABLE]
where and are as above, , and .
In particular, note that, when the constant . It is straightforward to check that the asymptotic behavior of the upper bound is given by
[TABLE]
where .
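The minorization constants introduced at the start of this section can be computed numerically for a small chain. The sketch below rests on an assumed reading of the definitions: m is taken as the smallest power for which the transition matrix is entrywise positive, and the constant `lam` is the number of states times the smallest entry of that power, so that every row of the m-step matrix dominates `lam` times the uniform distribution. It then compares the worst-case total variation distance to stationarity with the geometric bound (1 - lam)^floor(n/m). The example chain and all function names are hypothetical.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    k = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(k)) for j in range(k)] for i in range(k)]

def minorization_constants(P, max_power=1000):
    """Return (m, lam): smallest m with P^m entrywise positive, and lam = |A| * min entry of P^m."""
    k = len(P)
    Q = P
    for m in range(1, max_power + 1):
        smallest = min(min(row) for row in Q)
        if smallest > 0:
            return m, k * smallest
        Q = mat_mul(Q, P)
    raise ValueError("no entrywise positive power found; is the chain ergodic?")

def total_variation(u, v):
    """Total variation distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(u, v))

# Hypothetical ergodic chain; pi solves pi P = pi.
P = [[0.0, 1.0, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
pi = [0.2, 0.4, 0.4]

m, lam = minorization_constants(P)

def worst_case_distance(n):
    """max over starting states x of the total variation distance between P^n(x, .) and pi."""
    Q = P
    for _ in range(n - 1):
        Q = mat_mul(Q, P)
    return max(total_variation(row, pi) for row in Q)

print(m, lam)  # 3 0.375
print(worst_case_distance(6), (1 - lam) ** (6 // m))
```

The geometric bound is usually far from tight, but it holds uniformly over the starting state, which is what the concentration inequality requires.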
Remark 3.1.
It may be interesting to note that Proposition 3.1 gives a bound with exponential decay. This holds, in particular, for the special case, where are iid random variables. In comparison, Corollary 2.1 of Decrouez et al. (2018) focuses on the iid case and only gives the bound
[TABLE]
where is given by (1.5).
The proof of Proposition 3.1 depends heavily on the assumption of a finite alphabet. While concentration inequalities for the local times of Markov chains in the case of infinite alphabets are well-known and can be found in e.g. Glynn and Ormoneit (2002) and Paulin (2015), there does not appear to be a simple way to transform these into bounds on . The issue comes from the fact that we need , but it is always zero when is an infinite set. An interesting situation, where we are able to deal with infinite alphabets, is the regime switching model. This is the focus of the remainder of this paper.
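For a finite chain, the probability that the next state is new can be computed exactly, which makes the exponential decay discussed in this section easy to observe. The sketch below is an illustration under assumptions, not the method of the paper: the chain is indexed from time 1 with initial distribution `mu`, and the probability that the state at time n + 1 differs from all earlier states is obtained by summing, over each state x, the probability of avoiding x for n steps and then entering it. A brute-force enumeration is included as a cross-check; all names and the example chain are hypothetical.

```python
from itertools import product

def prob_new_state(P, mu, n):
    """P(X_{n+1} is not in {X_1, ..., X_n}) for a finite chain with X_1 ~ mu."""
    k = len(P)
    total = 0.0
    for x in range(k):
        # v[y] = P(X_1, ..., X_j all differ from x and X_j = y), starting with j = 1
        v = [mu[y] if y != x else 0.0 for y in range(k)]
        for _ in range(n - 1):
            v = [sum(v[y] * P[y][z] for y in range(k)) if z != x else 0.0
                 for z in range(k)]
        total += sum(v[y] * P[y][x] for y in range(k))  # first entrance into x at time n + 1
    return total

def prob_new_state_brute(P, mu, n):
    """Same probability by enumerating all paths of length n + 1 (small n only)."""
    k = len(P)
    total = 0.0
    for path in product(range(k), repeat=n + 1):
        if path[n] in path[:n]:
            continue
        prob = mu[path[0]]
        for x, y in zip(path, path[1:]):
            prob *= P[x][y]
        total += prob
    return total

# Hypothetical ergodic chain with uniform initial distribution.
P = [[0.0, 1.0, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
mu = [1 / 3, 1 / 3, 1 / 3]

for n in (1, 5, 10, 20):
    print(n, prob_new_state(P, mu, n))
```

The printed values shrink roughly geometrically in n, in line with the exponential bound for finite state spaces.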
4 Regime Switching Model
This section formally introduces the regime switching model and extends the finite sample bounds given in Lemma 1.1 to this case. While we are primarily interested in the case, where transitions between alphabets follow an ergodic Markov chain on a finite state space, our presentation is given in more generality. Let be a finite or countably infinite set. For each , let be a probability distribution on . Any discrete time stochastic process on can be described by a family of conditional distributions , where and for
[TABLE]
We now introduce a process on the state space defined by the family of conditional distributions given by , where satisfies and for
[TABLE]
Now, let be an -valued stochastic process governed by and let and denote the first and second coordinate processes of , i.e.
[TABLE]
We will refer to the process as the underlying process. Note, in particular, that is -valued, while takes values in . The next result gives a more explicit description of the dynamics of the processes and .
Lemma 4.1.
In the above context, the following statements hold:
The process is governed by .
For all and for all ,
[TABLE]
where is the random variable equal to on the event .
Conditionally on the variables , the variables are independent. In particular, for all and all ,
[TABLE]
where is the random variable equal to on the event .
Remark 4.1.
We are motivated by the case, where represents the conditional distributions of a Markov chain with transition operator and initial distribution . In this case, we have: and, for ,
[TABLE]
It follows that, in this case, and, for ,
[TABLE]
which is the Markov operator denoted by in (1.6). In this case, to emphasize the dependence on the initial distribution we will write for and for . It should be noted that the subscript refers to the initial distribution of the underlying process and not of .
Our next result establishes a link between the quantities:
[TABLE]
Lemma 4.2.
For all and all
[TABLE]
where we take when .
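The conditioning idea behind Lemma 4.2, as described in its proof, can be checked numerically on a tiny model. Given the regimes, the number of earlier occurrences of the current (regime, letter) pair is binomial, with the number of trials equal to the number of earlier visits to the current regime and success probability equal to the probability of the current letter in that regime. The sketch below compares this conditional computation with a brute-force enumeration over the product space. The model parameters `P`, `mu`, `p` and the function names are hypothetical.

```python
from itertools import product
from math import comb

# Hypothetical model: two regimes, each with its own distribution on its alphabet.
P = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}
mu = {0: 0.5, 1: 0.5}
p = {0: {"x": 0.6, "y": 0.4}, 1: {"x": 0.1, "y": 0.9}}

def path_probability(path):
    """Probability of a path [(s_1, a_1), ..., (s_k, a_k)] of the regime switching chain."""
    prob, prev = 1.0, None
    for s, a in path:
        prob *= (mu[s] if prev is None else P[prev][s]) * p[s][a]
        prev = s
    return prob

def brute_force(n, r):
    """P(the pair observed at time n + 1 was seen exactly r times before), by full enumeration."""
    pairs = [(s, a) for s in p for a in p[s]]
    total = 0.0
    for path in product(pairs, repeat=n + 1):
        if sum(1 for x in path[:n] if x == path[n]) == r:
            total += path_probability(path)
    return total

def via_binomial(n, r):
    """Same probability, conditioning on the regimes first and using the binomial law."""
    total = 0.0
    for regimes in product(list(P), repeat=n + 1):
        prob = mu[regimes[0]]
        for s, t in zip(regimes, regimes[1:]):
            prob *= P[s][t]
        last = regimes[n]
        visits = regimes[:n].count(last)  # earlier visits to the current regime
        if visits >= r:
            total += prob * sum(
                q * comb(visits, r) * q ** r * (1 - q) ** (visits - r)
                for q in p[last].values()
            )
    return total

for r in range(4):
    print(r, brute_force(3, r), via_binomial(3, r))
```

The second computation enumerates only regime paths, so its cost does not grow with the size of the alphabets, which is the practical benefit of conditioning on the underlying process.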
A slight modification of Lemma 4.2 brings us to the main result of this section, which extends Lemma 1.1 from the iid case, to the regime switching case. First, we introduce some notation. For all , we write to denote the counting function of , which is defined, for all , by
[TABLE]
Theorem 4.1.
For any and any , we have
[TABLE]
where
[TABLE]
and where is as in (1.5).
Since the formulation of Theorem 4.1 is quite general, an explicit evaluation of the coefficients and can require cumbersome computations. More tractable formulas can be provided in a number of situations. We give several examples.
Example 4.1.
Consider the situation where all distributions are equal to the same distribution and therefore all counting functions are equal to the counting function of . In this scenario, an elementary reordering of the terms in (4.3) yields that, for any ,
[TABLE]
where
[TABLE]
and, for ,
[TABLE]
where is as in (1.5).
Example 4.2.
Another favorable scenario corresponds to the case where all probabilities have support contained in for some independent of , i.e.
[TABLE]
In this case, taking on the right-hand side of (4.3), and noticing that corresponds to the size of the support of , yields
[TABLE]
where
[TABLE]
and where is as in (1.5).
We now turn to the important situation where the distribution is regularly varying. In the iid case, the corresponding result is given in Corollary 2.2 of Decrouez et al. (2018).
Proposition 4.1.
Assume that for some and some non-increasing function , we have
[TABLE]
for all and all . In this case,
[TABLE]
where
[TABLE]
, and is the incomplete gamma function.
Remark 4.2.
Note that, in the case, and , the bound in Proposition 4.1 is trivial since it involves . Even in the iid case, the bounds given in Decrouez et al. (2018) are not able to deal with this case.
Remark 4.3.
Note that Theorem 4.1 and Proposition 4.1 are quite general and hold no matter what the underlying process is. However, this generality has a cost. In particular, we still need to know quite a bit about the underlying process. In the case where the underlying process is a finite state space ergodic Markov chain, we can use Proposition 3.1 and related results to get more explicit formulas.
Corollary 4.1.
Assume that and that the underlying process is an aperiodic and irreducible Markov chain with transition operator , stationary distribution , and initial distribution . Let , let be as in (3.1), and let be as in (3.2). Assume further that, for some and some non-increasing function , we have
[TABLE]
For any , if , then
[TABLE]
where
[TABLE]
Here is as in Proposition 3.1, and are as in Proposition 4.1, and
[TABLE]
It may be interesting to note that, for any we have
[TABLE]
5 Asymptotics For the Regime Switching Model
In this section we extend (1.4) from the iid case to the case of the regime switching model, where the underlying process is an ergodic Markov chain on a finite state space. We first define regular variation of . For a review of basic facts about regularly varying distributions on we refer the reader to Appendix A.
Definition 5.1.
We say that is regularly varying with index if there exists an and a function , which is not identically zero, such that for each
[TABLE]
where is defined as in (4.2). In this case we write .
When , we additionally assume that there exists an and a function , which is not identically zero, such that for each
[TABLE]
For simplicity of notation, set for
[TABLE]
Propositions A.1 and A.2 imply that if , then
[TABLE]
where
[TABLE]
Note that, since , the convergence in (5.2) is uniform in . We now give the main result for this section.
Theorem 5.1.
In the context of the regime switching model, assume that and that the underlying process is an aperiodic and irreducible Markov chain with stationary distribution and initial distribution . Assume further that with (when additionally assume that (5.1) holds) and that (or when ) is locally bounded away from [math] and on . In this case for all we have
[TABLE]
and
[TABLE]
where is given by (5.6).
This implies that, up to a constant, we have the same asymptotics as for the upper bound in Corollary 4.1. It may be interesting to note that as part of the proof of the theorem, we show that for any
[TABLE]
6 Proofs
6.1 Proofs for Sections 2 and 3
Proof of Lemma 2.1.
Let us first prove that, for any distributions and on and any bounded function ,
[TABLE]
where, on the right-hand side, it is understood that is taken as the initial distribution of the reversed chain. From the definition of , we obtain
[TABLE]
which proves (6.1). Then, for any measurable positive ,
[TABLE]
where the second line follows by applying identity (6.1) with
[TABLE]
This completes the proof. ∎
Proof of Proposition 3.1.
Fix and observe that the assumption on implies that . As a result, since , we deduce from Lemma 3.1 that
[TABLE]
when , which is equivalent to . From here, we provide two bounds on , which, when combined, give the desired result. First, note that
[TABLE]
Next, using Lemma 2.1 with , it follows that
[TABLE]
where is the probability measure that corresponds to the case where the initial distribution is a point-mass at . Hence, using once again Lemma 3.1 and the fact that the stationary distribution of the reversed chain is the same as for the original chain, it follows that
[TABLE]
provided or equivalently . The desired result follows by combining (6.2) and (6.3). ∎
6.2 Proofs for Section 4
For convenience, we sometimes denote for a given process .
Proof of Lemma 4.1.
The statement follows easily from the structure of . Let and be the functions defined, for , by and . We have,
[TABLE]
Further, for any and any bounded (and measurable) ,
[TABLE]
From here, the fact that
[TABLE]
implies
[TABLE]
In particular, taking gives
[TABLE]
which proves the claim.
For all all and ,
[TABLE]
Using point it follows that
[TABLE]
and that
[TABLE]
Combining these two identities with (6.4), we deduce that
[TABLE]
where the last identity follows from the fact that . The case where is similar.
For any and any ,
[TABLE]
where the first identity follows by arguments similar to those used in the proof of point and the second follows directly from point . Finally, the proof that, for
[TABLE]
is very similar and is omitted for brevity. ∎
Proof of Lemma 4.2.
Fix and . Since , we have
[TABLE]
Noticing that the variable is -measurable by construction, we obtain
[TABLE]
Conditionally on and the variables are, according to point of Lemma 4.1, independent and satisfy
[TABLE]
As a result, conditionally on and , the variable
[TABLE]
follows a Binomial distribution with parameters and . Hence, we obtain
[TABLE]
where the last line follows from point of Lemma 4.1. ∎
Proof of Theorem 4.1.
From Lemma 4.2 it follows that
[TABLE]
Note that
[TABLE]
Now, using Lemma 1.1 inside the expectation yields
[TABLE]
where we have denoted
[TABLE]
Finally, observing the fact that
[TABLE]
gives the result. ∎
Proof of Proposition 4.1.
By Lemma 4.2, we have
[TABLE]
Corollary 2.2 from Decrouez et al. (2018) implies that
[TABLE]
From here, the result follows in the case where from the fact that . Now, assume that . Taking in (2.4) of Decrouez et al. (2018) implies
[TABLE]
On the other hand, since , we also have
[TABLE]
This completes the proof. ∎
Proof of Corollary 4.1.
Fix , let
[TABLE]
and note that . We can write
[TABLE]
Now note that
[TABLE]
and
[TABLE]
Combining this with Proposition 4.1 gives
[TABLE]
From here, the result follows by applying Proposition 3.1. ∎
6.3 Proofs for Section 5
To prove Theorem 5.1, we begin with two technical results.
Lemma 6.1.
Let be an irreducible and aperiodic Markov chain on a finite state space and with stationary distribution . Let and let .
1. For any , any , any , and any initial distribution we have
[TABLE]
and
[TABLE]
2. If and , then, with probability ,
[TABLE]
and for any and any initial distribution
[TABLE]
Proof.
The first part follows immediately from the exponential bound in Proposition 3.1. We now turn to the second part. For ease of notation, set . Since the Markov chain is irreducible and aperiodic on a finite state space, it is recurrent and hence with probability . Further, it satisfies the strong law of large numbers, which means that for each , if , then with probability . Since is a finite set, with probability , this convergence can be taken to be uniform in . Let with , such that for any we have and for any there exists an such that if then
[TABLE]
Now fix and . There exists an such that if then and
[TABLE]
Further, by the uniform convergence theorem for regularly varying functions, see e.g. Proposition 2.4 in Resnick (2007), there is a such that, for any and any
[TABLE]
Since , it follows that, for ,
[TABLE]
which proves (6.8). We now turn to the last part. Fix and let
[TABLE]
Note that . We can write
[TABLE]
Fix , by the Potter bounds (see e.g. Theorem 1.5.6 in Bingham et al. (1987)), there exists a such that
[TABLE]
where the convergence follows by the first part of this lemma. Similarly,
[TABLE]
Combining this with the fact that is bounded means that we can use dominated convergence to get
[TABLE]
where the third equality follows from (6.8) and the fact that, with probability , there exists a (random) such that for all , and the fourth equality follows by the fact that the distribution of converges weakly to , Skorokhod’s representation theorem, and dominated convergence. ∎
Lemma 6.2.
Let and let . When assume, in addition, that (5.1) holds.
1. Let be any sequence of -valued random variables and let be a sequence of -valued random variables such that, with probability , as . With probability ,
[TABLE]
2. Let be an irreducible and aperiodic Markov chain with state space and stationary distribution . If , then, with probability ,
[TABLE]
Note that, in the first part, the sequences and may be dependent or independent.
Proof.
We begin with the first part. Let be a set with such that, for any , . Fix and . Since (5.2) holds uniformly in , it follows that there is an such that for all and all we have
[TABLE]
Now let be a number such that, if then . For all such , the above holds with in place of . From here, the first part follows. For the second part, we have
[TABLE]
Since the Markov chain is irreducible on a finite state space, all of its states are recurrent and hence with probability . Thus, by the first part of this lemma, the fact that , and the fact that , it suffices to show that, with probability ,
[TABLE]
which holds by Lemma 6.1. ∎
Proof of Theorem 5.1.
Note that, by Lemma 4.2
[TABLE]
We begin with . Since , it follows that
[TABLE]
where the convergence follows by Lemma 6.1. We next turn to . Note that (5.2), the fact that , and the fact that is locally bounded implies that there is a constant depending only on with
[TABLE]
for any and some . Here the second inequality follows by the Potter bounds, see e.g. Theorem 1.5.6 in Bingham et al. (1987). For simplicity, set . Fix and let
[TABLE]
Note that . We can write
[TABLE]
By Lemma 6.1, we have
[TABLE]
Similarly,
[TABLE]
Combining this with the fact that is bounded for fixed means that we can use dominated convergence to get
[TABLE]
where the second equality follows from Lemma 6.2 and the fact that, with probability , there exists a (random) such that for all . The third equality follows from the fact that the distribution of converges weakly to , Skorokhod’s representation theorem, and dominated convergence. This gives the first part of Theorem 5.1. The second part follows from the first and Lemma 6.1. ∎
Appendix A Regular variation
In this appendix, we briefly review several basic facts about regularly varying distributions on . First, we recall that for a probability measure on , the counting measure is defined by (1.1) and the counting function is defined by (1.2).
Definition A.1.
A probability distribution with counting function is said to be regularly varying, with exponent , if
[TABLE]
for function . In this case, we write .
To motivate this definition, we recall the following fact from Gnedin et al. (2007). For , we have if and only if
[TABLE]
for some , which is, in general, different from . When , a sufficient condition for is that there exists an with
[TABLE]
In this case, we necessarily have
[TABLE]
and as , see Proposition 15 in Gnedin et al. (2007). We will generally assume that (A.1) holds in this case.
Proposition A.1.
Let . If , then for all ,
[TABLE]
If and (A.1) holds, then for every
[TABLE]
If , then for every the result in (A.2) holds. If and then
[TABLE]
where for is a function with and as .
Proof.
For this is Proposition 7 in Ohannessian and Dahleh (2012). For the result follows by combining Proposition 18 in Gnedin et al. (2007) with Lemma 2 in Grabchak and Zhang (2017). Similarly, for the result follows by combining Proposition 19 in Gnedin et al. (2007) with Lemma 2 in Grabchak and Zhang (2017). The facts about are given in Proposition 14 of Gnedin et al. (2007). ∎
Proposition A.2.
Fix and . When assume that
[TABLE]
and when assume that
[TABLE]
If , then for all ,
[TABLE]
If then for all the result in (A.4) holds. If and then
[TABLE]
where is derived from as in Proposition A.1.
Proof.
Let , let , let , and note that
[TABLE]
A standard application of Fubini’s Theorem gives
[TABLE]
Fix . For , the assumptions imply that for small enough we have . It follows that for or and we have
[TABLE]
where the last equality follows by Karamata’s Theorem (Proposition 1.5.10 in Bingham et al. (1987)). Hence,
[TABLE]
Similarly, when and we have
[TABLE]
and hence
[TABLE]
When we have
[TABLE]
and hence
[TABLE]
From here a version of Karamata’s Tauberian Theorem (Theorem 1.7.1’ in Bingham et al. (1987)) implies that for or and
[TABLE]
and the corresponding result holds for the case and . From here, since , and , we can use Lemma 2 in Grabchak and Zhang (2017) to complete the result. ∎
Acknowledgements
The work of M. Kelbert and Q. Paris has been funded by the Russian Academic Excellence Project 5-100. The work of M. Grabchak is supported, in part, by the Russian Science Foundation (Project № 17-11-01098).
References
- Ben-Hamou et al. (2016) A. Ben-Hamou, S. Boucheron, and E. Gassiat. Pattern coding meets censoring: (almost) adaptive coding on countable alphabets. arXiv:1608.08367, 2016.
- Ben-Hamou et al. (2017) A. Ben-Hamou, S. Boucheron, and M. I. Ohannessian. Concentration inequalities in the infinite urn scheme for occupancy counts and the missing mass, with applications. Bernoulli, 23(1):249–287, 2017.
- Bingham et al. (1987) N. H. Bingham, C. M. Goldie, and J. L. Teugels. Regular Variation. Encyclopedia of Mathematics and Its Applications. Cambridge University Press, Cambridge, 1987.
- Bubeck et al. (2013) S. Bubeck, D. Ernst, and A. Garivier. Optimal discovery with probabilistic expert advice: finite time analysis and macroscopic optimality. Journal of Machine Learning Research, 14:601–623, 2013.
- Chao (1981) A. Chao. On estimating the probability of discovering a new species. The Annals of Statistics, 9:1339–1342, 1981.
- Chen and Goodman (1999) S. F. Chen and J. Goodman. An empirical study of smoothing techniques for language modeling. Computer Speech and Language, 13:359–394, 1999.
- Decrouez et al. (2018) G. Decrouez, M. Grabchak, and Q. Paris. Finite sample properties of the mean occupancy counts and probabilities. Bernoulli, 24(3):1910–1941, 2018.
- Efron and Thisted (1976) B. Efron and R. Thisted. Estimating the number of unseen species: How many words did Shakespeare know? Biometrika, 63:435–447, 1976.
