Three Asymptotic Regimes for Ranking and Selection with General Sample Distributions

Jing Dong; Yi Zhu

arXiv:1705.05999·math.PR·May 18, 2017·WSC


TL;DR

This paper investigates three asymptotic regimes for ranking and selection problems with general distributions, establishing their validity, efficiency, and interconnections, and comparing algorithm performances in these regimes.

Contribution

It introduces and analyzes three novel asymptotic regimes for R&S with general distributions, providing theoretical validation and performance comparisons.

Findings

01

Asymptotic regimes are valid and efficient for R&S.

02

Connections among different regimes are characterized.

03

Pre-limit algorithm performances are compared.

Abstract

In this paper, we study three asymptotic regimes that can be applied to ranking and selection (R&S) problems with general sample distributions. These asymptotic regimes are constructed by sending particular problem parameters (probability of incorrect selection, smallest difference in system performance that we deem worth detecting) to zero. We establish asymptotic validity and efficiency of the corresponding R&S procedures in each regime. We also analyze the connection among different regimes and compare the pre-limit performances of corresponding algorithms.


Tables

Table 1: Simulation experiments for Exponential samples ($\mu_{1}=1$, $\mu_{2}^{\delta}=0.9091$, $\alpha=0.05$).

Regime	$n$	Probability of Incorrect Selection
CLT Regime	$598$	$0.0497 \pm 0.0002$
LD Regime	$1320$	$0.0072 \pm 0.0001$
MD Regime	$1325$	$0.0071 \pm 0.0001$

Table 2: Simulation experiments for constant and Bernoulli samples ($\mu_{1}=0.008$, $\mu_{2}^{\delta}=0.001$, $\alpha=0.01$).

Regime	$n$	Probability of Incorrect Selection
CLT Regime	$111$	$0.1057 \pm 0.0003$
LD Regime	$477$	$0.0015 \pm 0.00003$
MD Regime	$188$	$0.0156 \pm 0.0001$

Equations

$$\bar{X}_{1}(n):=\frac{1}{n}\sum_{k=1}^{n}X_{1,k}\quad\mbox{and}\quad\bar{X}_{2}^{\delta}(n):=\frac{1}{n}\sum_{k=1}^{n}X_{2,k}^{\delta}$$

$$S_{1}^{2}(n):=\frac{1}{n-1}\sum_{k=1}^{n}(X_{1,k}-\bar{X}_{1}(n))^{2}\quad\mbox{and}\quad S_{2}^{\delta,2}(n):=\frac{1}{n-1}\sum_{k=1}^{n}(X_{2,k}^{\delta}-\bar{X}_{2}^{\delta}(n))^{2}.$$

$$PIS=P\left(\bar{X}_{1}(n_{1})<\bar{X}_{2}^{\delta}(n_{2})\right).$$

$$n_{i}(\delta)=\frac{z_{\alpha}^{2}\sigma_{i}^{2}}{\delta^{2}}q_{i}\quad\mbox{for } i=1,2.$$

$$\lim_{\delta\rightarrow 0}P\left(\bar{X}_{1}(n_{1}(\delta))<\bar{X}_{2}^{\delta}(n_{2}(\delta))\right)=\alpha.$$

$$P\left(\bar{X}_{1}(n_{1}(\delta))<\bar{X}_{2}^{\delta}(n_{2}(\delta))\right)$$

$$n_{i}^{*}(\delta)=\frac{z_{\alpha}^{2}(\sigma_{1}+\sigma_{2})}{\delta^{2}}\sigma_{i}\quad\mbox{for } i=1,2.$$

$$n_{1}^{e}(\delta)=n_{2}^{e}(\delta)=z_{\alpha}^{2}(\sigma_{1}^{2}+\sigma_{2}^{2})/\delta^{2}.$$

$$n_{i}^{in}(\delta)=\frac{2z_{\alpha}^{2}\sigma_{i}^{2}}{\delta^{2}}\quad\mbox{for } i=1,2.$$

$$\kappa(\delta):=\inf\left\{n\geq\delta^{-1}:z_{\alpha}^{2}\frac{S_{1}^{2}(n)+S_{2}^{\delta,2}(n)}{n}<\delta^{2}\right\},$$

$$\lim_{\delta\rightarrow 0}P\left(\bar{X}_{1}(\kappa(\delta))<\bar{X}_{2}^{\delta}(\kappa(\delta))\right)=\alpha.$$

$$\delta^{2}\kappa(\delta)$$

$$z_{\alpha}^{2}\left(S_{1}^{2}((t+\delta)/\delta^{2})+S_{2}^{0,2}((t+\delta)/\delta^{2})\right)-(t+\delta)\Rightarrow z_{\alpha}^{2}(\sigma_{1}^{2}+\sigma_{2}^{2})-t\quad\mbox{in } D[0,\infty)\mbox{ as }\delta\rightarrow 0.$$

$$\delta^{2}\kappa(\delta)\Rightarrow\inf\left\{t\geq 0:z_{\alpha}^{2}(\sigma_{1}^{2}+\sigma_{2}^{2})-t<0\right\}=z_{\alpha}^{2}(\sigma_{1}^{2}+\sigma_{2}^{2}).$$

$$P\left(\bar{X}_{1}(\kappa(\delta))<\bar{X}_{2}^{\delta}(\kappa(\delta))\right)$$

$$(\delta\kappa(\delta)t)\left(\bar{X}_{1}(\kappa(\delta)t)-\bar{X}_{2}^{0}(\kappa(\delta)t)\right)+\delta^{2}\kappa(\delta)t\Rightarrow\sqrt{\sigma_{1}^{2}+\sigma_{2}^{2}}\,B\left(z_{\alpha}^{2}(\sigma_{1}^{2}+\sigma_{2}^{2})t\right)+z_{\alpha}^{2}(\sigma_{1}^{2}+\sigma_{2}^{2})t$$

$$P\left(\delta\kappa(\delta)\left(\bar{X}_{1}(\kappa(\delta))-\bar{X}_{2}^{0}(\kappa(\delta))\right)+\delta^{2}\kappa(\delta)<0\right)$$

$$\kappa_{i}^{in}(\delta):=\inf\left\{n_{i}\geq\lfloor\delta^{-1}\rfloor:2z_{\alpha}^{2}\frac{S_{i}^{2}(n_{i})}{n_{i}}<\delta^{2}\right\},\quad\mbox{for } i=1,2.$$

$$\tilde{n}_{i}(\alpha)=\frac{\log(1/\alpha)}{G(p_{1},p_{2})}p_{i}\quad\mbox{for } i=1,2.$$

$$\lim_{\alpha\rightarrow 0}\frac{\log P\left(\bar{X}_{1}(\tilde{n}_{1}(\alpha))<\bar{X}_{2}^{\delta}(\tilde{n}_{2}(\alpha))\right)}{\log(\alpha)}=1.$$

$$I(x_{1},x_{2})=\frac{p_{1}}{G(p_{1},p_{2})}I_{1}(x_{1})+\frac{p_{2}}{G(p_{1},p_{2})}I_{2}(x_{2}).$$

$$\lim_{n\rightarrow\infty}\frac{1}{n}\log P\left(\bar{X}(np_{1}/G(p_{1},p_{2}))<\bar{X}(np_{2}/G(p_{1},p_{2}))\right)$$

$$\min_{p_{1},p_{2}}\ (p_{1}+p_{2})/G(p_{1},p_{2})\quad\mbox{s.t. } p_{1}+p_{2}=1,\ p_{1}>0,\ p_{2}>0.$$

$$\tilde{n}_{i}^{e}(\alpha)=\frac{\log(1/\alpha)}{2G(1/2,1/2)}\quad\mbox{for } i=1,2.$$

$$\tilde{n}_{i}^{in}(\alpha)=\frac{\log(1/\alpha)}{I_{i}(b)}\quad\mbox{for } i=1,2.$$

$$\lim_{\alpha\rightarrow 0}\frac{\log P\left(\bar{X}_{1}(\tilde{n}_{1}^{in}(\alpha))<\bar{X}_{2}^{\delta}(\tilde{n}_{2}^{in}(\alpha))\right)}{\log(\alpha)}<1.$$

$$\log(1/\alpha_{k})\,\delta_{k}^{(1-2\beta)/\beta}=L,$$

$$\hat{n}_{i}(k)=\frac{\log(1/\alpha_{k})}{\delta_{k}^{2}\hat{G}(p_{1},p_{2})}p_{i},\quad\mbox{for } i=1,2.$$

$$\lim_{k\rightarrow\infty}\frac{\log P\left(\bar{X}_{1}(\hat{n}_{1}(k))<\bar{X}_{2}^{\delta_{k}}(\hat{n}_{2}(k))\right)}{\log(\alpha_{k})}=1.$$

$$P\left(\bar{X}_{1}(\hat{n}_{1}(k))<\bar{X}_{2}^{\delta_{k}}(\hat{n}_{2}(k))\right)$$


Taxonomy

Topics: Game Theory and Voting Systems · Auction Theory and Applications · Consumer Market Behavior and Pricing

Full text


1 Introduction

Ranking and selection (R&S) refers to the statistical procedure to select the simulated systems with the best performance (largest or smallest mean) among a finite number of alternatives with high probability. Most of the existing procedures are constructed under the assumption that the samples follow a Gaussian distribution or some other rather restricted class of distributions (e.g. sub-Gaussian, bounded support) to gain control over the probability of correct selection. When these assumptions are violated, the desired performance can only be guaranteed in an asymptotic sense.

Asymptotic analysis is achieved by sending the sample size to infinity. However, simply sending the sample size to infinity does not convey any meaningful information: by the law of large numbers, it only implies that the probability of correct selection converges to one. To define the asymptotic regimes in a proper and meaningful way, we consider two parameters that characterize the “difficulty” of the problem: 1) the difference between the best system and the second best system, which we denote as $\delta$; 2) the probability of incorrect selection (PIS), which we denote as $\alpha$. Under the indifference zone formulation, $\delta$ denotes the smallest difference in system performance that we deem worth detecting. Thus, $\delta$ is also known as the indifference-zone parameter. As either one of these parameters gets smaller, more samples are required to achieve the desired performance.

In this paper, we study asymptotic regimes that can be applied for R&S problems with general sample distributions. The first limiting regime is called the central limit theorem regime, which is derived by sending $\delta$ to zero. The second limiting regime is called the large deviation regime, which is derived by sending $\alpha$ to zero. The third limiting regime is called the moderate deviation regime, which is derived by sending $(\alpha,\delta)$ to zero at an appropriate rate. We present the theoretical foundation of each limiting regime and develop sequential stopping procedures for problems with unknown variance.

The central limit theorem regime has been applied in the R&S literature, mostly under the indifference zone formulation. [10] defined an indifference zone procedure to be asymptotically consistent if $\lim_{\delta\rightarrow 0}PIS\leq\alpha$. [11] propose a sequential stopping procedure for R&S problems with unknown variance and show that their algorithm is asymptotically consistent. [9] develop a fully sequential selection procedure for steady-state simulation that is shown to be asymptotically consistent. This limiting regime has also been used to establish the asymptotic validity of sequential stopping procedures to construct fixed-width confidence intervals [7]. The limit is achieved by sending the width of the confidence interval to zero. We will provide more details about this limiting regime in §2.1.

The large deviation regime has been applied in the ordinal optimization literature. The appealing fact is that while the width of the confidence interval decreases at rate $1/\sqrt{n}$ (due to the central limit theorem), the PIS actually decays exponentially fast in $n$ (due to large deviation theory) [3]. Results from this limiting regime, in particular the large deviation rate function of the PIS, have been applied to find optimal budget allocation rules, i.e. to minimize PIS under a fixed budget (see for example [5], [13] and [8]). The large deviation type of upper bound on the probability of incorrect selection has also been applied in the multi-arm bandit literature [2]. Compared to the R&S literature, the key performance measure for the multi-arm bandit literature is the regret, which measures the cumulative opportunity cost of not knowing the optimal system. Two of the well-known sampling strategies in this literature are the upper confidence bound strategy and Thompson sampling. Both are shown to achieve an $O(\log(n))$ regret bound, under the assumption that we have access to the large deviation rate function or an upper bound of the rate function in closed form. This assumption imposes constraints on the type of sample distributions we can work with. We will survey more details about this limiting regime in §2.2.

To the best of our knowledge, the moderate deviation regime studied in this paper has not been applied in the R&S or the ordinal optimization literature, though the moderate deviation theory is well-studied in the applied probability literature [4]. As we shall explain in subsequent development (§2.3), this asymptotic regime tends to strike a balance between the central limit theorem regime and the large deviation regime.

2 The Three Asymptotic Regimes

To demonstrate the basic ideas, we restrict our discussion to the comparison between two systems. To formalize the asymptotic analysis, we first define a suitable sequence of distributions. Let $X_{1}$ and $X_{2}^{0}$ be two random variables with the same mean $\mu_{1}$ and potentially different variances, $\sigma_{1}^{2}<\infty$ and $\sigma_{2}^{2}<\infty$, respectively. We define $X_{2}^{\delta}$, indexed by $\delta$, as a sequence of random variables with cumulative distribution function $F_{2}^{\delta}(x)=F_{2}^{0}(x+\delta)$, i.e. $X_{2}^{\delta}\overset{d}{=}X_{2}^{0}-\delta$. In particular, $\mu_{2}^{\delta}=\mu_{1}-\delta$. We denote $X_{1,k}$, $k\geq 1$, as i.i.d. copies of $X_{1}$, and $X_{2,k}^{\delta}$, $k\geq 1$, as i.i.d. copies of $X_{2}^{\delta}$. Let

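$$\bar{X}_{1}(n):=\frac{1}{n}\sum_{k=1}^{n}X_{1,k}\quad\mbox{and}\quad\bar{X}_{2}^{\delta}(n):=\frac{1}{n}\sum_{k=1}^{n}X_{2,k}^{\delta}$$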

denote the sample means of $X_{1}$ and $X_{2}^{\delta}$ . We also denote the sample variances as

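$$S_{1}^{2}(n):=\frac{1}{n-1}\sum_{k=1}^{n}(X_{1,k}-\bar{X}_{1}(n))^{2}\quad\mbox{and}\quad S_{2}^{\delta,2}(n):=\frac{1}{n-1}\sum_{k=1}^{n}(X_{2,k}^{\delta}-\bar{X}_{2}^{\delta}(n))^{2}.$$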

Our goal is to select the system with the largest mean value when comparing $X_{1}$ and $X_{2}^{\delta}$ . In particular, if we draw $n_{1}$ samples from $X_{1}$ and $n_{2}$ samples from $X_{2}^{\delta}$ , and select the system with the largest sample mean, then

$$PIS=P\left(\bar{X}_{1}(n_{1})<\bar{X}_{2}^{\delta}(n_{2})\right).$$

Remark 1.

Other definitions of the sequence of random variables may also work. In general, we need to assume that the variances of $X_{2}^{\delta}$ ’s do not depend on $\delta$ .

Recall that $\alpha$ denotes the required level of the PIS, and $\delta$ denotes the difference in mean between the two systems ( $X_{1}$ and $X_{2}^{\delta}$ ). We consider the following three asymptotic regimes. i) Keep $\alpha$ fixed and send $\delta$ to zero; ii) Keep $\delta$ fixed and send $\alpha$ to zero; iii) Send both $\alpha$ and $\delta$ to zero at an appropriate rate. We shall elaborate on each of these three regimes next.

2.1 The Central Limit Theorem Regime

In this limiting regime, we keep $\alpha$ fixed and send $\delta$ to zero. We start with the known variance case. For fixed $q_{1},q_{2}>1$ satisfying $1/q_{1}+1/q_{2}=1$ , we set the required sample sizes as

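$$n_{i}(\delta)=\frac{z_{\alpha}^{2}\sigma_{i}^{2}}{\delta^{2}}q_{i}\quad\mbox{for } i=1,2,$$

where $z_{\alpha}$ is the upper $\alpha$-quantile of the standard normal distribution, i.e. $P(Z>z_{\alpha})=\alpha$ for a standard normal random variable $Z$. We draw $n_{i}(\delta)$ samples from system $i$, and pick the system with the largest sample mean. The following theorem establishes the asymptotic validity of this procedure.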

Theorem 1.

Under the assumption that $\sigma_{i}^{2}<\infty$ for $i=1,2$ ,

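$$\lim_{\delta\rightarrow 0}P\left(\bar{X}_{1}(n_{1}(\delta))<\bar{X}_{2}^{\delta}(n_{2}(\delta))\right)=\alpha.$$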

Proof of Theorem 1.

We notice that

[TABLE]

The convergence follows by the Central Limit Theorem. ∎

We next introduce some possible choices of the parameter $q_{i}$ ’s. i) If we are to minimize $n_{1}(\delta)+n_{2}(\delta)$ for each value of $\delta$ , then we set $q_{1}=1+\sigma_{2}/\sigma_{1}$ and $q_{2}=1+\sigma_{1}/\sigma_{2}$ . In this case

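$$n_{i}^{*}(\delta)=\frac{z_{\alpha}^{2}(\sigma_{1}+\sigma_{2})}{\delta^{2}}\sigma_{i}\quad\mbox{for } i=1,2.$$

ii) If we want to draw an equal number of samples from both systems, then we set $q_{i}=(\sigma_{1}^{2}+\sigma_{2}^{2})/\sigma_{i}^{2}$, for $i=1,2$. In this case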

$$n_{1}^{e}(\delta)=n_{2}^{e}(\delta)=z_{\alpha}^{2}(\sigma_{1}^{2}+\sigma_{2}^{2})/\delta^{2}.$$

iii) If we want to run the simulation without taking into account the information of the other system, then we can set, for example, $q_{1}=q_{2}=2$ . In this case

$$n_{i}^{in}(\delta)=\frac{2z_{\alpha}^{2}\sigma_{i}^{2}}{\delta^{2}}\quad\mbox{for } i=1,2.$$
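The three choices of $q_{i}$ above can be compared numerically. The following Python sketch is illustrative only: the function name `clt_sample_sizes`, the rule labels, and the rounding up to integers are assumptions made here, and $z_{\alpha}$ is taken from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def clt_sample_sizes(sigma1, sigma2, delta, alpha, rule="optimal"):
    """Return (n_1, n_2) = z_alpha^2 * sigma_i^2 * q_i / delta^2, rounded up.

    The rule labels are hypothetical names for the three choices of q_i:
      "optimal"     -> q_1 = 1 + sigma2/sigma1, q_2 = 1 + sigma1/sigma2
      "equal"       -> q_i = (sigma1^2 + sigma2^2) / sigma_i^2
      "independent" -> q_1 = q_2 = 2
    """
    z = NormalDist().inv_cdf(1.0 - alpha)  # upper alpha-quantile z_alpha
    scale = z * z / delta ** 2
    if rule == "optimal":
        raw = [scale * (sigma1 + sigma2) * s for s in (sigma1, sigma2)]
    elif rule == "equal":
        raw = [scale * (sigma1 ** 2 + sigma2 ** 2)] * 2
    elif rule == "independent":
        raw = [2.0 * scale * s ** 2 for s in (sigma1, sigma2)]
    else:
        raise ValueError(f"unknown rule: {rule}")
    return tuple(math.ceil(x) for x in raw)


# Assumed setting: Exponential systems with means 1 and 10/11, whose
# standard deviations equal their means, and alpha = 0.05.
sizes_equal = clt_sample_sizes(1.0, 10.0 / 11.0, 1.0 / 11.0, 0.05, "equal")
sizes_optimal = clt_sample_sizes(1.0, 10.0 / 11.0, 1.0 / 11.0, 0.05, "optimal")
```

Under these assumed inputs the equal-allocation rule gives the sample size reported in the CLT row of Table 1, and the "optimal" rule never uses a larger total budget than the equal rule.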

When the variances are not known, we can apply the following sequential stopping procedure to decide the number of samples needed. In this paper, we focus on the case of equal sample sizes only. We define the stopping time

$$\kappa(\delta):=\inf\left\{n\geq\delta^{-1}:z_{\alpha}^{2}\frac{S_{1}^{2}(n)+S_{2}^{\delta,2}(n)}{n}<\delta^{2}\right\},$$

where $\delta^{-1}$ is introduced to avoid early stopping. We keep sampling the two systems until the total sample variance over the sample size is smaller than $\delta^{2}/z_{\alpha}^{2}$ , and then we pick the system with the largest sample mean. The following theorem establishes the asymptotic validity of the sequential stopping procedure.

Theorem 2.

Under the assumption that $\sigma_{i}^{2}<\infty$ for $i=1,2$ ,

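$$\lim_{\delta\rightarrow 0}P\left(\bar{X}_{1}(\kappa(\delta))<\bar{X}_{2}^{\delta}(\kappa(\delta))\right)=\alpha.$$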

Proof of Theorem 2.

For general $t\in\mathbb{R}^{+}$ , define $\bar{X}_{1}(t):=\bar{X}_{1}(\lfloor t\rfloor)$ , $\bar{X}_{2}^{\delta}(t):=\bar{X}_{2}^{\delta}(\lfloor t\rfloor)$ , $S_{1}^{2}(t):=S_{1}^{2}(\lfloor t\rfloor)$ and $S_{2}^{\delta,2}(t):=S_{2}^{\delta,2}(\lfloor t\rfloor)$ . We first notice that

[TABLE]

As $S_{1}^{2}((t+\delta)/\delta^{2})\Rightarrow\sigma_{1}^{2}I$ and $S_{2}^{0,2}((t+\delta)/\delta^{2})\Rightarrow\sigma_{2}^{2}I$ in $D[0,\infty)$ as $\delta\rightarrow 0$ , where $I(t)=1$ ,

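$$z_{\alpha}^{2}\left(S_{1}^{2}((t+\delta)/\delta^{2})+S_{2}^{0,2}((t+\delta)/\delta^{2})\right)-(t+\delta)\Rightarrow z_{\alpha}^{2}(\sigma_{1}^{2}+\sigma_{2}^{2})-t\quad\mbox{in } D[0,\infty)\mbox{ as }\delta\rightarrow 0.$$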

As $z_{\alpha}^{2}(\sigma_{1}^{2}+\sigma_{2}^{2})-t$ is continuous and monotonically decreasing in $t$ [14],

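$$\delta^{2}\kappa(\delta)\Rightarrow\inf\left\{t\geq 0:z_{\alpha}^{2}(\sigma_{1}^{2}+\sigma_{2}^{2})-t<0\right\}=z_{\alpha}^{2}(\sigma_{1}^{2}+\sigma_{2}^{2}).$$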

We next notice that

[TABLE]

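As $\frac{t}{\delta}(\bar{X}_{1}(t/\delta^{2})-\bar{X}_{2}^{0}(t/\delta^{2}))\Rightarrow\sqrt{\sigma_{1}^{2}+\sigma_{2}^{2}}B(t)$ in $D[0,\infty)$ as $\delta\rightarrow 0$, using a standard random time change and converging-together argument, we have

$$(\delta\kappa(\delta)t)\left(\bar{X}_{1}(\kappa(\delta)t)-\bar{X}_{2}^{0}(\kappa(\delta)t)\right)+\delta^{2}\kappa(\delta)t\Rightarrow\sqrt{\sigma_{1}^{2}+\sigma_{2}^{2}}\,B\left(z_{\alpha}^{2}(\sigma_{1}^{2}+\sigma_{2}^{2})t\right)+z_{\alpha}^{2}(\sigma_{1}^{2}+\sigma_{2}^{2})t$$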

in $D[0,\infty)$ as $\delta\rightarrow 0$ . Thus,

[TABLE]

∎
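The stopping rule $\kappa(\delta)$ can be sketched in code. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: `sequential_stop`, the sampler callables `draw1` and `draw2`, the safety cap `max_n`, and the tie-breaking rule are hypothetical, and the running variances are maintained with Welford's update.

```python
import math
import random
from statistics import NormalDist


def sequential_stop(draw1, draw2, delta, alpha, max_n=10**7):
    """Sample both systems until z_alpha^2 (S_1^2 + S_2^2) / n < delta^2.

    Returns (n, best), where n is the common stopping sample size and
    best is 1 or 2, the system with the larger sample mean (ties go to 1).
    """
    z2 = NormalDist().inv_cdf(1.0 - alpha) ** 2
    n_min = max(2, math.ceil(1.0 / delta))  # n >= 1/delta guards against early stopping
    n = 0
    mean = [0.0, 0.0]
    m2 = [0.0, 0.0]  # running sums of squared deviations
    while n < max_n:
        n += 1
        for i, draw in enumerate((draw1, draw2)):
            x = draw()
            d = x - mean[i]
            mean[i] += d / n
            m2[i] += d * (x - mean[i])
        if n >= n_min:
            total_var = (m2[0] + m2[1]) / (n - 1)  # S_1^2(n) + S_2^2(n)
            if z2 * total_var / n < delta ** 2:
                break
    best = 1 if mean[0] >= mean[1] else 2
    return n, best


# Usage with two hypothetical Gaussian systems whose means differ by delta.
rng = random.Random(0)
n_stop, chosen = sequential_stop(
    lambda: rng.gauss(0.1, 1.0), lambda: rng.gauss(0.0, 1.0), delta=0.1, alpha=0.05
)
```

For unit variances, the stopping time should concentrate around $z_{\alpha}^{2}(\sigma_{1}^{2}+\sigma_{2}^{2})/\delta^{2}\approx 541$, in line with the limit of $\delta^{2}\kappa(\delta)$ used in the proof.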

Remark 2.

Alternatively, we can also set

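$$\kappa_{i}^{in}(\delta):=\inf\left\{n_{i}\geq\lfloor\delta^{-1}\rfloor:2z_{\alpha}^{2}\frac{S_{i}^{2}(n_{i})}{n_{i}}<\delta^{2}\right\},\quad\mbox{for } i=1,2.$$

Then we can show that $\lim_{\delta\rightarrow 0}P\left(\bar{X}_{1}(\kappa_{1}^{in}(\delta))<\bar{X}_{2}^{\delta}(\kappa_{2}^{in}(\delta))\right)=\alpha$. Separating the simulation of the two systems may come in handy for parallelization.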

2.2 The Large Deviation Regime

In this limiting regime, we keep $\delta$ fixed and send $\alpha$ to zero. We impose light tail assumptions on the sample distribution.

Assumption 1.

There exists $\theta>0$ such that $E[\exp(\theta X_{1})]<\infty$ and $E[\exp(\theta X_{2}^{0})]<\infty$ .

We next introduce some notation. Let $\psi_{1}(\theta):=\log E[\exp(\theta X_{1})]$ and $\psi_{2}(\theta):=\log E[\exp(\theta X_{2}^{\delta})]$, i.e. the log moment generating functions. We also write $I_{i}(a):=\sup_{\theta}\{\theta a-\psi_{i}(\theta)\}$, which is known as the Fenchel-Legendre transform of $\psi_{i}$, for $i=1,2$. Let $\mathcal{D}_{i}=\{\theta\in\mathbb{R}:\psi_{i}(\theta)<\infty\}$ and $\mathcal{S}_{i}=\{\psi_{i}^{\prime}(\theta):\theta\in\mathcal{D}_{i}\}$. It is well-known that $I_{i}$ is strictly convex and $C^{\infty}$ for $a\in\mathcal{S}_{i}$. We also make the following assumption on the sample distribution.

Assumption 2.

The interval $[\mu_{1},\mu_{1}+\delta]\subset\mathcal{S}_{1}^{o}\cap\mathcal{S}_{2}^{o}$ .

We assume that the $I_{i}$’s are known. Fix $p_{1},p_{2}>0$ with $p_{1}+p_{2}=1$; we can interpret $p_{i}$ as the proportion of the sampling budget allocated to system $i$, $i=1,2$. We denote $G(p_{1},p_{2})=\min_{b\in(\mu_{1},\mu_{1}+\delta)}\{p_{1}I_{1}(b)+p_{2}I_{2}(b)\}$. Then set the sample size

$$\tilde{n}_{i}(\alpha)=\frac{\log(1/\alpha)}{G(p_{1},p_{2})}p_{i}\quad\mbox{for } i=1,2.$$

We draw $\tilde{n}_{i}(\alpha)$ samples from system $i$, $i=1,2$, and pick the system with the largest sample mean. The following theorem establishes the asymptotic validity, in a logarithmic sense, of the procedure.

Theorem 3.

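Under Assumptions 1 and 2,

$$\lim_{\alpha\rightarrow 0}\frac{\log P\left(\bar{X}_{1}(\tilde{n}_{1}(\alpha))<\bar{X}_{2}^{\delta}(\tilde{n}_{2}(\alpha))\right)}{\log(\alpha)}=1.$$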

The proof of the theorem follows the same line of analysis as in [5]. We shall only provide an outline here. Let $\tilde{n}(\alpha)=\log(1/\alpha)$. Then $\tilde{n}_{i}(\alpha)=\tilde{n}(\alpha)p_{i}/G(p_{1},p_{2})$ for $i=1,2$. We also denote $Z(n)=\left(\bar{X}(np_{1}/G(p_{1},p_{2})),\bar{X}(np_{2}/G(p_{1},p_{2}))\right)$. Then the rate function of $\{Z(n):n\geq 0\}$ is (Lemma 1 in [5])

$$I(x_{1},x_{2})=\frac{p_{1}}{G(p_{1},p_{2})}I_{1}(x_{1})+\frac{p_{2}}{G(p_{1},p_{2})}I_{2}(x_{2}).$$

Then we have,

[TABLE]

Thus, $\lim_{\alpha\rightarrow 0}\frac{1}{\tilde{n}(\alpha)}\log P\left(\bar{X}(\tilde{n}_{1}(\alpha))<\bar{X}(\tilde{n}_{2}(\alpha))\right)=-1.$

We next provide some special choices of $p_{i}$ ’s. i) If we want to minimize the sampling cost, then we pick $(p_{1},p_{2})$ that solves

$$\min_{p_{1},p_{2}}\ (p_{1}+p_{2})/G(p_{1},p_{2})\quad\mbox{s.t. } p_{1}+p_{2}=1,\ p_{1}>0,\ p_{2}>0.$$

ii) If we want to draw an equal number of samples from the two systems, then we set $p_{1}=p_{2}=1/2$. In this case

$$\tilde{n}_{i}^{e}(\alpha)=\frac{\log(1/\alpha)}{2G(1/2,1/2)}\quad\mbox{for } i=1,2.$$

iii) It is also possible to draw samples from each system without taking into account the information of the other system. For example, we can pick any $b\in(\mu_{1},\mu_{1}+\delta)$ and set

$$\tilde{n}_{i}^{in}(\alpha)=\frac{\log(1/\alpha)}{I_{i}(b)}\quad\mbox{for } i=1,2.$$

However, in this case we “overshoot” the PIS. In particular, following the proof of Theorem 3, it is easy to check that

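$$\lim_{\alpha\rightarrow 0}\frac{\log P\left(\bar{X}_{1}(\tilde{n}_{1}^{in}(\alpha))<\bar{X}_{2}^{\delta}(\tilde{n}_{2}^{in}(\alpha))\right)}{\log(\alpha)}<1.$$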

In applications, the assumption that $I_{i}(\cdot)$ ’s are known is rather restrictive. When $I_{i}(\cdot)$ ’s are not known, estimating this function would in general be a more difficult task than estimating the means. Recently, [6] conduct an extensive analysis of this issue.

2.3 The Moderate Deviation Regime

In this regime, we send both $\alpha$ and $\delta$ to zero at an appropriate rate. In particular, we consider a sequence of $(\alpha_{k},\delta_{k})$ ’s, indexed by $k\in\mathbb{N}$ , satisfying that $\alpha_{k}\rightarrow 0$ and $\delta_{k}\rightarrow 0$ as $k\rightarrow\infty$ , and

$$\log(1/\alpha_{k})\,\delta_{k}^{(1-2\beta)/\beta}=L,$$

for some $\beta\in(1/3,1/2)$ and $L>0$ , independent of $k$ .
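The coupling between $\alpha_{k}$ and $\delta_{k}$ can be made concrete with a short sketch. The helper names below (`md_log_inv_alpha`, `md_alpha`, `md_budget_scale`) are hypothetical; the code only rearranges the relation $\log(1/\alpha_{k})\delta_{k}^{(1-2\beta)/\beta}=L$ and the quantity $\hat{n}(k)=\log(1/\alpha_{k})\delta_{k}^{-2}$ that appears in the proof of Theorem 4.

```python
import math


def md_log_inv_alpha(delta, beta, L):
    """Return log(1/alpha_k) implied by log(1/alpha_k) * delta_k**((1-2*beta)/beta) = L."""
    if not (1.0 / 3.0 < beta < 0.5):
        raise ValueError("beta must lie in (1/3, 1/2)")
    return L * delta ** (-(1.0 - 2.0 * beta) / beta)


def md_alpha(delta, beta, L):
    """Return the PIS target alpha_k paired with the indifference parameter delta_k."""
    return math.exp(-md_log_inv_alpha(delta, beta, L))


def md_budget_scale(delta, beta, L):
    """Return n_hat(k) = log(1/alpha_k) / delta_k**2."""
    return md_log_inv_alpha(delta, beta, L) / delta ** 2


# Example values (chosen here for illustration): beta = 0.4, L = 1.
for d in (0.1, 0.01, 0.001):
    print(d, md_alpha(d, 0.4, 1.0), md_budget_scale(d, 0.4, 1.0))
```

As $\delta_{k}$ shrinks, $\alpha_{k}$ shrinks with it, but only at a stretched-exponential rate in $1/\delta_{k}$; this is the sense in which both parameters go to zero "at an appropriate rate".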

We start by assuming that the variances are known. In this case, for fixed $p_{1},p_{2}>0$ with $p_{1}+p_{2}=1$ , we set

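$$\hat{n}_{i}(k)=\frac{\log(1/\alpha_{k})}{\delta_{k}^{2}\hat{G}(p_{1},p_{2})}p_{i},\quad\mbox{for } i=1,2,$$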

where $\hat{G}(p_{1},p_{2})=p_{1}p_{2}/(2(\sigma_{1}^{2}p_{2}+\sigma_{2}^{2}p_{1}))$ . For $\delta_{k}=\delta$ , $\alpha_{k}=\alpha$ , we draw $\hat{n}_{i}(k)$ samples from system $i$ , and then choose the system with the largest sample mean. The following theorem establishes the asymptotic validity, in a logarithmic sense, of the procedure.

Theorem 4.

Under Assumption 1,

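$$\lim_{k\rightarrow\infty}\frac{\log P\left(\bar{X}_{1}(\hat{n}_{1}(k))<\bar{X}_{2}^{\delta_{k}}(\hat{n}_{2}(k))\right)}{\log(\alpha_{k})}=1.$$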

Proof of Theorem 4.

Let $\hat{n}(k)=\log(1/\alpha_{k})\delta_{k}^{-2}$ , $q_{i}=p_{i}/\hat{G}(p_{1},p_{2})$ .

[TABLE]

Let $Z_{n}=(n^{\beta}(\bar{X}_{1}(nq_{1})-\mu_{1}),n^{\beta}(\bar{X}_{2}^{0}(nq_{2})-\mu_{1}))$ , then

[TABLE]

By the Gärtner-Ellis theorem, $Z_{n}$ satisfies an LDP with rate $n^{1-2\beta}$ and rate function

[TABLE]

In particular,

[TABLE]

As $L^{2\beta}\hat{n}(k)^{1-2\beta}=\log(1/\alpha_{k})$ , we have

[TABLE]

∎

We next provide some special choices of $p_{i}$ ’s. i) If we want to minimize the total sampling cost $\hat{n}_{1}(k)+\hat{n}_{2}(k)$ for each $k$ , then we pick $p_{i}=\sigma_{i}/(\sigma_{1}+\sigma_{2})$ for $i=1,2$ . In this case

[TABLE]

ii) When $p_{1}=p_{2}=1/2$, we draw an equal number of samples from the two systems. In this case

[TABLE]

iii) It is also possible to draw samples from each system without taking into account the information of the other system. In particular, when $p_{i}=\sigma_{i}^{2}/(\sigma_{1}^{2}+\sigma_{2}^{2})$ ,

[TABLE]
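The three allocations above can be checked by substituting each $p_{i}$ into $\hat{n}_{i}(k)=\log(1/\alpha_{k})p_{i}/(\delta_{k}^{2}\hat{G}(p_{1},p_{2}))$. The sketch below does this numerically; the function names are hypothetical, and the closed forms in the comments are obtained by direct substitution into $\hat{G}$, so they should be read as a worked check rather than as the paper's displayed formulas.

```python
import math


def g_hat(p1, p2, sigma1, sigma2):
    """G_hat(p1, p2) = p1 p2 / (2 (sigma1^2 p2 + sigma2^2 p1))."""
    return p1 * p2 / (2.0 * (sigma1 ** 2 * p2 + sigma2 ** 2 * p1))


def md_sample_sizes(p1, sigma1, sigma2, delta, alpha):
    """Return the (unrounded) moderate-deviation sample sizes (n_hat_1, n_hat_2)."""
    p2 = 1.0 - p1
    scale = math.log(1.0 / alpha) / (delta ** 2 * g_hat(p1, p2, sigma1, sigma2))
    return scale * p1, scale * p2


s1, s2, d, a = 1.0, 10.0 / 11.0, 1.0 / 11.0, 0.05  # assumed example inputs
log_term = math.log(1.0 / a)

# i) p_i = sigma_i / (sigma1 + sigma2): n_i = 2 log(1/alpha) (sigma1 + sigma2) sigma_i / delta^2
n_opt = md_sample_sizes(s1 / (s1 + s2), s1, s2, d, a)
# ii) p_1 = p_2 = 1/2: n_i = 2 log(1/alpha) (sigma1^2 + sigma2^2) / delta^2
n_eq = md_sample_sizes(0.5, s1, s2, d, a)
# iii) p_i = sigma_i^2 / (sigma1^2 + sigma2^2): n_i = 4 log(1/alpha) sigma_i^2 / delta^2
n_ind = md_sample_sizes(s1 ** 2 / (s1 ** 2 + s2 ** 2), s1, s2, d, a)
```

Each closed form equals the corresponding central limit theorem sample size from §2.1 with $z_{\alpha}^{2}$ replaced by $2\log(1/\alpha)$.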

When the variances are not known a priori, we introduce a sequential stopping procedure. In this paper, we shall focus on the case of equal sample sizes only. We also impose the following assumption on sample distribution (mainly for technical reasons).

Assumption 3.

There exists $\theta>0$ such that $E[\exp(\theta(X_{1}-\mu_{1})^{2})]<\infty$ and $E[\exp(\theta(X_{2}^{0}-\mu_{1})^{2})]<\infty$.

We define the stopping time

[TABLE]

We keep sampling the two systems until the total sample variance over the sample size is smaller than $\delta_{k}^{2}/(2\log(1/\alpha_{k}))$ , and then we pick the system with the largest sample mean. The following theorem establishes the asymptotic validity of the sequential stopping procedure.

Theorem 5.

Under Assumption 3,

[TABLE]

Remark 3.

Alternatively, we can also set

[TABLE]

Then following the same line of analysis as in the Proof of Theorem 5, we have

[TABLE]

Separating the simulation of the two systems may come in handy for parallelization.

2.3.1 Proof of Theorem 5

We first notice that as $\log(1/\alpha_{k})\delta_{k}^{(1-2\beta)/\beta}=L$ ,

[TABLE]

Since the distribution of $S_{2}^{\delta_{k},2}(n)$ does not depend on $\delta_{k}$, we shall drop the superscript $\delta_{k}$ when there is no confusion. Let $S_{i}^{2}(t):=S_{i}^{2}(\lfloor t\rfloor)$ for $t\geq 0$. As $S_{i}^{2}(n)\Rightarrow\sigma_{i}^{2}$ as $n\rightarrow\infty$, by the Continuous Mapping Theorem

[TABLE]

We next establish an upper bound for $P\left(|\delta_{k}^{1/\beta}N_{k}-2L(\sigma_{1}^{2}+\sigma_{2}^{2})|>\epsilon\right)$ for any $\epsilon>0$ small enough.

[TABLE]

for $i=1,2$ . Thus,

[TABLE]

where $\tilde{\psi}_{i}(\theta)=\log E[\exp(\theta(X_{i}-\mu_{i})^{2})]$. By the Gärtner-Ellis theorem, $S_{1}^{2}(n)+S_{2}^{2}(n)$ satisfies an LDP with rate function $I(a)=\sup_{\theta}\left\{\theta a-\left(\tilde{\psi}_{1}(\theta)+\tilde{\psi}_{2}(\theta)\right)\right\}$. Then we have

[TABLE]

and

[TABLE]

We denote $B_{\epsilon}(k):=\{t:|\delta_{k}^{1/\beta}t-2L(\sigma_{1}^{2}+\sigma_{2}^{2})|<\epsilon\}$ for any $\epsilon>0$. Then $P(N_{k}\notin B_{\epsilon}(k))\leq\exp(-\delta_{k}^{-1}I(\epsilon)+o(\delta_{k}^{-1}))$. Let $N_{k}^{*}=2L(\sigma_{1}^{2}+\sigma_{2}^{2})\delta_{k}^{-1/\beta}$ and $a_{\epsilon}(n)=n/N_{k}^{*}$. Then for any $n\in B_{\epsilon}(k)$,

[TABLE]

We also notice that

[TABLE]

Thus,

[TABLE]

where the second inequality uses the fact that $S_{i}^{2}(n)$ is independent of $\bar{X}_{i}(n)$, and the third inequality follows from the proof of Theorem 4 and the fact that $0<1/\beta-2<1$. Similarly,

[TABLE]

As $L\delta_{k}^{-(1/\beta-2)}=\log(1/\alpha_{k})$ and $\epsilon$ can be arbitrarily small, we have

[TABLE]

3 Comparison of the Three Asymptotic Regimes

The three asymptotic regimes are closely related to each other. Figure 1 provides an overview of their relationships. We have established the three solid arrows in the figure in §2. We next show the two dotted arrows for some special cases (Lemma 1 and Lemma 2).

Lemma 1.

The $(1-\alpha)$-th quantile of the standard normal distribution, $z_{\alpha}$, satisfies

[TABLE]

Lemma 1 implies that for $\alpha$ small enough and $\alpha_{k}=\alpha$, we have $n_{i}^{*}(\delta_{k})\approx\hat{n}_{i}^{*}(k)$, $n_{i}^{e}(\delta_{k})\approx\hat{n}_{i}^{e}(k)$, and $n_{i}^{in}(\delta_{k})\approx\hat{n}_{i}^{in}(k)$.

For the large deviation regime, when we impose equal sample sizes for the two systems, we have $P(\bar{X}_{1}(n)<\bar{X}_{2}^{\delta}(n))=P(\bar{X}_{2}^{0}(n)-\bar{X}_{1}(n)>\delta)$. Let $\psi(\theta):=\log E[\exp(\theta(X_{2}^{0}-X_{1}))]$. We also define $\mathcal{D}:=\{\theta\in\mathbb{R}:\psi(\theta)<\infty\}$ and $\mathcal{S}:=\{\psi^{\prime}(\theta):\theta\in\mathcal{D}\}$. If we write $G_{e}(\delta)=\sup_{\theta}\{\theta\delta-\psi(\theta)\}$, then $\tilde{n}_{1}^{e}(\alpha)=\tilde{n}_{2}^{e}(\alpha)=\log(1/\alpha)/G_{e}(\delta)$.
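The equal-allocation rule $\log(1/\alpha)/G_{e}(\delta)$ can be evaluated numerically when $\psi$ is available. The sketch below is an illustration under assumptions made here: `legendre_sup` is a hypothetical helper that approximates the supremum by ternary search, and the example takes $X_{1}$ Exponential with mean 1 and $X_{2}^{0}$ equal to $\delta$ plus an Exponential with mean $10/11$, which mimics the setting reported for Table 1.

```python
import math


def legendre_sup(psi, x, lo, hi, iters=200):
    """Approximate sup over theta in [lo, hi] of theta * x - psi(theta).

    Ternary search is valid here because the objective is concave in theta
    whenever psi is a (convex) log moment generating function.
    """
    def f(theta):
        return theta * x - psi(theta)

    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if f(a) < f(b):
            lo = a
        else:
            hi = b
    return f(0.5 * (lo + hi))


def psi_exp_diff(theta, m1, m2, delta):
    """log E[exp(theta (X_2^0 - X_1))] for the assumed Exponential example.

    Finite for -1/m1 < theta < 1/m2.
    """
    return theta * delta - math.log(1.0 - theta * m2) - math.log(1.0 + theta * m1)


m1, m2 = 1.0, 10.0 / 11.0
delta = m1 - m2
alpha = 0.05
# Search over theta in [0, 0.99 / m2], strictly inside the domain of psi.
G_e = legendre_sup(lambda th: psi_exp_diff(th, m1, m2, delta), delta, 0.0, 0.99 / m2)
n_ld = math.ceil(math.log(1.0 / alpha) / G_e)
```

In this example the supremum has the closed form $\log(441/440)$, attained at $\theta=1/20$, which gives a handy check on the search.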

Lemma 2.

For $\delta\in\mathcal{S}$ ,

[TABLE]

Lemma 2 implies that for $\delta$ small, $\delta_{k}=\delta$ , $\tilde{n}_{i}^{e}(\alpha_{k})\approx\hat{n}_{i}^{e}(k)$ .

3.1 Pre-Limit Performance

We next provide some comments about the pre-limit performance, i.e. for fixed $\alpha$ and $\delta$ . For simplicity of exposition, we shall restrict our discussion to the case of equal sample sizes.

In the Central Limit Theorem regime, when $X_{1}$ and $X_{2}^{\delta}$ are Gaussian random variables, we have $P\left(\bar{X}_{1}(n_{1}^{e}(\delta))<\bar{X}_{2}^{\delta}(n_{2}^{e}(\delta))\right)=\alpha$ . In general, the performance of “ $n_{i}^{e}(\delta)$ ” depends on the “rate” of convergence of the central limit theorem. Assume that $E[(X_{2}^{0}-X_{1})^{4}]<\infty$ and $X_{2}^{0}-X_{1}$ is non-lattice. Let

[TABLE]

denote the skewness of $X_{2}^{0}-X_{1}$ . Skewness is a measure of asymmetry of a random variable about its mean. We also write

[TABLE]

as the Kurtosis of $X_{2}^{0}-X_{1}$ . Kurtosis measures the heaviness of the tail of a random variable. Let $\bar{\Phi}(\cdot)$ and $\phi(\cdot)$ denote the tail cumulative distribution function and the probability density function of a standard normal distribution. Using the Edgeworth expansion [12], we can show that

[TABLE]

We make the following observations. a) $\phi(z_{\alpha})/z_{\alpha}>\alpha$ with $\lim_{\alpha\rightarrow 0}\frac{\phi(z_{\alpha})/z_{\alpha}}{\alpha}=1$. b) For distributions with large skewness or kurtosis, the pre-limit PIS may be quite different from $\alpha$. A common practice to reduce the approximation error (improve the rate of convergence) is to use the Cornish-Fisher expansion [12] to refine the scaling parameter $z_{\alpha}$, but this would require us to know higher moments of the sample distributions.

For the large deviation regime, we first notice that, using Chernoff’s bound,

[TABLE]

To quantify how much smaller $P\left(\bar{X}_{1}(\tilde{n}_{1}(\alpha))<\bar{X}_{2}^{\delta}(\tilde{n}_{1}(\alpha))\right)$ is, compared to $\alpha$, we refer to a refinement of the large deviation asymptotic approximation due to [1]. Assume that $\psi(\theta)$ is steep on the right and $X_{1}-X_{2}^{0}$ is non-lattice. We denote $\theta(\delta):=\arg\max_{\theta}\{\theta\delta-\psi(\theta)\}$. Then we can show that

[TABLE]

As $\lim_{\delta\rightarrow 0}\sqrt{G_{e}(\delta)}/(\sqrt{\psi^{\prime\prime}(\delta)}\theta(\delta))=1/\sqrt{2}$, when $\delta$ is small enough, $P\left(\bar{X}_{1}(\tilde{n}_{1}(\alpha))<\bar{X}_{2}^{\delta}(\tilde{n}_{1}(\alpha))\right)$ decays at rate $\alpha/\sqrt{\log(1/\alpha)}$ approximately, which is slightly faster than $\alpha$. Therefore, sampling rules derived from the large deviation regime provide a guarantee on PIS but tend to over-sample in practical examples.

For the moderate deviation regime, we first notice that $z_{\alpha}^{2}<2\log(1/\alpha)$ for fixed value of $\alpha$ . Thus, we tend to sample more than “needed” when the central limit theorem regime works well. However, this also provides us with a safety buffer when the central limit theorem regime doesn’t work well.
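The size of this safety buffer can be gauged numerically. The sketch below (with the hypothetical helper `buffer_ratio`) evaluates $z_{\alpha}^{2}/(2\log(1/\alpha))$, which is the ratio of the equal-allocation sample sizes in the central limit theorem and moderate deviation regimes when both use the same $\sigma_{i}$ and $\delta$.

```python
import math
from statistics import NormalDist


def buffer_ratio(alpha):
    """Return z_alpha^2 / (2 log(1/alpha)) for 0 < alpha < 1/2."""
    z = -NormalDist().inv_cdf(alpha)  # upper alpha-quantile, computed via symmetry
    return z * z / (2.0 * math.log(1.0 / alpha))


alphas = (0.05, 0.01, 1e-4, 1e-8)
ratios = [buffer_ratio(a) for a in alphas]
for a, r in zip(alphas, ratios):
    print(f"alpha={a:g}  ratio={r:.4f}")
```

The ratio stays below one and climbs toward one only slowly as $\alpha$ decreases, so for common values such as $\alpha=0.05$ the moderate deviation rule uses roughly twice as many samples as the central limit theorem rule.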

3.2 Numerical Comparison

The following numerical experiments illustrate the pre-limit performance of the sampling rules derived from the three asymptotic regimes (Tables 1 and 2). The probabilities of incorrect selection are estimated from $10^{6}$ independent experiments. In Table 1, we assume both systems have Exponential sample distributions. We observe that in this case, the CLT regime sampling rule achieves the desired probability of incorrect selection, but the other two regimes overshoot the probability of incorrect selection, i.e. $PIS\ll 0.05$. Table 2 illustrates an extreme example, where system 1 has constant output while system 2 has a Bernoulli sample distribution with very small probability of success (highly skewed). There the CLT regime doesn’t achieve the desired probability of incorrect selection while the LD regime over-samples. We would also like to point out that among the three regimes, the LD regime is the only one that is guaranteed to have $PIS<\alpha$ regardless of the sample distributions.
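An experiment of the kind summarized in Table 1 can be sketched as follows. This is a small-scale illustration, not the authors' code: `estimate_pis` is a hypothetical helper, the number of replications is far below $10^{6}$, and the sample mean of $n$ Exponential draws is generated in one step from a Gamma distribution (the sum of $n$ i.i.d. Exponentials with mean $m$ is Gamma with shape $n$ and scale $m$).

```python
import random


def estimate_pis(n, mean1, mean2, reps, seed=0):
    """Monte Carlo estimate of P(Xbar_1(n) < Xbar_2(n)) for Exponential systems."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(reps):
        xbar1 = rng.gammavariate(n, mean1) / n  # mean of n Exponential(mean1) draws
        xbar2 = rng.gammavariate(n, mean2) / n  # mean of n Exponential(mean2) draws
        if xbar1 < xbar2:
            wrong += 1
    return wrong / reps


# Means taken from the caption of Table 1; sample sizes from its CLT and MD rows.
pis_clt = estimate_pis(598, 1.0, 10.0 / 11.0, reps=100_000)
pis_md = estimate_pis(1325, 1.0, 10.0 / 11.0, reps=100_000)
```

With enough replications the two estimates should land near the corresponding entries of Table 1, up to Monte Carlo error.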

Appendix A Proofs

Proof of Lemma 1.

We first notice that $\left(\frac{1}{x}-\frac{1}{x^{3}}\right)\phi(x)\leq\bar{\Phi}(x)\leq\frac{1}{x}\phi(x)$ for $x>0$ . Since $z_{\alpha}\rightarrow\infty$ as $\alpha\rightarrow 0$ , we have

[TABLE]

Similarly,

[TABLE]

Thus,

[TABLE]

∎

Proof of Lemma 2.

Applying Taylor expansion, we have for $\delta\in\mathcal{S}$

[TABLE]

Let $\theta(\delta):=\arg\min\{\theta\delta-\psi(\theta)\}$ . Then $G_{e}(\delta)=\theta(\delta)\delta-\psi(\theta(\delta))$ and

[TABLE]

We also notice that $\psi^{\prime}(\theta(\delta))=\delta$ . Thus, $\theta^{\prime}(\delta)=\frac{1}{\psi^{\prime\prime}(\theta(\delta))}\mbox{ and }\theta^{\prime\prime}(\delta)=-\frac{\psi^{\prime\prime\prime}(\theta(\delta))}{\psi^{\prime\prime}(\theta(\delta))^{3}}$ . The result then follows by noting that $\psi(0)=0$ and $\psi^{(k)}(0)=E[(X_{2}^{0}-X_{1})^{k}]$ for $k=1,2,\dots$ . ∎

Bibliography

[1] R.R. Bahadur and R.R. Rao. On deviations of the sample mean. Annals of Mathematical Statistics, 31(4):1015–1027, 1960.
[2] S. Bubeck and N. Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning, 5(1):1–122, 2012.
[3] L. Dai. Convergence properties of ordinal comparison in the simulation of discrete event dynamic systems. Journal of Optimization Theory and Applications, 91(2):363–388, 1996.
[4] A. Dembo and O. Zeitouni. Large deviation techniques and applications. Springer, New York, 1998.
[5] P. Glynn and S. Juneja. A large deviation perspective on ordinal optimization. In R.G. Ingalls, M.D. Rossetti, J.S. Smith, and B.A. Peters, editors, Proceedings of the 2004 Winter Simulation Conference, pages 577–585, Piscataway, New Jersey, 2004. IEEE, Inc.
[6] P. Glynn and S. Juneja. Ordinal optimization - empirical large deviations rate estimators, and stochastic multi-armed bandits. Working paper, available at http://arxiv.org/pdf/1507.04564v1.pdf, 2015.
[7] P. Glynn and W. Whitt. The asymptotic validity of sequential stopping rules for stochastic simulations. The Annals of Applied Probability, 2(1):180–198, 1992.
[8] S.R. Hunter and R. Pasupathy. Optimal sampling laws for stochastically constrained simulation optimization on finite sets. INFORMS Journal on Computing, 25(3):527–542, 2013.
