On the lengths of $t$-based confidence intervals
Yu Zhang, Xiangzhong Fang

TL;DR
This paper compares two methods for constructing $t$-based confidence intervals from iid samples of a normal distribution, showing the second method yields longer intervals but remains valid under correlation, unlike the first.
Contribution
The paper provides a theoretical proof that dividing samples into groups for $t$-intervals results in longer expected lengths and clarifies the validity of each method under correlation.
Findings
Second method produces longer expected confidence intervals.
First method becomes invalid if group elements are correlated.
Second method remains valid with correlated data.
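Table 1: Average length of the confidence interval (1.2) for each number of groups m in Simulation 1 (sample size 420, confidence level 0.95, 100 replications).

| m | 420 | 210 | 140 | 105 | 84 | 70 |
|---|---|---|---|---|---|---|
| Length | 0.19 | 0.19 | 0.19 | 0.19 | 0.19 | 0.19 |
| m | 60 | 42 | 35 | 30 | 28 | 21 |
| Length | 0.19 | 0.19 | 0.19 | 0.19 | 0.19 | 0.20 |
| m | 20 | 15 | 14 | 12 | 10 | 7 |
| Length | 0.20 | 0.20 | 0.20 | 0.20 | 0.21 | 0.22 |
| m | 6 | 5 | 4 | 3 | 2 | |
| Length | 0.23 | 0.24 | 0.28 | 0.34 | 0.84 | |
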
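Table 2: Average length of the confidence interval (1.2) for each number of groups m in Simulation 2 (same settings as Simulation 1, with a different sampling distribution).

| m | 420 | 210 | 140 | 105 | 84 | 70 |
|---|---|---|---|---|---|---|
| Length | 19.15 | 19.23 | 19.25 | 19.34 | 19.45 | 19.45 |
| m | 60 | 42 | 35 | 30 | 28 | 21 |
| Length | 19.55 | 19.57 | 19.92 | 20.17 | 20.20 | 20.33 |
| m | 20 | 15 | 14 | 12 | 10 | 7 |
| Length | 20.59 | 20.78 | 21.03 | 21.10 | 21.73 | 23.34 |
| m | 6 | 5 | 4 | 3 | 2 | |
| Length | 24.36 | 26.26 | 29.31 | 38.80 | 100.29 | |
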
Table 3: Coverage probabilities in Simulation 3 by group size k and by the common off-diagonal value of the covariance matrix (written ρ here). AP: value obtained from (2.5); SP: value obtained from the simulation.

| k | AP (ρ=1) | SP (ρ=1) | AP (ρ=0.9) | SP (ρ=0.9) | AP (ρ=0.8) | SP (ρ=0.8) | AP (ρ=0.7) | SP (ρ=0.7) | AP (ρ=0.6) | SP (ρ=0.6) |
|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 0.46 | 0.48 | 0.48 | 0.45 | 0.51 | 0.48 | 0.53 | 0.53 | 0.56 | 0.56 |
| 100 | 0.16 | 0.14 | 0.16 | 0.19 | 0.17 | 0.17 | 0.18 | 0.16 | 0.20 | 0.25 |
| 500 | 0.07 | 0.09 | 0.07 | 0.06 | 0.08 | 0.09 | 0.08 | 0.08 | 0.09 | 0.08 |
| 1000 | 0.05 | 0.06 | 0.05 | 0.07 | 0.06 | 0.05 | 0.06 | 0.05 | 0.06 | 0.05 |

| k | AP (ρ=0.5) | SP (ρ=0.5) | AP (ρ=0.4) | SP (ρ=0.4) | AP (ρ=0.3) | SP (ρ=0.3) | AP (ρ=0.2) | SP (ρ=0.2) | AP (ρ=0.1) | SP (ρ=0.1) |
|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 0.60 | 0.61 | 0.64 | 0.64 | 0.69 | 0.72 | 0.76 | 0.76 | 0.84 | 0.84 |
| 100 | 0.21 | 0.21 | 0.24 | 0.25 | 0.28 | 0.26 | 0.33 | 0.28 | 0.45 | 0.48 |
| 500 | 0.10 | 0.08 | 0.11 | 0.10 | 0.13 | 0.12 | 0.15 | 0.17 | 0.22 | 0.23 |
| 1000 | 0.07 | 0.09 | 0.08 | 0.10 | 0.09 | 0.11 | 0.11 | 0.10 | 0.15 | 0.17 |

School of Mathematical Sciences, Peking University, Beijing, 100871, China
Corresponding author: Xiangzhong Fang. E-mail address: [email protected]
Abstract
Given $n = mk$ i.i.d. samples from $N(\theta, \sigma^2)$ with $\theta$ and $\sigma^2$ unknown, we have two ways to construct $t$-based confidence intervals for $\theta$. The traditional method is to treat these $n$ samples as $n$ groups and calculate the intervals. The second, and less frequently used, method is to divide them into $m$ groups with each group containing $k$ elements. For this method, we calculate the mean of each group, and these $m$ mean values can be treated as i.i.d. samples from $N(\theta, \sigma^2/k)$. We can use these $m$ values to construct $t$-based confidence intervals. Intuition tells us that, at the same confidence level $1-\alpha$, the first method should be better than the second one. If we define “better” in terms of the expected length of the confidence interval, then the first method is indeed better, because the expected length of the confidence interval obtained from the first method is shorter than the one obtained from the second method. Our work proves this intuition theoretically. We also show that when the elements in each group are correlated, the first method becomes invalid, while the second method still gives correct results. We illustrate this with analytical expressions.
Key words: $t$-based confidence intervals, the expected length, coverage probability
1 Introduction
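In this paper, we consider $t$-based confidence intervals for the mean parameter in the normal case. Suppose we have a group of variables $X_1, \dots, X_n$, which are i.i.d. $N(\theta, \sigma^2)$ random variables, where $\theta$ and $\sigma^2$ are both unknown, and the confidence level is $1-\alpha$. A $t$-based confidence interval for $\theta$ can be constructed as follows. We choose the pivotal quantity $\sqrt{n}(\bar X - \theta)/S_n$, which follows the $t$ distribution with $n-1$ degrees of freedom, and construct the confidence interval
$$\left[\bar X - t_{n-1}(\alpha/2)\,\frac{S_n}{\sqrt{n}},\ \ \bar X + t_{n-1}(\alpha/2)\,\frac{S_n}{\sqrt{n}}\right], \tag{1.1}$$
where
$$\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S_n^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar X)^2,$$
and $t_{n-1}(\alpha/2)$ is the upper $\alpha/2$ quantile of the $t$ distribution with $n-1$ degrees of freedom.
Interval (1.1) is the traditional way to construct a confidence interval for $\theta$. Now we introduce an alternative.
Suppose we can partition $X_1, \dots, X_n$ into $m$ equal-sized groups (assuming $n = mk$), and denote the mean of each group by $Y_1, \dots, Y_m$, which are i.i.d. $N(\theta, \sigma^2/k)$. Using these new variables, the $t$-based confidence interval at level $1-\alpha$ for $\theta$ takes the form
$$\left[\bar X - t_{m-1}(\alpha/2)\,\frac{S_m}{\sqrt{m}},\ \ \bar X + t_{m-1}(\alpha/2)\,\frac{S_m}{\sqrt{m}}\right], \tag{1.2}$$
where
$$S_m^2 = \frac{1}{m-1}\sum_{j=1}^{m} (Y_j - \bar X)^2.$$
(Note that when $m = n$, this reduces to the conventional $t$-based confidence interval, that is, (1.2) = (1.1).)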
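As a minimal sketch of the two constructions, assuming (1.1) and (1.2) are the textbook $t$-intervals applied to the raw observations and to the group means respectively, the following computes both intervals on a small made-up data set. The data values, the function names `t_interval` and `group_means`, and the choice of consecutive groups are illustrative assumptions; the critical values are taken from a standard $t$ table rather than computed.

```python
import math
import statistics


def t_interval(values, t_crit):
    """Textbook t-interval: mean +/- t_crit * s / sqrt(len(values))."""
    center = statistics.mean(values)
    half_width = t_crit * statistics.stdev(values) / math.sqrt(len(values))
    return center - half_width, center + half_width


def group_means(values, m):
    """Split values into m consecutive equal-sized groups and average each."""
    k, remainder = divmod(len(values), m)
    if remainder:
        raise ValueError("m must divide the sample size")
    return [statistics.mean(values[j * k:(j + 1) * k]) for j in range(m)]


# Hypothetical data: n = 12 observations, split into m = 4 groups of k = 3.
data = [4.1, 5.3, 4.8, 5.9, 5.0, 4.4, 5.6, 4.9, 5.2, 4.7, 5.5, 5.1]

# Two-sided 95% critical values from a standard t table:
# 11 degrees of freedom -> 2.201, 3 degrees of freedom -> 3.182.
interval_n = t_interval(data, t_crit=2.201)                  # n-case
interval_m = t_interval(group_means(data, 4), t_crit=3.182)  # m-case

print("n-case:", interval_n)
print("m-case:", interval_m)
```

Because the groups are equal-sized, the mean of the group means equals the overall sample mean, so the two intervals share a center and differ only in half-width.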
We can see that (1.1) and (1.2) share the same center point, the sample mean $\bar X$, and differ only in the distance from the center point to the endpoints. Now that there are two ways to obtain confidence intervals for the mean, a question arises: are these methods equivalent? Thus far, the literature has not compared the two. Therefore, in this paper we propose a way to compare them.
To begin this analysis, we first need to decide how to evaluate confidence intervals. The literature provides several criteria. In [1], the author proposed the expected length as one natural measure of a confidence interval, so the expected length can serve as a rule for choosing the better interval. In [2], the authors proposed alternative intervals for a binomial proportion and examined the length of each interval. In [3], the author proposed another criterion, coverage probability, without denying the validity of the expected length: if coverage is used as the criterion, it is possible to construct the “shortest” confidence intervals, and these intervals can have much improved coverage accuracy as well as a shorter length. Based on the above, we use the expected length of the intervals and the coverage probability as our criteria. We prove theoretically that (1.1) is better than (1.2) when the samples are i.i.d., while for non-i.i.d. samples it is better to use (1.2). We also apply our results to data on HIV patients.
The rest of the paper is organized as follows. Section 2 introduces our main theorems and related lemmas. In this section, we first present the results when all samples are i.i.d. Then we generalize the results to a non-i.i.d. case, which is reasonable in real practice. Section 3 contains the detailed proofs of the results in Section 2. Section 4 presents some extensions related to our results. Section 5 contains several simulations, including a real data analysis. Section 6 concludes with a discussion.
2 Main Theorems
Before proceeding to our main results, we need to define two symbols.
$$L_n = 2\,t_{n-1}(\alpha/2)\,\frac{S_n}{\sqrt{n}},$$
$$L_m = 2\,t_{m-1}(\alpha/2)\,\frac{S_m}{\sqrt{m}},$$
where $L_n$ is the length of (1.1) and $L_m$ is the length of (1.2).
The main task of our paper is to find out the relationship between $E L_n$ and $E L_m$. As a first step in this process, we introduce the following simple proposition relating $E L_n^2$ and $E L_m^2$.
**Proposition 1.**
$$E L_n^2 \le E L_m^2.$$
Proposition 1 tells us the relationship between $E L_n^2$ and $E L_m^2$. However, we cannot conclude $E L_n \le E L_m$ from it directly, as we could for positive real numbers.
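The first main result of this paper is Theorem 1, which establishes the relationship between $E L_n$ and $E L_m$ when we are dealing with i.i.d. samples.
**Theorem 1.**
With fixed $n$,
$$E L_n \le E L_m. \tag{2.1}$$
The expected length of $L_m$, i.e., $E L_m$, is
$$E L_m = \frac{2\sqrt{2}\,\sigma}{\sqrt{n}}\; t_{m-1}(\alpha/2)\, \frac{\Gamma(m/2)}{\sqrt{m-1}\;\Gamma((m-1)/2)}.$$
So (2.1) is equivalent to the following statement. The expression
$$t_{m-1}(\alpha/2)\, \frac{\Gamma(m/2)}{\sqrt{m-1}\;\Gamma((m-1)/2)}$$
as a function of $m$ is decreasing in $m$.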
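The monotonicity claimed in Theorem 1 can be checked numerically. The sketch below assumes the textbook identity $E[S] = \sigma_S \sqrt{2/\nu}\,\Gamma((\nu+1)/2)/\Gamma(\nu/2)$ for the sample standard deviation of a normal sample with $\nu$ degrees of freedom, and evaluates the resulting expected length for every divisor $m \ge 2$ of $n = 420$. The helper `t_quantile` is our own stdlib-only routine (substitution $x = \sqrt{\nu}\tan\varphi$, Simpson's rule, bisection), not something taken from the paper; with SciPy available, `scipy.stats.t.ppf` would be the usual choice.

```python
import math


def t_quantile(p, df, panels=1000, iterations=50):
    """Quantile of Student's t for 0.5 < p < 1, using x = sqrt(df) * tan(phi).

    With that substitution, CDF(x) = 0.5 + C * int_0^phi cos(u)^(df-1) du,
    where C = Gamma((df+1)/2) / (sqrt(pi) * Gamma(df/2)).
    """
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(math.pi)
    target = (p - 0.5) / math.exp(log_c)

    def integral(phi):
        # Composite Simpson's rule on [0, phi] with an even number of panels.
        h = phi / panels
        total = 1.0 + math.cos(phi) ** (df - 1)
        for i in range(1, panels):
            total += (4 if i % 2 else 2) * math.cos(i * h) ** (df - 1)
        return total * h / 3

    lo, hi = 0.0, math.pi / 2
    for _ in range(iterations):  # bisection on the angle phi
        mid = (lo + hi) / 2
        if integral(mid) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(df) * math.tan((lo + hi) / 2)


def expected_length(n, m, sigma=1.0, alpha=0.05):
    """Expected length of the m-group t-interval for n i.i.d. normal samples."""
    df = m - 1
    # E[S] / (true sd) = sqrt(2/df) * Gamma((df+1)/2) / Gamma(df/2)
    log_ratio = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    mean_sd_factor = math.sqrt(2 / df) * math.exp(log_ratio)
    return 2 * t_quantile(1 - alpha / 2, df) * sigma / math.sqrt(n) * mean_sd_factor


n = 420
divisors = [m for m in range(2, n + 1) if n % m == 0]
lengths = {m: expected_length(n, m) for m in divisors}
for m in divisors:
    print(m, round(lengths[m], 4))
```

With `sigma=1.0` the printed values can be compared against the simulated averages in Table 1; the choice of unit variance is an assumption made here for that comparison.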
In order to prove Theorem 1, we need several lemmas to help us.
**Lemma 2.1.**
Let be two integers. Then the following expression holds for all ,
[TABLE]
**Lemma 2.2.**
If
[TABLE]
then for all , where is the difference between and for .
If we are able to prove Theorem 1, does it mean that (1.2) is useless? The answer is no. We illustrate this below, focusing specifically on some theorems related to non-i.i.d. cases.
In real practice, the observations cannot always be i.i.d. variables. Sometimes there are correlations among some of the variables, or the variables can be divided into equal-sized groups with correlations within each group while the groups remain independent. In the latter case, we can use (1.2) to build the confidence intervals, while (1.1) is invalid under this setting.
**Assumption 1.**
Suppose we have $n = mk$ samples. They can be divided into $m$ groups and each group contains $k$ elements. We use to denote the samples. And we have , where
[TABLE]
and is a symmetric positive definite matrix, where all .
We still use to denote the mean of these samples, that is, . Then we have the following generalization.
**Theorem 2.**
Under Assumption 1, the $m$-case interval at level $1-\alpha$ also takes the form
[TABLE]
which is the same as (1.2). With the similar notation in Section 2, we use to denote the length of the interval, that is, .
Additionally, we obtain the expected length of the confidence interval under this case:
[TABLE]
where is the sum of the elements in , .
We can extend (2.1) in Theorem 1 to
[TABLE]
Now we generalize our results a little bit further.
**Assumption 2.**
Let , , where and have the same form in Assumption 1.
We define two different sample variances here. One is
[TABLE]
where . The other is
[TABLE]
where .
The next theorem tells us that, under Assumption 2, the $n$-case interval does not attain the desired coverage probability.
**Theorem 3.**
Under Assumption 2,
[TABLE]
We use Lemma 2.3 to help us finish the proof of Theorem 3.
**Lemma 2.3.**
[TABLE]
Theorem 3 tells us that the probability is less than $1-\alpha$. The next theorem gives a more precise result.
**Theorem 4.**
Under Assumption 2,
[TABLE]
where and are the same as defined in (1.1) and (1.2).
3 Proofs
In this section, we provide detailed proofs of each theorem introduced in Section 2. We also prove related lemmas for each theorem.
First, we provide the proof of Proposition 1.
Proof of Proposition 1.
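Note that $S_n^2$ is the sample variance and it is an unbiased estimator of the population variance. So we have $E S_n^2 = \sigma^2$, and the same holds for $S_m^2$: $E S_m^2 = \sigma^2/k$.
$$E L_n^2 = 4\,t_{n-1}^2(\alpha/2)\,\frac{E S_n^2}{n} = 4\,t_{n-1}^2(\alpha/2)\,\frac{\sigma^2}{n},$$
$$E L_m^2 = 4\,t_{m-1}^2(\alpha/2)\,\frac{E S_m^2}{m} = 4\,t_{m-1}^2(\alpha/2)\,\frac{\sigma^2}{n}.$$
We only need to compare $t_{n-1}(\alpha/2)$ and $t_{m-1}(\alpha/2)$. Since $m \le n$, we have $t_{n-1}(\alpha/2) \le t_{m-1}(\alpha/2)$. Thus, we have the desired result. ∎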
Now we proceed to the proof of Theorem 1 and present the proofs of Lemma 2.1 and Lemma 2.2.
Theorem 1 amounts to showing that
[TABLE]
for every . More generally, we aim to show that
[TABLE]
for all . This inequality can be rewritten as (2.2). To establish (2.2) is the subject of Lemma 2.1.
Proof of Lemma 2.1.
Suppose on the contrary that
[TABLE]
for some . Denote by . Note that must be larger than 1 since . Write for the difference between and for . In particular, this function satisfies
[TABLE]
In general, denoting the density of by , takes the following form
[TABLE]
Let be the integrand . If one can show that
[TABLE]
for all given
[TABLE]
that follows from the assumption (3.1), we get a contradiction to (3.2). Consequently, (3.1) cannot be satisfied.
Below, Lemma 2.2 affirms (3.3) as the last step to prove the present lemma, thus, concluding the proof of Theorem 1.
∎
Proof of Lemma 2.2.
Note that results from integrating . The sign of depends on whether the ratio
[TABLE]
exceeds 1 or not. Above, we use the fact that the density of the -based distribution with degrees of freedom reads
[TABLE]
for . Denote by this ratio . It is clear that , implying that . In addition, as , the ratio as well, and this reveals that for sufficiently large .
To get a closer look, note that
[TABLE]
The fact that and ensures that . Hence, we get
[TABLE]
Thus, , starting from , stays below 1 for and then stays above 1 for , where is some number determined by and . Put differently, the above discussion demonstrates that
[TABLE]
Having established (3.4), the lemma follows quickly. If , then (3.4) readily gives
[TABLE]
In the case where , (3.4) together with the fact that gives
[TABLE]
as desired.
∎
Now we begin to prove the generalized results.
Proof of Theorem 2.
First, we need to find out the distribution of .
Since , for , we have
[TABLE]
Note that is the sum of the elements in , namely, .
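The role of the sum of the elements of the covariance matrix can be made explicit with a short worked equation. The notation is an assumption made here for illustration: $X_{j1}, \dots, X_{jk}$ for the elements of group $j$, $Y_j$ for their mean, $\Sigma$ for the within-group covariance matrix, $\mathbf{1}$ for the all-ones vector of length $k$, and $c$ for the sum of the elements of $\Sigma$.

$$\operatorname{Var}(Y_j) = \operatorname{Var}\!\left(\frac{1}{k}\sum_{i=1}^{k} X_{ji}\right) = \frac{1}{k^2}\,\mathbf{1}^{\top}\Sigma\,\mathbf{1} = \frac{c}{k^2}, \qquad c = \sum_{a=1}^{k}\sum_{b=1}^{k}\Sigma_{ab}.$$

For example, if $\Sigma$ has $\sigma^2$ on the diagonal and $\rho\sigma^2$ everywhere else, then $c = k\sigma^2\,(1 + (k-1)\rho)$ and $\operatorname{Var}(Y_j) = \sigma^2\,(1 + (k-1)\rho)/k$, which reduces to the i.i.d. value $\sigma^2/k$ when $\rho = 0$.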
Secondly, we can proceed to our result. Let . . We also use to denote the sample variance.
It’s easy to get that
[TABLE]
where is actually a variant of a random variable.
So the expected length of the $m$-case interval is
[TABLE]
∎
Before proceeding to the proofs of Theorem 3 and Theorem 4, we provide the proof of Lemma 2.3 first.
Proof of Lemma 2.3.
[TABLE]
Thus, . With the help of the equation
[TABLE]
we can have the desired lemma. ∎
Proof of Theorem 3.
From Assumption 2, we have
[TABLE]
and
[TABLE]
Combining (3.7) and (3.8), we have
[TABLE]
Before having our final result, we still need one expression.
[TABLE]
With (2.3) and (3.10), now we have, with
[TABLE]
where is the percentile of . ∎
Proof of Theorem 4.
From (3.5) and (3.6), we can conclude that
[TABLE]
With the help of (3.12), (3.9) and Slutsky’s Theorem, we have
[TABLE]
Then the LHS of (2.5)
[TABLE]
where is unknown under our assumption. However, from (3.9), we know that
[TABLE]
So we can obtain the estimate of ,
[TABLE]
The LHS of (2.5) now becomes
[TABLE]
∎
4 Extensions
From the theorems in Section 2, we learn that with a fixed level $1-\alpha$, (1.1) provides a shorter interval than (1.2) when the samples are i.i.d. We have proposed a special case (Assumption 1) in which the coverage probability of (1.1) is less than $1-\alpha$. Fortunately, (1.2) still provides a valid confidence interval. In this section, we discuss (1.2) a little further. We describe some situations in which (1.2) can be properly used when we are restricted by time cost or equipment.
Situation 1
Suppose that our data are collected from $m$ locations. Gathering the data from all locations is time-consuming or impossible. In this case, we can only use (1.2) to calculate the confidence interval. We first calculate the mean of the data in each location and then send these mean values to a central location. Finally, we can use (1.2) to get the confidence interval from these mean values.
Situation 2
In this situation, the data are stored on one machine. However, the scale of the data is extremely large. Due to hardware restrictions, calculating the confidence interval with (1.1) is impossible. We may split the data into $m$ equal-sized groups, calculate the group means in parallel, and then apply (1.2).
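One way to picture both situations is a single pass over the data that keeps only a running sum per group, so that only the group means are ever stored or transmitted. The sketch below is a hypothetical illustration under that reading: `chunk_means` and `interval_from_means` are names introduced here, the input is a toy sequence, and the critical value is supplied by the caller from a $t$ table.

```python
import math
import statistics


def chunk_means(stream, k):
    """Yield the mean of each consecutive block of k values from an iterable.

    Only a running total is kept, so memory use does not grow with the data.
    """
    total, count = 0.0, 0
    for x in stream:
        total += x
        count += 1
        if count == k:
            yield total / k
            total, count = 0.0, 0
    if count:
        raise ValueError("stream length is not a multiple of k")


def interval_from_means(means, t_crit):
    """t-interval built from the group means only."""
    m = len(means)
    center = statistics.mean(means)
    half_width = t_crit * statistics.stdev(means) / math.sqrt(m)
    return center - half_width, center + half_width


# Toy stream of 12 values in m = 4 groups of k = 3.
means = list(chunk_means(iter(range(12)), 3))
# Two-sided 95% critical value for 3 degrees of freedom, from a t table.
low, high = interval_from_means(means, t_crit=3.182)
print(means, (low, high))
```

In the multi-location reading, each location would run the averaging step locally and send one number to the center, which then calls `interval_from_means`.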
5 Simulations
In this section, we conduct several simulations as auxiliary validations for our theorems.
5.1 Simulation 1
Suppose the sample size is 420 and all the samples are from . The confidence level is 0.95.
420 has 23 divisors other than 1. We let these 23 values be the number of groups $m$, that is, $m \in \{2, 3, 4, \dots, 210, 420\}$. We compute the length of the confidence interval with (1.2) for each $m$ and obtain 23 values.
We repeat the above procedure 100 times and obtain 100 values for each $m$. Then we take the average of each set of 100 values and get Table 1. Note that when $m = 420$, it is the $n$-case interval.
Figure 1 shows the results graphically. The horizontal axis shows the values of $m$; we have rescaled the axis to make the figure easier to read. The vertical axis shows the average lengths of the confidence intervals. From the figure we can see that the lengths decrease as $m$ increases.
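The procedure of Simulation 1 can be sketched as a small Monte Carlo loop. This is an illustration under stated assumptions, not the authors' script: the sampling distribution is taken to be $N(0, 1)$, only four of the 23 values of $m$ are used, the critical values are copied from a standard $t$ table, and 1000 replications are used instead of 100 to reduce noise.

```python
import math
import random
import statistics

# Two-sided 95% critical values t_{m-1}(0.025) from a standard t table
# (the entry for 419 degrees of freedom is rounded to three decimals).
T_CRIT = {2: 12.706, 6: 2.571, 20: 2.093, 420: 1.966}


def length_m(sample, m, t_crit):
    """Length of the t-interval built from the m group means of `sample`."""
    k = len(sample) // m
    means = [sum(sample[j * k:(j + 1) * k]) / k for j in range(m)]
    return 2 * t_crit * statistics.stdev(means) / math.sqrt(m)


def simulate(n=420, reps=1000, seed=0):
    """Average interval length for each m in T_CRIT over `reps` samples."""
    rng = random.Random(seed)
    totals = dict.fromkeys(T_CRIT, 0.0)
    for _ in range(reps):
        # N(0, 1) is an assumption made for this sketch.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for m, t_crit in T_CRIT.items():
            totals[m] += length_m(sample, m, t_crit)
    return {m: total / reps for m, total in totals.items()}


average_length = simulate()
for m, value in average_length.items():
    print(m, round(value, 3))
```

As in the text, every value of $m$ is evaluated on the same simulated sample within a replication, so the comparison across $m$ is paired.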
5.2 Simulation 2
We use the same settings as those in Simulation 1, except that samples are from . Figure 2 shows the results intuitively and more detailed results are presented in Table 2.
Please note that Figure 1 and Figure 2 look similar, although the ranges of their vertical axes differ.
5.3 Simulation 3
In this simulation, we sample our data under a special case of Assumption 1. We consider an equi-correlated covariance matrix, that is, all off-diagonal elements in $\Sigma$ are equal to a constant $\rho$.
We set and obtain Table 3 where AP stands for the values obtained from (2.5) and SP stands for the values obtained from this simulation.
In Figure 3, the red lines are SP values and the blue ones are AP values. The red lines show a smooth decreasing tendency as $\rho$ increases. The blue lines also decrease and stay close to the corresponding red lines. These plots show that for fixed $k$, the coverage probability decreases as $\rho$ increases.
In Figure 4, the number at the top of each plot is the value of $\rho$, the vertical axis is the coverage probability obtained from (2.5) and the horizontal axis shows the values of $k$. We also rescaled the horizontal axes to make them easier to read. From these plots, we see that for fixed $\rho$, the coverage probability decreases as $k$ increases.
The figures in this simulation all show a decreasing tendency, which is consistent with the information revealed from (2.5).
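The decreasing pattern can be illustrated with a simple large-sample approximation. The formula below is an assumption of this sketch and is not claimed to be (2.5) itself: with unit variances and common within-group correlation $\rho$, the variance of the overall mean is inflated by the factor $1 + (k-1)\rho$ while the pooled sample variance stays close to 1, so the coverage of the $n$-case interval is roughly $2\Phi\!\left(z_{\alpha/2}/\sqrt{1+(k-1)\rho}\right) - 1$. On the entries we spot-checked, this agrees with the AP columns of Table 3 to within about 0.01.

```python
import math
from statistics import NormalDist


def approx_coverage(k, rho, alpha=0.05):
    """Large-sample coverage of the n-case interval under equi-correlation.

    Assumes unit variances and a common within-group correlation rho, so the
    variance of the overall mean is inflated by 1 + (k - 1) * rho.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    inflation = 1 + (k - 1) * rho
    # erf(x / sqrt(2)) equals 2 * Phi(x) - 1, here with x = z / sqrt(inflation).
    return math.erf(z / math.sqrt(2 * inflation))


for k in (10, 100, 500, 1000):
    print(k, [round(approx_coverage(k, rho), 2) for rho in (1.0, 0.6, 0.1)])
```

Setting `rho=0.0` recovers the nominal level, and increasing either `k` or `rho` lowers the approximate coverage, in line with the tendency described above.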
5.4 Simulation 4
In this simulation, we conduct a real data analysis. This dataset is from the Baltimore site of the Multi-center AIDS Cohort Study (BMACS), which included 400 homosexual men who were infected by the human immunodeficiency virus (HIV) between 1984 and 1991 [4].
It has 1817 rows and 6 variables. The meaning of each variable is as follows.
- ID. Subject ID
- Time. Subject’s study visit time
- Smoke. Cigarette baseline smoking status
- age. Age at study enrollment
- preCD4. Pre-infection CD4 percentage
- CD4. CD4 percentage at the time of visit
We extracted the records with Time equal to 0.2 and obtained a subset with 138 observations. We calculate the confidence interval for the mean of CD4 on this subset.
In this case, we set to be and calculate the related intervals with (1.2).
Figure 5 shows the lengths for the four values of $m$ above. From Theorem 1, we learn that, generally speaking, a larger $m$ gives a shorter interval. In Figure 5, the length decreases gradually as $m$ increases.
6 Discussion
In this paper we first present two ways to obtain a confidence interval for the mean parameter, i.e., the $n$-case interval and the $m$-case interval, where generally speaking, $m \le n$. Then we compare these two types of confidence intervals. For i.i.d. samples, with a fixed confidence level, the expected length of the $n$-case interval is shorter than that of the $m$-case interval. In other words, we should always use (1.1) to calculate the interval in the i.i.d. case. Then we propose a special case under which (1.1) is no longer valid, whereas (1.2) remains a feasible method, although we cannot guarantee that (1.2) is an optimal solution.
Interval (1.2) is based on equal-sized groups. However, in real practice, our data may be divided into unequal-sized groups. In this case, the task becomes much harder, and the theory proposed in this paper may no longer apply. Future work may seek new methods to deal with such unequal-sized cases.
In Section 4, we provide two different situations in which our work can be used. In addition, [5] is another application of our results.
7 Acknowledgements
I would like to express my appreciation to Prof. Weijie Su, who has instructed me to complete this work. Special thanks to Prof. Xiangzhong Fang, my supervisor in Peking University, who gives me the freedom to do any research I’m interested in. I also appreciate Lynn Selhat, who is an excellent editor and helped me refine this paper. Finally, I acknowledge the support from China Scholarship Council.
References
- [1] John W. Pratt. Length of confidence intervals. Journal of the American Statistical Association, 56(295):549–567, 1961.
- [2] Lawrence D. Brown, T. Tony Cai, and Anirban DasGupta. Interval estimation for a binomial proportion. Statistical Science, pages 101–117, 2001.
- [3] Peter Hall. Theoretical comparison of bootstrap confidence intervals. The Annals of Statistics, pages 927–953, 1988.
- [4] Richard A. Kaslow, David G. Ostrow, Roger Detels, John P. Phair, B. Frank Polk, Charles R. Rinaldo Jr., and Multicenter AIDS Cohort Study. The Multicenter AIDS Cohort Study: rationale, organization, and selected characteristics of the participants. American Journal of Epidemiology, 126(2):310–318, 1987.
- [5] Weijie Su and Yuancheng Zhu. Statistical inference for online learning and stochastic approximation via hierarchical incremental gradient descent. arXiv preprint arXiv:1802.04876, 2018.
