Bayesian Inference of a Finite Population Mean Under Length-Biased Sampling
Zhiqing Xu, Balgobin Nandram, Binod Manandhar

TL;DR
This paper introduces a Bayesian approach for estimating the average shrub width in a finite population with unknown size, using length-biased sampling data, and demonstrates its robustness and improved performance over non-length-biased models.
Contribution
The paper develops a Bayesian method that accounts for length-biased sampling and unknown population size, employing a generalized gamma distribution for robustness.
Findings
Model with length bias performs better than without.
The method effectively estimates total shrub area.
Robust Bayesian inference via a random sampler enhances computation.
Abstract
We present a robust Bayesian method to analyze forestry data when samples are selected with probability proportional to length from a finite population of unknown size. Specifically, we use Bayesian predictive inference to estimate the finite population mean of shrub widths in a limestone quarry dominated by re-growth of mountain mahogany. The data on shrub widths are collected using transect sampling and it is assumed that the probability that a shrub is selected is proportional to its width; this is length-biased sampling. In this type of sampling, the population size is also unknown and this creates an additional challenge. The quantity of interest is average finite population shrub width and the total shrub area of the quarry can be estimated. Our method is assisted by using the three-parameter generalized gamma distribution, thereby robustifying our procedure against a possible…
| Transect | =width |
|---|---|
| I | 1.53, 0.87, 0.79, 0.78, 1.85, 1.45 |
| 0.48, 0.52, 0.22, 0.38, 0.59, 0.20 | |
| 0.42, 1.02, 0.97, 0.56, 0.62, 0.42 | |
| II | 1.15, 0.87, 0.57, 0.97, 0.57, 1.97 |
| 0.58, 2.54, 1.85, 0.35, 1.24, 1.80 | |
| 0.78, 0.98, 1.30, 1.55, 1.69, 2.12 | |
| 1.27, 0.75, 1.01, 1.82 | |
| III | 0.71, 1.50, 1.82, 1.86, 1.61, 1.21 |
| Transect | =width |
|---|---|
| I | 0.67, 0.31, 0.83, 1.95, 1.36, 1.45 |
| 0.72, 1.15, 0.98, 1.29, 0.88, 0.25 | |
| 0.63, 1.12, 0.34, 0.21, 1.36, 0.95 | |
| 1.04, 0.48, 1.05, 0.88, 0.16, 1.08 | |
| 0.95, 0.25, 0.30, 1.40, 0.58, 0.73 | |
| 1.30, 0.57 | |
| II | 0.96, 2.08, 0.68, 1.39, 0.50, 0.72 |
| 0.19, 1.91, 0.88, 0.48, 0.12 |
| 1.17 | 0.71 | 1.41 | 1.24 | 1.7 | 0.93 |
| 1.80 | 0.71 | 1.41 | 1.25 | 1.54 | 1.46 |
| Name | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| 0.25 | 0.28 | 1.62 | 2.86 | 5.36 | 6.78 | |
| 0.25 | 0.25 | 0.40 | 0.55 | 0.84 | 1.25 | |
| 0.25 | 0.25 | 0.62 | 0.82 | 1.33 | 1.91 | |
| 0.25 | 0.25 | 0.53 | 0.92 | 1.33 | 4.83 | |
| 0.48 | 0.58 | 0.83 | 1.06 | 1.35 | 2.50 | |
| 0.066 | 0.088 | 3.44 | 13.92 | 26.01 | 58.86 |
| Name | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| 0.25 | 0.77 | 1.37 | 1.34 | 1.93 | 2.33 | |
| 0.07 | 0.29 | 0.46 | 0.52 | 0.71 | 1.50 | |
| 0.17 | 0.49 | 0.75 | 0.83 | 1.12 | 2.19 | |
| 0.13 | 0.56 | 0.87 | 0.96 | 1.29 | 2.98 | |
| 0.64 | 1.05 | 1.36 | 1.43 | 1.68 | 3.54 | |
| 0.31 | 0.67 | 0.75 | 0.74 | 0.81 | 1.01 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsBayesian Methods and Mixture Models · Ecology and Vegetation Dynamics Studies · Forest ecology and management
**Bayesian Inference of a Finite Population Mean
Under Length-Biased Sampling**
Zhiqing Xu, Balgobin Nandram
Department of Mathematical Sciences, Worcester Polytechnic Institute
100 Institute Road, Worcester, MA 01609
( [email protected], [email protected])
Binod Manandhar
Department of Mathematics, University of Houston
November 17, 2018
We present a robust Bayesian method to analyze forestry data when samples are selected with probability proportional to length from a finite population of unknown size. Specifically, we use Bayesian predictive inference to estimate the finite population mean of shrub widths in a limestone quarry dominated by re-growth of mountain mahogany. The data on shrub widths are collected using transect sampling and it is assumed that the probability that a shrub is selected is proportional to its width; this is length-biased sampling. In this type of sampling, the population size is also unknown and this creates an additional challenge. The quantity of interest is average finite population shrub width and the total shrub area of the quarry can be estimated. Our method is assisted by using the three-parameter generalized gamma distribution, thereby robustifying our procedure against a possible model failure. Using conditional predictive ordinates, we show that the model, which accommodates length bias, performs better than the model that does not. In the Bayesian computation, we overcome a technical problem associated with Gibbs sampling by using a random sampler.
Keywords: Conditional predictive ordinate, Generalized gamma distribution, Gibbs sampling, Weighted distribution, Random sampling, Robustness, Transect sampling.
1. Introduction
Unequal probability sampling method was first suggested by Hansen and Hurwitz (1943) who demonstrated that the use of unequal selection probabilities frequently allowed more efficient estimators of the population total than did equal probability sampling. The sampling procedure Hansen and Hurwitz (1943) proposed was length-biased sampling. It occurs when the sample selection probabilities are correlated with the values of a study variable, e.g., size variable. This problem falls under the general umbrella of selection bias problems in survey sampling.
Line intercept sampling is a length-biased method used to study certain quantitative characteristics of objects in a region of interest. In general, objects may be of any shape and size and may possess an arbitrary spatial distribution. For example, these objects may be shrubs or patches of vegetation in a field, or the projection of logs on the forest floor. The idea of line intercept sampling is to use lines (transects) as sampling units and measuring features of the objects (e.g., widths of shrubs) that crossed by them. A length-biased sampling method producing samples from a weighted distribution. With the underlying distribution of population, one can estimate the attributes of the population by converting the weighted samples to random samples (surrogate samples).
For the estimation of a finite population quantity, the problem is more complex than for a superpopulation parameter because if there is a bias which tends to make the sampled values large, the nonsampled values would tend to be small. Such an adjustment is difficult to carry out. Generally, it has been assumed that the sample size is much smaller than the population size, and this eliminates the finite population estimation problem. Recently Nandram, Bhatta and Bhadra (2013) proposed a Bayesian non-ignorable selection model to accommodate a selection mechanism for binary data; see also Nandram (2007).
There are several approaches to address the selection bias problem. One approach incorporates the nonsampled selection probabilities in a model. This approach is computer-intensive because the nonsampled part of the population is much larger than the sample; e.g., see Nandram, Choi, Shen and Burgos (2006), Nandram and Choi (2010) and Choi, Nandram and Kim (2017). The second approach involves two models, one for the sample, called the survey model, and the other for the population, called the census model. This approach is sometimes called the surrogate sampling approach; e.g., see Nandram (2007) and Nandram, Bhatta and Bhadra (2013). The surrogate sampling approach obtains a surrogate random sample from the census model, and then prediction is done via the census model. The third approach is based on finite population sampling, in which a sample distribution and a sample complement distribution are both constructed, see Sverchkov and Pfeffermann (2004). It is convenient to use this approach for line intersect sampling. Sverchkov and Pfeffermann (2004) developed design consistent predictors for the finite population total. Essentially they define the distributions of the sampled values and the nonsampled values as two separate weighted distributions of the census distribution (see Patil and Rao 1978). Yet another approach is based on quasi-likelihood (Chambers and Skinner 2003) which is difficult to perform in a Bayesian paradigm because the normalization constant is hard to evaluate (typically a complicated function of the model parameters).
Length-biased distributions are a special case of the more general form known as weighted distributions. First introduced by Fisher (1934) to model ascertainment bias, weighted distributions were later formalized in a unifying theory by Rao (1965); see also the celebrated paper of Patil and Rao (1978). Briefly, if the random variable has a probability density function (pdf) of , and a non-negative weight function of , then the corresponding weighted density function is
[TABLE]
A special case is when the weight function is . Such a distribution is known as a length-biased distribution and is given by
[TABLE]
where , subject to existence, is the expectation of . (In Bayesian statistics we do not use upper and lower cases to differentiate random variables and fixed quantities.)
Various works have been done to characterize relationships between the original distribution and the length-biased distribution. Muttlak and McDonald (1990) suggested using a ranked set sampling procedure to estimate the population size and population mean.
In this paper, we use a three-parameter generalized gamma distribution as the original distribution to model the widths of shrubs sampled by line intercept method. Line intercept method has been found in widespread applications when estimating particle density, coverage and yields. For example, Lucas and Seber (1977) and Eberhardt (1978) derived unbiased estimators for density and percentage cover for any spatial distribution and randomly located transects. McDonald (1980) showed that the Lucas and Seber (1977) estimators for density and percentage cover are unbiased for a simple random sample of unequal length transects. Shrubs can be collected from either randomly located or systematically located transects (Butler and McDonald, 1983). It is evident that shrubs with larger widths have higher probabilities of selection.
In Section 2, we provide the background of our study. This includes data description and a introduction of the three-parameter generalized gamma distribution, which allows us to robustify our Bayesian method to accommodate the length bias. Section 3 contains the model and the results. The procedure involves the following steps. First, we derive the population size distribution as well as the sample complement distribution by Bayes’ theorem. Next, we propose a random sampling method to generate random parameters from their joint posterior distribution. Then, using each set of parameter values, we obtain a set of samples and the corresponding complement samples (Sverchkov and Pfeffermann, 2004). Finally, one sample of the finite population mean can be obtained by taking the average of the pooled samples. The goodness of fit is checked by utilizing conditional predictive ordinates, with results compared both for the models with and without the length bias. The conclusion is made in section 4.
2. Data and Robustness
2.1. Description of the Data
The data we use were collected using the line intercept sampling method (Muttlak and McDonald 1990). The study was conducted in a limestone quarry dominated by regrowth of mountain mahogany. The study area was defined by the area east of the baseline and within the walls of the quarry, where the baseline was established approximately parallel to the fissures; see Figure 1 in Appendix A. By dividing the baseline into three equal parts, three systematically placed transects were established. To ensure uniform coverage over the study area, two independent replications, each with 3 transects were selected (see Figure 1). One quantity of interest is the mean width of the shrubs in the quarry. So the variable we study is the width of the projection of the shrub encountered by transects onto the baseline (for illustration see Figure 2).
We use the data from both replications, as shown in Appendix A Tables 1 and 2. The numbers of shrubs counted in three transects are respectively and in the two transects of Replicate 2 are respectively (one transect of Replicate 2 has no data). Looking at the box plots of these two replications (Figure 3), we notice the clear differences in the distributions among the three transects in Replication 1; whereas in Replication 2, the distributions are close. Therefore, when making inferences using Replication 1, we regard the data from these three transects as from three different strata (they are actually so), and distinguish them in our modeling.
One complication is that we do not know the number of shrubs in the entire quarry. As we intend to use the Bayesian approach, the data from Replicate 2 is used to construct a prior distribution of the finite population size and the data from Replicate 1 is to provide inference for the population mean.
2.2. Generalized Gamma Distribution
The generalized gamma distribution (GG) was first defined by Stacy (1962) and it encompasses various subfamilies including the Weibull distribution, the generalized normal distributions and the lognormal distribution as a limit. Khodabin and Ahmadabadi (2010) provided details of the subfamilies of generalized gamma distribution. Because of the flexibility of generalized gamma distribution, we use it as the underlying population distribution in our models.
Some authors have advocated the use of simpler models because of estimation difficulties caused by the complexity of GG parameter structure. For example, Parr and Webster (1965), Hager and Bain (1971), and Lawless (1980) have considered maximum likelihood estimation in the three-parameter generalized gamma distribution. They reported problems with iterative solution of the nonlinear equations implied by the maximum likelihood method. They remarked that maximum likelihood estimators might not exist unless the sample size exceeds 400. (Our sample sizes are much smaller so we need to be careful.) In our paper, we perform Bayesian analyzes of generalized gamma distribution to overcome this issue.
The probability density of generalized gamma distribution is given by
[TABLE]
where , are all positive. It is worth noting that the mean and variance of are given by
[TABLE]
respectively. We write to denote a random variable with pdf, defined by (2.1), which we call an unweighted generalized gamma distribution.
Note that when , we get the standard gamma distribution, and by making differ from many distributions are accommodated, thereby increasing the flexibility of the gamma distribution. It is in this sense we robustify our procedures.
The length-biased distribution of sample x is where is the expectation of from the unweighted density function . This can be easily derived as follows. Let denote the indicator variable, i.e., if the unit is selected and if the unit is not selected. Under length-biased sampling the probability that the unit has been selected given the value is , where is a constant. By Bayes’ theorem, the sample pdf is,
[TABLE]
For convenience we will write for .
Therefore, by using as the population distribution, the sample distribution is
[TABLE]
Note that is also a generalized gamma distribution with parameters , and , denote by , with mean and variance adjusted to
[TABLE]
We call it the weighted generalized gamma distribution.
3. Bayesian Methodology
In this section, we derive the population size distribution, the sample-complement distribution, as well as the posterior distribution for the parameters.
Denote as the number of transects, as the total number of shrubs in the transect, . Note that all the and are unknown. Prior information about is needed to carry out a full Bayesian analysis. Denote as the number of shrubs from the transect, is the number of samples. Let be the width of the sampled shrubs and be the widths for the nonsampled ones, which are to be predicted. The quantity of interest is
[TABLE]
where is the sample fraction and and are respectively the sample and nonsample means. A posteriori inference is required for . It is worth noting that is not a sufficient statistic and cannot be derived directly from the sample. Therefore, one needs to draw to predict .
In many studies the population size is unknown and it must be estimated before inference can be made about . Our application is no exception, however, this is easy to address as we have two sets of replicated samples. The second replicate (samples from the two transects are similarly distributed) can be used to construct a prior for . The first replicate (three transects need to be treated as three strata) is used to estimate the population mean shrub width. In this way “using the data twice” is avoided.
We assume that the population distributions for different strata are GG,
[TABLE]
accommodated the length bias, the sample distribution,
[TABLE]
The remaining problem is to find the distribution of the nonsampled values, , the so called the sample complement distribution (Sverchkov and Pfeffermann, 2004), which we will describe later.
In section 3.1 we show how to obtain the prior distribution for . In section 3.2 we describe the sample complement distribution. In section 3.3, we combine the results of 3.1 and 3.2 to derive the full Bayesian Model. In section 3.4, we study the posterior distributions in detail.
3.1. Prior Distribution of the Finite Population Size
We first find the estimate of based on the sample size. Then, estimates of can be obtained assuming proportional allocation.
The Horvitz-Thompson unbiased estimator of is
[TABLE]
where is the probability that the unit is selected; see Cochran (1977). Since the line intercept sampling gives the length biased data, we are actually sampling with probability proportion to width . Thus, we have
[TABLE]
where is a constant and , where (meters) is the length of the base line. Then, the estimated value of under selection bias is
[TABLE]
Using the data from the Replication 2, we have . (Note that Replication 1 has strata and Replication 2 has strata.) Then, using proportional allocation in Replication 1, as , , , we have , , .
Next, we assume
[TABLE]
does not depend on transects because of the nature of proportional allocation method.
Using independent noninformative priors for ,
[TABLE]
We derived the posterior distributions of ,
[TABLE]
which is a negative Binomial distribution with . By equating to , we solve for the estimated value of , which is .
Therefore, based on Replication 2 our data-based prior distributions of the are independently negative binomial distributions with parameters and .
3.2. Sample-Complement Distribution
Next, we need to make inference about the non-sampled values. That is, we obtain the sample complement distribution (Sverchkov and Pfeffermann, 2004) and draw samples from it. We consider a single transect first and drop the transect indicator .
Let if and if , where denotes the sample set. Then
[TABLE]
Thus, the posterior sample complement distribution given all the parameters is
[TABLE]
where is GG, is the expectation of , which is . We use the sampling importance re-sampling (SIR) algorithm to perform the sampling. The SIR algorithm is ideal because is a good proposal density and samples are easy to draw.
3.3. Full Bayesian Model
For the sample data our model is
[TABLE]
and the priori for , and are
[TABLE]
Note that the priors on the are improper and the priors on and are the distributions ( denotes the distribution with degrees of freedom being ), which are nearly noninformative (no moments exist) but proper.
The posterior sample complement distribution when incorporating all strata is
[TABLE]
where and are defined in the same way as (3.3).
The joint posterior density of \alpha,\mathchoice{\oalign{\displaystyle\beta\crcr\vbox to0.86108pt{\hbox{\displaystyle\tilde{}}\vss}}}{\oalign{\textstyle\beta\crcr\vbox to0.86108pt{\hbox{\textstyle\tilde{}}\vss}}}{\oalign{\scriptstyle\beta\crcr\vbox to0.86108pt{\hbox{\scriptstyle\tilde{}}\vss}}}{\oalign{\scriptscriptstyle\beta\crcr\vbox to0.86108pt{\hbox{\scriptscriptstyle\tilde{}}\vss}}},\gamma given \mathchoice{\oalign{\displaystyle x\crcr\vbox to0.86108pt{\hbox{\displaystyle\tilde{}}\vss}}}{\oalign{\textstyle x\crcr\vbox to0.86108pt{\hbox{\textstyle\tilde{}}\vss}}}{\oalign{\scriptstyle x\crcr\vbox to0.86108pt{\hbox{\scriptstyle\tilde{}}\vss}}}{\oalign{\scriptscriptstyle x\crcr\vbox to0.86108pt{\hbox{\scriptscriptstyle\tilde{}}\vss}}}_{s}=\{x_{ij},j=1,\ldots,n_{i},i=1,\ldots,\ell\} is
[TABLE]
The posterior density can be simplified by transforming to . (Note that the jacobian of the transformation must be included.) Then, the joint posterior density of \alpha,\mathchoice{\oalign{\displaystyle\phi\crcr\vbox to0.86108pt{\hbox{\displaystyle\tilde{}}\vss}}}{\oalign{\textstyle\phi\crcr\vbox to0.86108pt{\hbox{\textstyle\tilde{}}\vss}}}{\oalign{\scriptstyle\phi\crcr\vbox to0.86108pt{\hbox{\scriptstyle\tilde{}}\vss}}}{\oalign{\scriptscriptstyle\phi\crcr\vbox to0.86108pt{\hbox{\scriptscriptstyle\tilde{}}\vss}}},\gamma given \mathchoice{\oalign{\displaystyle x\crcr\vbox to0.86108pt{\hbox{\displaystyle\tilde{}}\vss}}}{\oalign{\textstyle x\crcr\vbox to0.86108pt{\hbox{\textstyle\tilde{}}\vss}}}{\oalign{\scriptstyle x\crcr\vbox to0.86108pt{\hbox{\scriptstyle\tilde{}}\vss}}}{\oalign{\scriptscriptstyle x\crcr\vbox to0.86108pt{\hbox{\scriptscriptstyle\tilde{}}\vss}}}_{s} is
[TABLE]
[TABLE]
This is not a standard posterior density, however, one can fit this model using Markov chain Monte Carlo methods (i.e., to obtain sample of \alpha,\mathchoice{\oalign{\displaystyle\phi\crcr\vbox to0.86108pt{\hbox{\displaystyle\tilde{}}\vss}}}{\oalign{\textstyle\phi\crcr\vbox to0.86108pt{\hbox{\textstyle\tilde{}}\vss}}}{\oalign{\scriptstyle\phi\crcr\vbox to0.86108pt{\hbox{\scriptstyle\tilde{}}\vss}}}{\oalign{\scriptscriptstyle\phi\crcr\vbox to0.86108pt{\hbox{\scriptscriptstyle\tilde{}}\vss}}},\gamma).
3.4. Further Study of the Posterior Density
One important problem we need to worry about is the posterior propriety of \pi(\alpha,\mathchoice{\oalign{\displaystyle\phi\crcr\vbox to0.86108pt{\hbox{\displaystyle\tilde{}}\vss}}}{\oalign{\textstyle\phi\crcr\vbox to0.86108pt{\hbox{\textstyle\tilde{}}\vss}}}{\oalign{\scriptstyle\phi\crcr\vbox to0.86108pt{\hbox{\scriptstyle\tilde{}}\vss}}}{\oalign{\scriptscriptstyle\phi\crcr\vbox to0.86108pt{\hbox{\scriptscriptstyle\tilde{}}\vss}}},\gamma\mid\mathchoice{\oalign{\displaystyle x\crcr\vbox to0.86108pt{\hbox{\displaystyle\tilde{}}\vss}}}{\oalign{\textstyle x\crcr\vbox to0.86108pt{\hbox{\textstyle\tilde{}}\vss}}}{\oalign{\scriptstyle x\crcr\vbox to0.86108pt{\hbox{\scriptstyle\tilde{}}\vss}}}{\oalign{\scriptscriptstyle x\crcr\vbox to0.86108pt{\hbox{\scriptscriptstyle\tilde{}}\vss}}}_{s}). First, it is easy to see that
[TABLE]
Then, integrating out the , we get
[TABLE]
It is convenient to let and denote the arithmetic and geometric means of the . Then, we have
[TABLE]
Thus, we essentially have a two-parameter posterior density. Although an overkill, we attempted to fit this model using a Gibbs sampler. There are difficulties in performing the Gibbs sampler (perhaps associated with the difficulties encountered in finding MLEs in generalized gamma distribution) because high correlations are present among the parameters and thinning is not helpful. The problem is essentially high correlations between and . Thus, we consider an alternative algorithm which simply uses the multiplication rule of probability. We prove the theorem below which adds credence to our Bayesian methodology.
However, since makes the generalized gamma density a standard gamma density, it is sensible to bound in an interval centered at . That is, we take ; a sensible choice is or so. Thus, we replace the prior on by ; the original prior is inconvenient and not helpful.
Theorem
Assuming that , the joint posterior density of \pi(\alpha,\gamma\mid\mathchoice{\oalign{\displaystyle x\crcr\vbox to0.86108pt{\hbox{\displaystyle\tilde{}}\vss}}}{\oalign{\textstyle x\crcr\vbox to0.86108pt{\hbox{\textstyle\tilde{}}\vss}}}{\oalign{\scriptstyle x\crcr\vbox to0.86108pt{\hbox{\scriptstyle\tilde{}}\vss}}}{\oalign{\scriptscriptstyle x\crcr\vbox to0.86108pt{\hbox{\scriptscriptstyle\tilde{}}\vss}}}_{s}) is proper.
Remark: Using the multiplication rule of probability,
[TABLE]
the theorem implies that \pi(\mathchoice{\oalign{\displaystyle\phi\crcr\vbox to0.86108pt{\hbox{\displaystyle\tilde{}}\vss}}}{\oalign{\textstyle\phi\crcr\vbox to0.86108pt{\hbox{\textstyle\tilde{}}\vss}}}{\oalign{\scriptstyle\phi\crcr\vbox to0.86108pt{\hbox{\scriptstyle\tilde{}}\vss}}}{\oalign{\scriptscriptstyle\phi\crcr\vbox to0.86108pt{\hbox{\scriptscriptstyle\tilde{}}\vss}}},\alpha,\gamma\mid\mathchoice{\oalign{\displaystyle x\crcr\vbox to0.86108pt{\hbox{\displaystyle\tilde{}}\vss}}}{\oalign{\textstyle x\crcr\vbox to0.86108pt{\hbox{\textstyle\tilde{}}\vss}}}{\oalign{\scriptstyle x\crcr\vbox to0.86108pt{\hbox{\scriptstyle\tilde{}}\vss}}}{\oalign{\scriptscriptstyle x\crcr\vbox to0.86108pt{\hbox{\scriptscriptstyle\tilde{}}\vss}}}_{s}) is also proper. Clearly, \pi(\mathchoice{\oalign{\displaystyle\beta\crcr\vbox to0.86108pt{\hbox{\displaystyle\tilde{}}\vss}}}{\oalign{\textstyle\beta\crcr\vbox to0.86108pt{\hbox{\textstyle\tilde{}}\vss}}}{\oalign{\scriptstyle\beta\crcr\vbox to0.86108pt{\hbox{\scriptstyle\tilde{}}\vss}}}{\oalign{\scriptscriptstyle\beta\crcr\vbox to0.86108pt{\hbox{\scriptscriptstyle\tilde{}}\vss}}},\alpha,\gamma\mid\mathchoice{\oalign{\displaystyle x\crcr\vbox to0.86108pt{\hbox{\displaystyle\tilde{}}\vss}}}{\oalign{\textstyle x\crcr\vbox to0.86108pt{\hbox{\textstyle\tilde{}}\vss}}}{\oalign{\scriptstyle x\crcr\vbox to0.86108pt{\hbox{\scriptstyle\tilde{}}\vss}}}{\oalign{\scriptscriptstyle x\crcr\vbox to0.86108pt{\hbox{\scriptscriptstyle\tilde{}}\vss}}}_{s}) is also proper.
Proof
We make two observations. First, using the arithmetic-geometric inequality, we have . Second, using , we have , a function of , is bounded uniformly in . Therefore, we only need to show that
[TABLE]
Next, we transform to , keeping untransformed. Then, the integral becomes
[TABLE]
where
[TABLE]
We only need to show that
[TABLE]
is bounded uniformly in for any . For convenience, we will drop the subscript, , momentarily, so we simply need to show that , where is the logarithm of the gamma function, is uniformly bounded in ; see the Appendix A.
Finally, because , we are left with
[TABLE]
Therefore, our claim on propriety holds.
4. Bayesian Computations and Data Analyzes
In this section, we perform Bayesian analysis of the posterior distributions of population parameters by a numerical method, called random sampler, which performs better than the Gibbs sampler . We then obtain the nonsampled values using the sampling importance resampling (SIR) algorithm. Recall that the data we use here is Replication I, which has three transects, i.e., .
4.1. Random Sampler
Since the Gibbs sampler is a Markovian updating scheme, we have shown, in our work not presented in this paper, that most of the estimated values of population mean are larger than what we expected (see results in Appendix Table 3). One of the reasons is that high correlations among these parameters makes the Gibbs sampler inefficient in the sense it may take a very large number of iterations to converge in distribution. In this section, we propose a non-Markovian algorithm, called random sampler, in order to avoid the particular issue mentioned above.
Therefore, and cannot be sampled directly from their unbounded parameters space. We use the transformation and . Then,
[TABLE]
Two-dimensional grid method can be applied to draw and from their joint distribution. But grid method is computationally intensive in more than one dimension. We used the multiplication rule to draw samples of and .
[TABLE]
To apply this rule, we fist generated a sample of from , then generated a sample of from . Repeating this procedure times to obtain sets of and . The corresponding can also be obtained by sampling from .
The term in (4.1) is easy to derive.
[TABLE]
The term in (4.1) can be derived by integrating with respect to . Unfortunately, it is not possible to integrate by analytical techniques. For this reason, numerical methods have to be used. We use the 20-point Gaussian quadrature to approximate .
[TABLE]
where are the roots of orthogonal polynomials for and are the corresponding Gauss-Legender weights, which can be created by R package ’gaussquad’.
The summary of samples drawn by Random Sampler for each parameter and the population mean are shown in Table 6 . It is shown that the population mean has the IQR of , with the median of . In Fig. 4, it is shown that the posterior distribution of is bimodal (pointing to the difficulty in estimating ), while those of , , and are skewed to the right. In the next section, we will perform the model checking by conditional predictive ordinate (CPO).
4.2. Model Checking by Conditional Predictive Ordinate
Comparing the predictive distribution to the observed data is generally termed a “posterior predictive check”. This type of check includes the uncertainty associated with the estimated parameters of the model. Posterior predictive checks (via the predictive distribution) involve a double-use of the data, which causes predictive performance to be overestimated. To overcome this drawback, Geisser and Eddy (1979) has proposed the leave-one-out cross-validation predictive density. This is also known as the conditional predictive ordinate or CPO (Gelfand, 1996). The CPO is a handy posterior predictive check because it may be used to identify outliers, influential observations, and for hypothesis testing across different non-nested models. The CPO expresses the posterior probability of observing the value of when the model is fitted to all data except , with a larger value implying a better fit of the model to , and very low CPO values suggest that is an outlier and an influential observation. A Monte Carlo estimate of the CPO is obtained without actually omitting from the estimation, and is provided by the harmonic mean of the likelihood for . Specifically, the CPOi is the inverse of the posterior mean of the inverse likelihood of . The Monte Carlo estimate of CPO is
[TABLE]
where for ; see Molina, Nandram and Rao (2014).
The sum of the log(CPO)’s can be an estimator for the natural logarithm of the marginal likelihood, sometimes called the log pseudo marginal likelihood (LPML)
[TABLE]
Models with larger LPMLs are better. To compare the predictive distributions (both model with length bias and model without length bias) using our length-biased sample, we calculated the LPML for both models. The likelihood of under length biased model is given by
[TABLE]
where is the corresponding parameter for the stratum that is from. The likelihood of under no length biased model is given by
[TABLE]
where is the corresponding parameter for the stratum that is from. It is shown that the of for model with length bias is larger (LPML =) than the one for the model without length bias (LPML =), which means the model with length bias fits our length-biased sample better.
4.3. Nonsampled Widths
We define the importance function as
[TABLE]
Then, the importance ratios are
[TABLE]
A random sample can now be obtained by re-sampling with probability proportional to the ratios.
The algorithm to obtain the nonsampled values is as follows.
- •
Step 1. Obtain M sets of using the sampling methods described in the next Section.
- •
Step 2. Obtain a sample of from formula (3.2).
- •
Step 3. For each set of parameters, generate the vector where from the corresponding generalized gamma distribution.
- •
Step 4. Compute the population mean and the importance ratio .
- •
Step 5. Repeat the step 2 to step 4 times.
- •
Step 6. Draw values of the population mean with probability proportional to . We choose
5. Summary
In this paper we have presented a model for estimating population mean under length-biased sampling. We have used a weighted distribution of the three-parameter generalized gamma distribution to model the shrub widths to robustify our procedure that accommodates the length-biased sampling. Interest is on the finite population mean of shrub width in the entire quarry. In order to avoid certain technical issues associated with classical inference when using the generalized gamma distribution, we proposed a non-Markovian Bayesian numerical method, called random sampler, which performs better than Gibbs sampler when the population parameters are highly correlated. Posterior population distributions are easily estimated using this method. Conditional predictive ordinate shows that the model with length bias performs better than the model without length bias.
While accommodating transect sampling is a challenge, another important challenge in our procedure is to estimate the unknown population size. To ensure a full Bayesian procedure, we have used the data in Replication 2 (two strata) to estimate the finite population size that is usually unknown in this type of problem. The data from Replication 1 were used to estimate the average shrub width of the finite population. This estimate can, in turn, be used to give an estimate of the total shrub area assuming a standard geometry (e.g., a circle with the width being the diameter or a square with the width being length of a side).
An interesting topic for future research would be including covariates to study potential predictors. In Muttlak and McDonald (1990), in addition to the measurement of shrub widths, two more attributes of mountain mahogany, maximum height, and number of stems, were measured. Both attributes are important predictors of the average shrub width of an area’s vegetation. Semi-parametric linear regression (Chen, 2010) or generalized linear regression can be considered to measure this association. We can incorporate the covariates through a gamma type regression model. Let the covariates be and . Because the mean of each stratum is linearly related to respectively, we take . For the shrub data, our model is
[TABLE]
A similar form can be easily written down for the length-biased sampling. Our future plan is to fit a model to accommodate the covariates.
Acknowledgement**.**
*The author thanks the Associate Editor, XXX, and the referee for their comments which improved the quality of the paper very much. *
**Appendix A. Uniform Boundedness of
**
We need to show that
[TABLE]
is uniformly bounded in . We will show that asymptotes out horizontally.
First, differentiating , we have
[TABLE]
where is the diagamma function. Now, using the duplication property (Abramowitz and Stegun 1965, Ch. 6) of the digamma function, one can show that . That is, is monotonically increasing in ; see also Figure 5.
Second, differentiating , we have
[TABLE]
Using a theorem (Ronning 1986) which states that decreases monotonically in , we have . That is, is concave and the rate of increase of decreases; see Figure 6.
Therefore, asymptotes out horizontally and must be bounded; so is its exponent.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1]
- 22 Abramowitz, M. and Stegun, I. A. (1965), Handbook of Mathematics Functions , Dover Publications.
- 33 Butler, S. A. and Mc Donald, L. L. (1983), “ Unbiased systematic sampling plans for the line intercept method,” Journal of Range Management , 36, 463-468.
- 44 Chambers, R. L. and Skinner, C. J. (2003), Analysis of Survey Data , Wiley.
- 55 Chen, Y. Q. (2010), “Semiparametric regression in size-biased sampling, ” Biometrics , 66, 149-158.
- 66 Choi, S., Nandram, B. and Kim, D. (2017), “A hierarchical Bayesian model for binary data incorporating selection bias, ” Communications in Statistics: Simulation and Computation , 46 (6), 4767-4782.
- 77 Cochran, W. G. (1977), Sampling Techniques , Wiley, New York.
- 88 Eberhnrdt, L. L. (1978), “Transect methods for population studies, ” The Journal of Wildlife Management , 42, 1-31.
