Elicitation, measuring bias, checking for prior-data conflict and inference with a Dirichlet prior
Michael Evans, Irwin Guttman, Peiying Li

TL;DR
This paper develops methods for eliciting and assessing Dirichlet priors based on bounds, enabling bias measurement, prior-data conflict checking, and hypothesis evaluation in contingency table analysis.
Contribution
It introduces a novel approach to prior elicitation using bounds on probabilities and integrates bias assessment and conflict checking into Bayesian inference.
Findings
Effective prior elicitation based on bounds
Methods to detect prior-data conflict
Relative belief approach for hypothesis assessment
Abstract
Methods are developed for eliciting a Dirichlet prior based upon bounds on the individual probabilities that hold with virtual certainty. This approach to selecting a prior is applied to a contingency table problem where it is demonstrated how to assess the bias in the prior as well as how to check for prior-data conflict. It is shown that the assessment of a hypothesis via relative belief can easily take into account what it means for the falsity of the hypothesis to correspond to a difference of practical importance and provide evidence in favor of a hypothesis.
Department of Statistical Sciences
University of Toronto
Key words and phrases: elicitation, bias, relative belief inferences.
1 Introduction
Perhaps the most basic statistical model is the multinomial where and is the -dimensional simplex and is unknown. This arises from an i.i.d. sample from the multinomial distribution. The goal is then inference about the unknown value of
Bayesian inference requires a prior and the Dirichlet for some choice of hyperparameters is a convenient choice due to its conjugacy. To employ such a prior it is necessary to have an easy to use elicitation algorithm. The purpose of this paper is to develop such an algorithm, to show how the chosen prior can be assessed with respect to the bias that it induces, to check whether the prior conflicts with the data, to show how to modify the prior when such a conflict is encountered and to implement inferences using the prior based on a measure of statistical evidence.
In Section 2 an elicitation algorithm is developed for the Dirichlet. In Section 3 the bias in the prior is discussed and in Section 4 the issue of prior-data conflict and possible modification of the prior is addressed. Section 5 deals with inference for the multinomial based on the relative belief ratio as a measure of evidence. This presents a full treatment of a statistical analysis for the multinomial although it is assumed that the multinomial model is correct. Strictly speaking, provided the data is available, it should also be checked that the initial sample is i.i.d. from a multinomial distribution, perhaps using a multivariate version of a runs test, but this is not addressed here.
Throughout the paper the following example, taken from Snedecor and Cochran (1967), is considered as a practical application of the methodology.
Example 1. Assessing independence
Individuals were classified according to their blood type ( and , although the individuals were eliminated, as they were small in number) and also classified according to their disease status (peptic ulcer = , gastric cancer = , or control = ). So there are three populations; namely, those suffering from a peptic ulcer, those suffering from gastric cancer, and those suffering from neither and it is assumed that the individuals involved in the study can be considered as random samples from the respective populations. The data are in Table 1 and the goal is to determine whether or not and are independent. So the counts are assumed to be multinomial where the first index refers to and the second to and with a relabelling of the categories, e.g. is relabeled as
Using the chi-squared test, the null hypothesis of no relationship is rejected with a value of the chi-squared statistic of and a p-value of . Table 2 gives the estimated cell probabilities based on the full multinomial as well as the estimated cell probabilities based on independence between the and The difference between the two tables is very small and of questionable practical significance. For example, the largest difference between corresponding cells is and, as a natural measure of difference between two distributions, the Kullback-Leibler divergence, based on the raw data, is estimated as This suggests that in reality the deviation from independence is not meaningful. The remedy is that, in assessing any hypothesis, it is necessary to say what size of deviation from the null is of practical significance and to take this into account when performing the test. This arises as a natural aspect of the relative belief approach to this problem and will be discussed in Section 3, where a very different conclusion is reached in this example.
2 Elicitation
A key component of a Bayesian statistical analysis is the choice of the prior. For this it is recommended that an elicitation algorithm be used so that the selection of the prior is based upon what is known about the problem under study. Typically this will involve some knowledge of what kind of values are expected for the data as these arise via some measurement process. In the context of the Dirichlet this knowledge will take the form of how likely a success is expected to be on each of the categories being counted. Of course, there can be a variety of elicitation algorithms that are appropriate. Our approach here is to develop one that is simple to use and results in an appropriate expression of belief. Discussions about the process of elicitation for general problems can be found in Garthwaite et al. (2005) and O’Hagan et al. (2006).
Consider first the situation where and the prior on is beta. Suppose it is known with ‘virtual certainty’ that where are known. This immediately implies that with virtual certainty. Here ‘virtual certainty’ is interpreted to mean that the true value of is in the interval with high prior probability , say So this restricts the prior to those values of satisfying To completely determine another condition is added, namely, it is required that the mode of the prior be at the point as this allows the placement of the primary amount of the prior mass at an appropriate place within For example, a natural choice of the mode in this context is , namely, the midpoint of the interval. When the mode of the beta occurs at where There is thus a 1-1 correspondence between the values and given by Therefore, after specifying the mode, only the scaling of the beta prior is required through the choice of The value is completely determined by provided that as it is easy to see that as Note that the restriction is natural as this avoids singularities at 0 or 1. If then the requirement can be relaxed to requiring satisfy so the beta suffices or a larger value of can be chosen.
Supposing it is then straightforward to solve for via an iterative algorithm. To start set which implies $(\alpha_{1},\alpha_{2})=(1,1)$ and find such that and then proceed iteratively via the bisection root finding algorithm.
Example 2. Determining a beta prior.
Suppose that and The solution obtained via the iterative algorithm is then where the iteration is stopped when This took 7 iterations and the prior is given by and contains of the prior probability. If instead of the error tolerance for stopping was set equal to , then the solution and was obtained after 20 iterations with containing of the prior probability.
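The bisection search described above can be sketched as follows. This is a minimal illustration, not the authors' code. The inline formulas did not survive in this copy, so the parameterization used here is an assumption: the mode is written `mode`, the scale is `tau`, and the correspondence is taken to be `alpha1 = 1 + tau * mode`, `alpha2 = 1 + tau * (1 - mode)`, which agrees with the statement that the starting value gives $(\alpha_{1},\alpha_{2})=(1,1)$. The names `beta_content` and `elicit_beta`, the bounds 0.2 and 0.4, and the tolerance are hypothetical.

```python
import math


def beta_content(lo, hi, a, b, n=2000):
    """Simpson's rule estimate of P(lo <= p <= hi) under beta(a, b), for a, b >= 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def pdf(x):
        # Clamp away from 0 and 1 so the logarithms stay finite.
        x = min(max(x, 1e-12), 1.0 - 1e-12)
        return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))

    h = (hi - lo) / n  # n must be even for Simpson's rule
    total = pdf(lo) + pdf(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * pdf(lo + i * h)
    return total * h / 3.0


def elicit_beta(lo, hi, mode, gamma=0.99, tol=1e-3, max_iter=100):
    """Find tau so that beta(1 + tau*mode, 1 + tau*(1 - mode)) puts about gamma on [lo, hi].

    Assumes lo < mode < hi. The parameterization is an assumption of this sketch.
    """
    def content(tau):
        return beta_content(lo, hi, 1.0 + tau * mode, 1.0 + tau * (1.0 - mode))

    if content(0.0) >= gamma:  # the uniform prior already suffices
        return 0.0, 1.0, 1.0, content(0.0)
    tau_lo, tau_hi = 0.0, 1.0
    while content(tau_hi) < gamma:  # bracket the root by doubling
        tau_lo, tau_hi = tau_hi, 2.0 * tau_hi
    tau, c = tau_hi, content(tau_hi)
    for _ in range(max_iter):  # bisection on tau
        tau = 0.5 * (tau_lo + tau_hi)
        c = content(tau)
        if abs(c - gamma) <= tol:
            break
        if c < gamma:
            tau_lo = tau
        else:
            tau_hi = tau
    return tau, 1.0 + tau * mode, 1.0 + tau * (1.0 - mode), c


if __name__ == "__main__":
    # Hypothetical bounds, with the mode at the midpoint of the interval.
    print(elicit_beta(0.2, 0.4, 0.3))
```

Numerical integration is used only to keep the sketch free of third-party dependencies; a library beta CDF would serve equally well.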
The approach to eliciting a beta prior seems very natural and allows for a great deal of flexibility in where the prior allocates the bulk of its mass in The question, however, is how to generalize this to the Dirichlet prior. As will be seen, it is necessary to be careful about how is elicited. Again we make the restriction that each to avoid singularities for the prior on the boundary.
It seems quite natural to think about putting probabilistic bounds on the such as requiring with high probability, for fixed constants to reflect what is known with ‘virtual certainty’ about For example, it may be known that is very small and so we put , choose small and require that with prior probability at least While placing bounds like this on the seems reasonable, such an approach can result in a complicated shape for the region that is to contain the true value of with virtual certainty. This complexity can make the computations associated with inference very difficult. In fact it can be hard to determine exactly what the full region is. As such, it seems better to use an elicitation method that fits well with the geometry of the Dirichlet family. If it is felt that more is known a priori than a Dirichlet prior can express, then it is appropriate to contemplate using some other family of priors. Given the conjugacy property of Dirichlet priors, which vastly simplifies many computations, the focus here is on devising elicitation algorithms that work well with this family. First we consider elicitation approaches for this problem that have been presented in the literature.
Chaloner and Duncan (1987) discuss an iterative elicitation algorithm based on specifying characteristics of the prior predictive distribution of the data which is Dirichlet-multinomial. Regazzini and Sazonov (1999) discuss an elicitation algorithm which entails partitioning the simplex, prescribing prior probabilities for each element of the partition and then selecting a mixture of Dirichlet distributions as the prior such that this prior has Prohorov distance less than some from the true prior associated with de Finetti’s representation theorem. Both of these approaches are complicated to implement. Closest to the method presented here is that discussed in Dorp and Mazzuchi (2003) where is specified by choosing stating two prior quantiles where for and specifying prior quantile for for each So there are constraints that the Dirichlet has to satisfy and an algorithm is provided for computing Drawbacks include the fact that the are not treated symmetrically as there is a need to place two constraints on one of the probabilities, is treated quite differently than the other probabilities, precise quantiles need to be specified and values can be obtained which induce singularities in the prior. Furthermore, it is not at all clear what these constraints say about the joint prior on as this elicitation does not take into account the dependencies that occur necessarily among the
A simpler approach to elicitation is now developed. There are several versions depending on whether lower or upper bounds are placed on the We start with the situation where a lower bound is given for each as this provides the basic idea for the others. Generally the elicitation process allows for a single lower or upper bound to be specified for each These bounds specify a subsimplex of the simplex with all edges of the same length. As will be seen, this implicitly takes into account the dependencies among the With such a region determined, it is straightforward to determine such that the subsimplex contains of the prior probability for
Note that a -dimensional simplex can be specified by specifying distinct points in say and then taking all convex combinations of these points. This simplex will be denoted as with So and it is clear that whenever The centroid of is equal to
2.1 Lower bounds on the probabilities
For this we ask for a set of lower bounds such that for To make sense there is only one additional constraint that the must satisfy, namely, If then it is immediate that otherwise So the are completely determined when Attention is thus restricted to the case where The following result then holds.
Theorem 1. Specifying the lower bounds such that for and
[TABLE]
prescribes where and
[TABLE]
The edges of each have length $\sqrt{2}(1-L_{1:k})$ and
Proof: Note that (1) implies that and so stating the lower bounds implies a set of upper bounds, and also Consider now the set and note that for For with then since, for example, the first coordinate satisfies so Therefore
If then where Now and so For we have This proves that and so we have
Finally note that and so has edges all of the same length. This completes the proof.
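The geometry in Theorem 1 can be checked numerically. The sketch below assumes the reading that the region is the set of probability vectors with every coordinate at least its lower bound, so that each vertex is the vector of lower bounds with the remaining mass $1-L_{1:k}$ added to one coordinate. The displayed equations are missing from this copy, so that reading, the function names and the example bounds are assumptions.

```python
import math


def subsimplex_from_lower_bounds(lower):
    """Return implied upper bounds, vertices and centroid of {p : p_i >= l_i, sum(p) = 1}."""
    k = len(lower)
    total = sum(lower)
    if any(l < 0 for l in lower) or total >= 1:
        raise ValueError("need l_i >= 0 and sum(l) < 1")
    slack = 1.0 - total  # mass left over once every lower bound is met
    upper = [l + slack for l in lower]
    # Vertex i gives all of the leftover mass to coordinate i.
    vertices = [
        [lower[j] + (slack if j == i else 0.0) for j in range(k)]
        for i in range(k)
    ]
    centroid = [l + slack / k for l in lower]
    return upper, vertices, centroid


def edge_lengths(vertices):
    """Euclidean lengths of all edges between pairs of vertices."""
    return [
        math.dist(a, b)
        for i, a in enumerate(vertices)
        for b in vertices[i + 1:]
    ]


if __name__ == "__main__":
    # Hypothetical lower bounds for k = 4.
    upper, vertices, centroid = subsimplex_from_lower_bounds([0.1, 0.2, 0.3, 0.2])
    print(upper, centroid, edge_lengths(vertices))
```

Every edge comes out with the same length, $\sqrt{2}$ times the leftover mass, which matches the edge length stated in the theorem.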
2.2 Upper bounds on the probabilities
Of course, it may be that prior beliefs are instead expressed via upper bounds on the probabilities or a mixture of upper and lower bounds. The case of all upper bounds is considered first. Our goal is to specify the upper bounds in such a way that these lead unambiguously to lower bounds satisfying (1) and so to the simplex
Suppose then that we have the upper bounds such that It is clear then that must satisfy the system of linear equations given by (2) as well as for and (1). So the must satisfy
[TABLE]
where is the -dimensional vector of 1’s and is the identity. Noting that it is immediate that
[TABLE]
Note that this requires that as is always the case.
Putting then (4) implies and so provided satisfies
[TABLE]
From (4)
[TABLE]
and, for this implies that iff
[TABLE]
Also, when (5) is satisfied, then for This completes the proof of the following result.
Theorem 2. Specifying upper bounds such that for satisfying inequalities (5) and (7), determines the lower bounds given by (6), which determine the simplex defined in Theorem 1.
The difficult aspect of this approach to elicitation is to make sure the upper bounds satisfy (5) and (7). If we take then (5) is satisfied and implies that (7) is satisfied as well.
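A small sketch of the conversion from upper bounds to lower bounds follows. It assumes each upper bound equals the corresponding lower bound plus the leftover mass $1-L_{1:k}$; summing over the $k$ coordinates and solving gives a leftover mass of (sum of upper bounds minus 1) divided by $k-1$. The displayed equations (4) to (7) are missing from this copy, so the formula and the validity checks below are a reconstruction under that assumption, and the function name is hypothetical.

```python
def lower_from_upper(upper):
    """Convert a full set of upper bounds into the implied lower bounds.

    Assumes u_i = l_i + (1 - sum(l)); this relation is an assumption of the sketch.
    """
    k = len(upper)
    if k < 2:
        raise ValueError("need at least two categories")
    slack = (sum(upper) - 1.0) / (k - 1)  # equals 1 - sum(lower)
    if slack <= 0:
        raise ValueError("the upper bounds must sum to more than 1")
    lower = [u - slack for u in upper]
    if any(l < -1e-12 for l in lower):
        raise ValueError("an upper bound is too small to give a nonnegative lower bound")
    return [max(l, 0.0) for l in lower]


if __name__ == "__main__":
    # Hypothetical upper bounds for k = 4.
    print(lower_from_upper([0.3, 0.4, 0.5, 0.4]))
```

The two error branches play the role of the conditions on the upper bounds: the implied lower bounds must sum to less than 1 and must each be nonnegative.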
2.3 Upper and lower bounds on the probabilities
Now, perhaps after relabelling the probabilities, suppose that lower bounds for as well as upper bounds for where have been provided. Again it is required that and we search for conditions on the that complete the prescription of a full set of lower bounds so that Theorem 1 applies. Again the and vectors must satisfy (3). Let denote the subvector of given by its consecutive -th through -th coordinates and the sum of these coordinates provided and be null otherwise. The following equations hold
[TABLE]
Rearranging these equations so the knowns are on the left and the unknowns are on the right gives
[TABLE]
It follows from (9) that
[TABLE]
and substituting this into (8) gives the solution for as well.
So it is only necessary to determine what additional conditions have to be imposed on the so that Theorem 1 applies. Note that it follows from (8) that takes the correct form, as given by (2), so it is really only necessary to check that is appropriate.
First it is noted that it is necessary that The case only occurs when and then which is the required value for for Theorem 1 to apply. So when there is no choice but to put and choose a lower bound for which of course could be 0, which means that Theorem 1 applies. It is assumed hereafter that
Now and the requirement imposes the requirement Using (10) gives
[TABLE]
and therefore iff
[TABLE]
It is seen that (11) generalizes (5) on taking Now for
[TABLE]
and so, for this implies that iff
[TABLE]
So (13) generalizes (5) on taking Also, if (11) is satisfied, then for
The above argument establishes the following result.
Theorem 3. For satisfying specifying the bounds
(i) with for satisfying and
(ii) with for satisfying (11) and (13),
determines the lower bounds given by (12), which, together with determine the simplex defined in Theorem 1.
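The mixed case can be sketched in the same way. Assuming again that every upper bound equals its lower bound plus the leftover mass, the leftover mass solves a single linear equation in the given lower bounds and the given upper bounds. Equations (8) to (13) are missing from this copy, so the formula below is a derivation under that assumption and not a transcription. The sketch covers only the case of at least two upper bounds, and `complete_lower_bounds` is a hypothetical name.

```python
def complete_lower_bounds(lower_given, upper_given):
    """Fill in lower bounds for the coordinates that were given upper bounds.

    lower_given: lower bounds for the first m probabilities.
    upper_given: upper bounds for the remaining r = k - m probabilities, r >= 2.
    Assumes u_i = l_i + (1 - sum of all lower bounds).
    """
    r = len(upper_given)
    if r < 2:
        raise ValueError("this sketch needs at least two upper bounds")
    # Leftover mass 1 - sum(all lower bounds), from summing the assumed relation.
    slack = (sum(upper_given) + sum(lower_given) - 1.0) / (r - 1)
    if slack <= 0:
        raise ValueError("bounds are inconsistent: no mass is left over")
    implied = [u - slack for u in upper_given]
    if any(l < -1e-12 for l in implied):
        raise ValueError("an upper bound is too small")
    return list(lower_given) + [max(l, 0.0) for l in implied]


if __name__ == "__main__":
    # Hypothetical bounds: two lower bounds and two upper bounds, k = 4.
    print(complete_lower_bounds([0.1, 0.2], [0.5, 0.4]))
```

With no lower bounds given, the same formula reduces to the all-upper-bounds case, consistent with the remark that the mixed conditions generalize the earlier ones.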
2.4 Determining the Elicited Prior
So now suppose there is an elicited set of bounds that lead to the simplex specified by Theorem 1 and it is necessary to determine the Dirichlet prior, denoted such that Again we pick a point and place the mode at so for with For example, would often seem like a sensible choice and then only needs to be determined. There is a 1-1 correspondence between and given by
Again it makes sense to proceed via an iterative algorithm to determine . Provided set and find such that As before set and then the algorithm proceeds via bisection. Determining at each step becomes problematical even for . In the approach adopted here this probability content was estimated via a Monte Carlo sample from the relevant Dirichlet. This is seen to work quite well as, in the case of determining a prior, high accuracy for the computations is not required.
Consider an example.
Example 3. Determining a Dirichlet prior.
Suppose that and the lower bounds are placed on the probabilities. This results in the bounds and which are reasonably tight. The mode was placed at the centroid For an error tolerance of and a Monte Carlo sample of size of at each step, the values and were obtained after 13 iterations. The prior content of was estimated to be . If greater accuracy is required then can be increased and/or decreased.
This choice of lower bounds results in a fairly concentrated prior as is reflected in the plots of the marginals in Figure 1. This is reflected also in Figure 2 where scatter plots are provided of a sample of 300 from the joint distribution for the pairs of probabilities and . This concentration is not a defect of the elicitation as (2) indicates that it must occur when the sum of the bounds is close to 1. So the concentration is forced by the dependencies among the probabilities.
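The Monte Carlo bisection for the Dirichlet can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the hyperparameters are taken to be `alpha_i = 1 + tau * mode_i`, the mode is placed at the centroid of the region, the region is taken to be the set where every probability is at least its lower bound, and the bounds, sample size, tolerance and seed are all hypothetical.

```python
import random


def dirichlet_sample(alpha, rng):
    """Draw one Dirichlet(alpha) vector by normalizing independent gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]


def subsimplex_content(alpha, lower, n_draws, rng):
    """Monte Carlo estimate of the prior probability that p_i >= l_i for all i."""
    hits = 0
    for _ in range(n_draws):
        p = dirichlet_sample(alpha, rng)
        if all(pi >= li for pi, li in zip(p, lower)):
            hits += 1
    return hits / n_draws


def elicit_dirichlet(lower, gamma=0.99, tol=0.005, n_draws=5000, seed=0, max_iter=40):
    """Bisection on tau so the region defined by the lower bounds has content near gamma."""
    k = len(lower)
    slack = 1.0 - sum(lower)
    mode = [l + slack / k for l in lower]  # centroid of the region
    rng = random.Random(seed)

    def content(tau):
        alpha = [1.0 + tau * m for m in mode]
        return subsimplex_content(alpha, lower, n_draws, rng)

    tau_lo, tau_hi = 0.0, 1.0
    c = content(tau_hi)
    while c < gamma and tau_hi < 1e7:  # bracket by doubling
        tau_lo, tau_hi = tau_hi, 2.0 * tau_hi
        c = content(tau_hi)
    tau = tau_hi
    for _ in range(max_iter):
        tau = 0.5 * (tau_lo + tau_hi)
        c = content(tau)
        if abs(c - gamma) <= tol:
            break
        if c < gamma:
            tau_lo = tau
        else:
            tau_hi = tau
    return tau, [1.0 + tau * m for m in mode], c


if __name__ == "__main__":
    # Hypothetical lower bounds for k = 4.
    print(elicit_dirichlet([0.1, 0.2, 0.3, 0.2]))
```

Because each content estimate is noisy, the tolerance should be kept well above the Monte Carlo standard error; this is in line with the remark that high accuracy is not required when determining a prior.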
Consider now another example.
Example 4. Determining a Dirichlet prior.
Suppose that and the lower bounds are placed on the probabilities. This leads to the following bounds for the probabilities.
[TABLE]
The mode was placed at the centroid For an error tolerance of and a Monte Carlo sample of size of at each step, the values and
were obtained after 7 iterations. The prior content of was estimated to be Figure 3 is a plot of the 9 marginal priors for the Again the dependencies among the make the marginal priors quite concentrated.
Example 1. (continued) Choosing the prior.
Given that we wish to assess independence, it is necessary that any elicited prior include independence as a possibility so this is not ruled out a priori. A natural elicitation is to specify valid bounds (namely, bounds that satisfy our theorems) on the and the and then use these to obtain bounds on the which in turn leads to the prior. So suppose valid bounds have been specified that lead to the lower bounds Then it is necessary that is the lower bound on Note that it is immediate that the satisfy the conditions of Theorem 1 and from (2), which is greater than since and As such the region for the contains elements of
For this example, the lower bounds were chosen which leads to the lower bounds
[TABLE]
on the Note that these are precisely the bounds used in Example 4 so the prior is as determined in that example where the indexing is row-wise.
3 Measuring Bias in the Prior
Here we specialize the developments discussed in Evans (2015) to the multinomial problem with a Dirichlet prior. Suppose a quantity is of interest and there is a need to assess the hypothesis Let denote the prior density and denote the posterior density of where gives the observed cell counts. When then is the Dirichlet density and is the Dirichlet density. The relative belief ratio is defined as the limiting ratio of the posterior probability of a set containing to the prior probability of this set where the limit is taken as the set converges (nicely) to the point . Whenever and is continuous at then As such is measuring how beliefs about have changed from a priori to a posteriori and is a measure of evidence concerning If then there is evidence that is true, as belief in the truth of has increased, if then there is evidence that is false, as belief in the truth of has decreased and if then there is no evidence either way.
Given that there is a measure of evidence for , it is possible to assess the bias in the prior with respect to . For this let denote the prior predictive distribution of given that The bias against is assessed by
[TABLE]
the prior probability that evidence in favor of will not be obtained when is true. If (14) is large, then there is bias in the prior against and, as such, if evidence against is obtained after seeing the data, then this should have little impact. In essence the ingredients of the study are such that it is not meaningful to find evidence against To measure bias in favor of let be a value of that is just meaningfully different than In other words values that differ from less than does, are not considered as practically different than Then the bias in favor of is measured by
[TABLE]
If (15) is large, then there is bias in favor of $H_{0}$ and, if evidence in favor of is obtained after seeing the data, then this should have little impact. It is shown in Evans (2015) that both (14) and (15) converge to 0 as So bias can be controlled by sample size.
The computation of (14) and (15) can be difficult in certain contexts with the primary issue being the need to generate from the conditional prior predictives of the data. As in the following example, however, great accuracy is typically not required for these computations and so effective methods are available.
Example 1. (continued) Measuring bias and choosing .
To assess independence between and the marginal parameter
[TABLE]
is used. Note that (16) is the minimum Kullback-Leibler distance between the values and an element of Furthermore, iff independence holds.
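A minimal sketch of such a divergence-from-independence measure follows. Equation (16) is missing from this copy, so the direction of the Kullback-Leibler divergence is an assumption: the code minimizes the divergence from the cell probabilities to an independence table, and that minimum is attained at the product of the marginals. The function name and the example tables are hypothetical.

```python
import math


def kl_from_independence(table):
    """Minimum KL divergence from a two-way table of probabilities (or counts)
    to the set of independence tables; zero exactly when rows and columns are independent.
    """
    total = sum(sum(row) for row in table)
    probs = [[cell / total for cell in row] for row in table]
    row_marg = [sum(row) for row in probs]
    col_marg = [sum(col) for col in zip(*probs)]
    psi = 0.0
    for i, row in enumerate(probs):
        for j, p in enumerate(row):
            if p > 0:  # a zero cell contributes nothing
                psi += p * math.log(p / (row_marg[i] * col_marg[j]))
    return max(psi, 0.0)  # guard against tiny negative rounding error


if __name__ == "__main__":
    print(kl_from_independence([[0.25, 0.25], [0.25, 0.25]]))
    print(kl_from_independence([[30, 10], [10, 30]]))
```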
As discussed previously, it is necessary to specify a such that a practically meaningful lack of independence occurs iff the true value One approach is to specify a such that, if for all and then any such deviation is practically insignificant, as the relative errors are all bounded by Using for small this condition implies that The range of is then discretized using this and the hypothesis to be assessed is now, because always, This assessment is carried out using the relative belief ratios based on the discretized prior and posterior of as discussed in Section 5. For the data in this problem we take which corresponds to a relative error. So this says that we do not consider independence as failing when the true probabilities differ from probabilities based on independence with a relative error of less than 1%.
With this choice of the issue of bias is now addressed. The prior distribution of the discretized is determined by simulation. For this, generate the from the elicited prior and compute and the prior probability contents of the intervals for given by where is determined so as to cover the full range of observed generated values of The plot of the prior density histogram for is provided in Figure 4.
For inference the posterior contents of these intervals are also determined via simulating from the posterior based on the observed data. For measuring bias, however, we proceed as follows. Each time a generated satisfies the corresponding are used to generate a new data set and is determined and note that this requires generating from the posterior based on the The probability is then estimated by the proportion of these relative belief ratios that are less than or equal to 1. This gives an estimate of the bias against Estimating the bias in favor of proceeds similarly, but now the are generated whenever is satisfied, as these represent values that correspond to just differing from independence meaningfully.
Clearly this procedure could be computationally quite demanding if highly accurate estimates of the biases are required. In general, however, high accuracy is not necessary. Even accuracy to one decimal place will provide a clear indication of whether or not there is serious bias. In this problem the biases for the elicited prior are estimated to be for bias for and for bias against. So while there is some bias in favor of it is not serious and there is virtually no bias against These values depend on the chosen value of but in fact are reasonably robust to this choice. The prior probability content of the interval is while contains of the prior probability. So there is a reasonable amount of prior probability allocated to effective independence and also to the smallest nonindependence of interest.
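The simulation for the bias against the hypothesis can be sketched as follows, at deliberately small simulation sizes. Several details are assumptions because the inline formulas are missing: the hypothesis is represented by the first interval of the discretization, from 0 up to a threshold `delta`; the divergence is computed as in the minimum Kullback-Leibler reading of (16); and the table shape, prior, sample size and seed are hypothetical.

```python
import math
import random


def dirichlet(alpha, rng):
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]


def psi(p, rows, cols):
    """Divergence from independence for cell probabilities stored row-wise."""
    row = [sum(p[i * cols:(i + 1) * cols]) for i in range(rows)]
    col = [sum(p[j::cols]) for j in range(cols)]
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            pij = p[i * cols + j]
            if pij > 0:
                total += pij * math.log(pij / (row[i] * col[j]))
    return max(total, 0.0)


def first_bin_content(alpha, rows, cols, delta, n_draws, rng):
    """Estimated probability that psi < delta under Dirichlet(alpha)."""
    hits = sum(psi(dirichlet(alpha, rng), rows, cols) < delta for _ in range(n_draws))
    return hits / n_draws


def multinomial(n, p, rng):
    counts = [0] * len(p)
    for i in rng.choices(range(len(p)), weights=p, k=n):
        counts[i] += 1
    return counts


def bias_against(alpha, rows, cols, delta, n, n_datasets=20, n_draws=2000, seed=0):
    """Proportion of data sets, generated when psi < delta, whose relative belief
    ratio for the first interval is at most 1."""
    rng = random.Random(seed)
    prior_content = first_bin_content(alpha, rows, cols, delta, n_draws, rng)
    if prior_content == 0.0:
        raise ValueError("no prior draws fell in the first interval")
    no_support, done = 0, 0
    while done < n_datasets:
        p = dirichlet(alpha, rng)
        if psi(p, rows, cols) >= delta:
            continue  # keep only prior draws consistent with the hypothesis
        counts = multinomial(n, p, rng)
        post_alpha = [a + c for a, c in zip(alpha, counts)]
        rb = first_bin_content(post_alpha, rows, cols, delta, n_draws, rng) / prior_content
        no_support += rb <= 1.0
        done += 1
    return no_support / n_datasets


if __name__ == "__main__":
    print(bias_against([1.0] * 4, 2, 2, delta=0.01, n=200, n_datasets=10, n_draws=1000))
```

The nested simulation is what makes the computation demanding; the small sizes here are only meant to show the structure of the estimate, which is consistent with the remark that accuracy to about one decimal place is usually enough.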
4 Checking for Prior-Data Conflict
Anytime a prior is used it is reasonable to question whether or not the prior is contradicted by the data. After all, the elicitation could be in error: what if the true probabilities lie well outside the intervals obtained? If the data demonstrate this in a reasonably conclusive way, then it would seem incorrect to proceed with an analysis based on this prior unless there was an absolute conviction that the amount of data was sufficient to overwhelm the influence of the prior. Such a situation is referred to as a prior-data conflict and methods exist to check whether or not this exists as well as methods to deal with it.
To check for prior-data conflict we follow Evans and Moshonov (2006) and compute the tail probability
[TABLE]
where is the observed value of the minimal sufficient statistic and is the prior predictive distribution of this statistic with density Evans and Jang (2011a) prove that quite generally (17) converges to as where is the prior on So (17) is indeed a valid check on the prior.
When the prior is given by the uniform, then a simple computation shows that (17) is equal to 1 and so there is no prior-data conflict. Intuitively, the closer is to [math], then the less information the prior is putting into the analysis. This idea can be made precise in terms of the weak informativity of one prior with respect to another as developed in Evans and Jang (2011b). As such, if prior-data conflict is obtained with the prior specified by a value of then this prior can be replaced by a prior that is weakly informative with respect to it so that the conflict can be avoided and this entails choosing a value
Example 1. (continued) Checking the elicited prior.
For the elicited Dirichlet prior the value of (17) is approximately equal to 1 (to the accuracy of the computations) and so there is definitely no prior-data conflict.
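A Monte Carlo version of this check can be sketched as follows. It assumes the minimal sufficient statistic is the vector of cell counts, whose prior predictive under a Dirichlet prior is the Dirichlet-multinomial, and it estimates the tail probability as the proportion of simulated count vectors whose prior predictive probability is no larger than that of the observed counts. The function names, simulation size and example values are hypothetical.

```python
import math
import random


def log_dm_pmf(counts, alpha):
    """Log of the Dirichlet-multinomial probability of a vector of cell counts."""
    n = sum(counts)
    a0 = sum(alpha)
    out = math.lgamma(n + 1) + math.lgamma(a0) - math.lgamma(a0 + n)
    for c, a in zip(counts, alpha):
        out += math.lgamma(a + c) - math.lgamma(a) - math.lgamma(c + 1)
    return out


def conflict_tail_probability(observed, alpha, n_sims=2000, seed=0):
    """Estimate the prior predictive probability of data no more probable than observed."""
    rng = random.Random(seed)
    n, k = sum(observed), len(observed)
    log_obs = log_dm_pmf(observed, alpha)
    hits = 0
    for _ in range(n_sims):
        # Gamma variates act as unnormalized Dirichlet weights for the cells.
        weights = [rng.gammavariate(a, 1.0) for a in alpha]
        counts = [0] * k
        for i in rng.choices(range(k), weights=weights, k=n):
            counts[i] += 1
        # Small tolerance so that ties are not lost to floating point rounding.
        if log_dm_pmf(counts, alpha) <= log_obs + 1e-9:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    print(conflict_tail_probability([4, 9, 7], [1.0, 1.0, 1.0]))
    print(conflict_tail_probability([0, 10, 10], [50.0, 1.0, 1.0]))
```

With all hyperparameters equal to 1 the prior predictive is constant over count vectors, so the estimate equals 1, in agreement with the statement about the uniform prior. A small value, as in the second call, would indicate conflict.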
5 Inference
For data and Dirichlet prior the posterior, of is Dirichlet As such it is easy to generate from the posterior of , estimate the posterior contents of the intervals and then estimate the relative belief ratios From this a relative belief estimate of the discretized can be obtained and various hypotheses assessed for this quantity.
As discussed in Evans (2015) the strength of the evidence provided by is measured by
[TABLE]
namely, the posterior probability that the true value of has a relative belief ratio no greater than the hypothesized value. When so there is evidence against a small value for (18) implies there is strong evidence against since there is a large posterior probability that the true value has a larger relative belief ratio than When so there is evidence in favor of a large value for (18) indicates there is strong evidence in favor of since there is a small posterior probability that the true value has a larger relative belief ratio than Note that when then the best estimate of in the set is as it has the most evidence in its favor. Note that while the measure of strength looks like a p-value, it has a very different interpretation and it is not measuring evidence.
Given that there is no prior-data conflict with the elicited prior and little or no bias in this prior relative to the hypothesis of independence, we can proceed to inference.
Example 1. (continued) Inference.
The posterior of the is the Dirichlet
distribution. For the hypothesis of independence between the variables, and using the discretized Kullback-Leibler divergence with the value was obtained so there is evidence in favor of For the strength of this evidence the value of (18) equals So the evidence in favor of is of the maximum possible strength. Of course, this is due to the large sample size and the fact that the posterior distribution concentrates entirely in Note that this is a very different conclusion from that obtained by the p-value based on the chi-squared test.
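The inference step can be sketched by simulation as follows. This is an illustration with hypothetical inputs, not a reproduction of the analysis above. The assumptions are: the divergence is the minimum Kullback-Leibler reading of (16); its range is cut into intervals of width `delta`, with the first interval standing for the hypothesis; the relative belief ratio of an interval is its estimated posterior content divided by its estimated prior content; and the strength is the posterior content of the intervals whose ratio is no greater than that of the first interval.

```python
import math
import random


def dirichlet(alpha, rng):
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]


def psi(p, rows, cols):
    """Divergence from independence for cell probabilities stored row-wise."""
    row = [sum(p[i * cols:(i + 1) * cols]) for i in range(rows)]
    col = [sum(p[j::cols]) for j in range(cols)]
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            pij = p[i * cols + j]
            if pij > 0:
                total += pij * math.log(pij / (row[i] * col[j]))
    return max(total, 0.0)


def bin_contents(alpha, rows, cols, delta, n_draws, rng):
    """Estimated probability content of each interval [b*delta, (b+1)*delta)."""
    counts = {}
    for _ in range(n_draws):
        b = int(psi(dirichlet(alpha, rng), rows, cols) // delta)
        counts[b] = counts.get(b, 0) + 1
    return {b: c / n_draws for b, c in counts.items()}


def relative_belief(prior_alpha, cell_counts, rows, cols, delta, n_draws=4000, seed=0):
    """Return the relative belief ratio of the first interval and its strength."""
    rng = random.Random(seed)
    prior = bin_contents(prior_alpha, rows, cols, delta, n_draws, rng)
    post_alpha = [a + c for a, c in zip(prior_alpha, cell_counts)]  # conjugate update
    post = bin_contents(post_alpha, rows, cols, delta, n_draws, rng)
    # Ratios are formed only for intervals that received prior draws.
    rb = {b: post.get(b, 0.0) / prior[b] for b in prior}
    if 0 not in rb:
        raise ValueError("no prior draws fell in the first interval")
    strength = sum(post.get(b, 0.0) for b in rb if rb[b] <= rb[0])
    return rb[0], strength


if __name__ == "__main__":
    # Hypothetical 2 x 2 table of counts with a uniform Dirichlet prior.
    print(relative_belief([1.0] * 4, [250, 250, 250, 250], 2, 2, delta=0.005))
```

Intervals that receive posterior draws but no prior draws are left out of the ratios in this sketch, so larger simulation sizes are advisable when the prior is diffuse relative to the posterior.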
6 Conclusions
A very natural and easy to use method has been developed for eliciting Dirichlet priors based upon placing single bounds on the individual probabilities that takes into account the dependencies among the probabilities. Of course, there may be more information available, such as upper and lower bounds on many of the probabilities. The price paid for this, however, is a much more complicated region where the bulk of the prior mass is located and even difficulties in determining what that region is. So indeed further research into the development of elicitation algorithms for this family of priors is warranted.
The application of this prior to an inference problem has also been illustrated using a measure of statistical evidence, the relative belief ratio, as a basis for the inferences. Given that a measure of evidence has been identified, it is possible to assess the bias in the prior before proceeding to inference. Also, the prior has been checked to see if it is contradicted by the data. Finally, it is seen that the assessment of a hypothesis can be different than that obtained by a standard p-value and, in particular, provide evidence in favor of a hypothesis. Of course, this is based on a well-known defect in p-values, namely, with a large enough sample a failure of the hypothesis of no practical importance can be detected. The solution to this problem is to say what difference matters and use an approach that incorporates this. Relative belief inferences are seen to do this in a very natural way. The choice of is not arbitrary but is rather a fundamental characteristic of the application. When such a can’t be determined it is not a failure of the inference methodology, but rather reflects a failure of the analyst to understand an aspect of the application that is necessary for a more refined analysis to take place.
7 References
Chaloner, K. and Duncan, G.T. (1987). Some properties of the Dirichlet multinomial distribution and its use in prior elicitation. Communications in Statistics – Theory and Methods, 16, 511–523.
Dickey, J. M., Jiang, J. M., and Kadane, J. B. (1987). Bayesian methods for censored categorical data. Journal of the American Statistical Association, 82, 773–781.
Dorp, J. and Mazzuchi, T. A. (2003) Parameter specification of the beta distribution and its Dirichlet extensions utilizing quantiles. Handbook of Beta Distributions and Its Applications, eds. Gupta, A. K. and Nadarajah, S., 3-32. Marcel Dekker Inc.
Evans, M. and Moshonov, H. (2006) Checking for prior-data conflict. Bayesian Analysis, 1, 4, 893-914.
Evans, M. (2015) Measuring Statistical Evidence Using Relative Belief. Monographs on Statistics and Applied Probability 144, CRC Press.
Evans, M. and Jang, G-H. (2011a) A limit result for the prior predictive applied to checking for prior-data conflict. Statistics and Probability Letters, 81, 1034-1038.
Evans, M. and Jang, G-H. (2011b). Weak informativity and the information in one prior relative to another. Statistical Science, 26, 3, 423-439.
Garthwaite, P. H., Kadane, J. B., and O’Hagan, A. (2005) Statistical methods for eliciting probability distributions. Journal of the American Statistical Association, 100, 470, 680-700.
O’Hagan, A., Buck C. E., Daneshkhah, A., Eiser, J. R., Garthwaite, P. H., Jenkinson, D. J., Oakley, J. E., Rakow, T. (2006) Uncertain Judgements: Eliciting Experts’ Probabilities. John Wiley & Sons.
Regazzini, E. and Sazonov, V.V. (1999). Approximation of laws of multinomial parameters by mixtures of Dirichlet distributions with applications to Bayesian inference. Acta Applicandae Mathematicae, 58, 247–264.
Snedecor, G. and Cochran, W. (1967) Statistical Methods, 6th ed. Iowa State University Press.
