Homogeneity of multinomial populations when data are classified into a large number of groups
M.V. Alba-Fernández, M.D. Jiménez-Gamero, F.J. Ariza-López

TL;DR
This paper introduces a new test, based on an asymptotically normal statistic, for comparing the distributions of categorical variables across multiple groups. It targets data classified into many groups, and its performance is illustrated through simulations and real data examples.
Contribution
A novel testing procedure for homogeneity across multiple groups that does not rely on resampling, applicable as the number of groups increases, with proven asymptotic properties.
Findings
Test statistic is asymptotically normal as the number of groups increases
Method performs well in finite samples based on simulations
Applied successfully to datasets from education, COVID-19, and elevation models
Abstract
Suppose that we are interested in the comparison of two independent categorical variables. Suppose also that the population is divided into subpopulations or groups. Notice that the distribution of the target variable may vary across subpopulations; moreover, it may happen that the two independent variables have the same distribution in the whole population, but their distributions could differ in some groups. So, instead of testing the homogeneity of the two categorical variables, one may be interested in simultaneously testing the homogeneity in all groups. A novel procedure is proposed for carrying out such a testing problem. The test statistic is shown to be asymptotically normal, avoiding the use of complicated resampling methods to get p-values. Here by asymptotic we mean when the number of groups increases; the sample sizes of the data from each group can either stay bounded or…
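The abstract's motivating point, that two variables can share the same distribution in the whole population while differing within groups, can be illustrated with a small numerical sketch. The two groups, their equal weights, and the probabilities below are hypothetical values chosen for illustration; they are not taken from the paper, and the code does not implement the proposed test.

```python
# Hypothetical population with two equally weighted groups (illustrative values only).
# Each inner list is the category distribution of a variable within one group.
weights = [0.5, 0.5]
x_by_group = [[0.8, 0.2], [0.2, 0.8]]  # variable X in group 1, group 2
y_by_group = [[0.2, 0.8], [0.8, 0.2]]  # variable Y in group 1, group 2


def pooled(dist_by_group, weights):
    """Mix the per-group distributions using the group weights."""
    n_categories = len(dist_by_group[0])
    return [
        sum(w * dist[c] for w, dist in zip(weights, dist_by_group))
        for c in range(n_categories)
    ]


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


pooled_x = pooled(x_by_group, weights)
pooled_y = pooled(y_by_group, weights)

# Distance between X and Y in the whole population.
pooled_gap = total_variation(pooled_x, pooled_y)
# Distance between X and Y inside each group.
group_gaps = [total_variation(p, q) for p, q in zip(x_by_group, y_by_group)]

print("pooled gap:", pooled_gap)
print("per-group gaps:", group_gaps)
```

In this toy setting the pooled distributions of X and Y coincide, so a homogeneity test on the whole population has nothing to detect, while the within-group distance is about 0.6 in both groups. This is the situation in which testing homogeneity simultaneously in all groups is the more informative question.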
Taxonomy
Topics: Bayesian Methods and Mixture Models
