Two-Sample Test for Sparse High Dimensional Multinomial Distributions
Amanda Plunkett, Junyong Park

TL;DR
This paper introduces a new statistical test for comparing two high-dimensional multinomial distributions, addressing limitations of traditional methods in sparse data scenarios, and demonstrates its effectiveness through theoretical analysis and empirical studies.
Contribution
A novel test for high-dimensional multinomial equality that remains effective with sparse data, with proven asymptotic properties and practical validation.
Findings
The proposed test statistic is asymptotically normal.
It has higher power in sparse settings.
Numerical studies confirm improved performance.
Abstract
In this paper we consider testing the equality of the probability vectors of two independent multinomial distributions in high dimension. The classical chi-square test may have drawbacks in this case, since many of the cell counts may be zero or may not be large enough. We propose a new test and derive its asymptotic normality and asymptotic power function. Based on the asymptotic power function, we present an application of our result to the neighborhood-type test, which has been previously studied, especially for the case of fairly small p-values. To compare the proposed test with existing tests, we provide numerical studies including simulations and real data examples.
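To make the sparsity concern concrete, the following is a minimal sketch of the classical Pearson chi-square statistic for two count vectors, applied to simulated sparse data. It illustrates only the classical baseline mentioned in the abstract, not the proposed test, whose form is not given here. The function name `pearson_two_sample`, the dimension `k`, the sample sizes, and the uniform probability vector are all assumptions chosen for illustration.

```python
import numpy as np

def pearson_two_sample(x, y):
    """Classical Pearson chi-square statistic for homogeneity of two count vectors.

    Hypothetical helper for illustration; cells that are empty in both
    samples are dropped because their expected counts are zero.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.sum(), y.sum()
    pooled = x + y
    keep = pooled > 0
    # Expected counts under H0: both samples share one probability vector.
    e1 = n1 * pooled[keep] / (n1 + n2)
    e2 = n2 * pooled[keep] / (n1 + n2)
    stat = ((x[keep] - e1) ** 2 / e1).sum() + ((y[keep] - e2) ** 2 / e2).sum()
    df = int(keep.sum()) - 1
    return stat, df

# Assumed sparse setting: far more cells than observations per sample.
rng = np.random.default_rng(0)
k, n1, n2 = 2000, 500, 500
p = np.full(k, 1.0 / k)
x = rng.multinomial(n1, p)
y = rng.multinomial(n2, p)

stat, df = pearson_two_sample(x, y)
# Fraction of cells with no observations in either sample.
zero_frac = np.mean((x == 0) & (y == 0))
print(f"chi-square statistic: {stat:.2f} on {df} degrees of freedom")
print(f"fraction of cells empty in both samples: {zero_frac:.2f}")
```

In a setting like this, a large share of the cells is empty and the expected counts in the remaining cells are far below the levels usually recommended for the chi-square approximation, which is the kind of situation the abstract describes.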
