Minor Issues Escalated to Critical Levels in Large Samples: A Permutation-Based Fix
Xuekui Zhang, Li Xing, Jing Zhang, Soojeong Kim

TL;DR
This paper shows how large samples can turn minor issues into major errors in statistical analysis, and it introduces a permutation-based method that maintains error control across sample sizes.
Contribution
It proposes a novel permutation-based test that stabilizes Type I error rates in large samples despite assumption violations.
Findings
Permutation test effectively controls Type I errors across sample sizes
Large samples can amplify minor assumption violations into significant errors
The method improves robustness of statistical inference in big data contexts
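To make the idea of a permutation test concrete, here is a minimal sketch of a generic two-sample permutation test for a difference in means. It is not the authors' procedure, which this summary does not describe in detail. The function name `permutation_test`, the choice of test statistic, and the parameters `n_perm` and `seed` are illustrative assumptions.

```python
import random
from statistics import mean


def permutation_test(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in means.

    Generic textbook sketch, NOT the paper's method: group labels are
    shuffled to build the null distribution of |mean(x) - mean(y)|.
    """
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled sample
        stat = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if stat >= observed:
            hits += 1
    # The add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Because the null distribution is built from the data themselves by relabelling, a test of this kind relies on exchangeability under the null rather than on a specific parametric form.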
Abstract
In the big data era, the need to reevaluate traditional statistical methods is paramount due to the challenges posed by vast datasets. While larger samples theoretically enhance accuracy and hypothesis testing power without increasing false positives, practical concerns about inflated Type-I errors persist. The prevalent belief is that larger samples can uncover subtle effects, necessitating dual consideration of p-value and effect size. Yet, the reliability of p-values from large samples remains debated. This paper warns that larger samples can exacerbate minor issues into significant errors, leading to false conclusions. Through our simulation study, we demonstrate how growing sample sizes amplify issues arising from two commonly encountered violations of model assumptions in real-world data and lead to incorrect decisions. This underscores the need for vigilant analytical…
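The amplification effect can be illustrated with a small simulation. The two assumption violations studied in the paper are not named in this excerpt, so the sketch below uses a stand-in: a small systematic bias in data that are tested against a null mean of zero. The function `rejection_rate`, the bias value, and the normal-approximation cutoff of 1.96 are all illustrative assumptions, not the paper's simulation design.

```python
import math
import random


def rejection_rate(n, bias, n_sims=200, z_crit=1.96, seed=0):
    """Fraction of simulated samples in which H0: mean = 0 is rejected.

    Hypothetical setup: each observation is N(bias, 1), so `bias` plays
    the role of a minor violation that the analyst is unaware of.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        total = 0.0
        total_sq = 0.0
        for _ in range(n):
            v = rng.gauss(bias, 1.0)
            total += v
            total_sq += v * v
        m = total / n
        var = (total_sq - n * m * m) / (n - 1)  # sample variance
        z = m / math.sqrt(var / n)              # one-sample test statistic
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    for n in (100, 10_000):
        print(n, rejection_rate(n, bias=0.03))
```

In this setup the expected test statistic is shifted by roughly `bias * sqrt(n)`, which is about 0.3 at n = 100 and about 3 at n = 10,000, so the same small bias that is nearly invisible in a modest sample leads to frequent rejections in a large one.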
Taxonomy
Topics: Evaluation and Performance Assessment
