Minor Issues Escalated to Critical Levels in Large Samples: A Permutation-Based Fix

Xuekui Zhang; Li Xing; Jing Zhang; Soojeong Kim

arXiv:2403.05647 · stat.ME · January 9, 2026 · 1 citation

TL;DR

This paper shows how large datasets can amplify minor issues into major errors in statistical analysis, and introduces a permutation-based method that maintains error control across sample sizes.

Contribution

It proposes a novel permutation-based test that stabilizes Type I error rates in large samples despite assumption violations.
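For readers unfamiliar with the general idea, a textbook two-sample permutation test can be sketched as follows. This is a generic illustration, not the authors' specific procedure: the function name `permutation_test`, the difference-in-means statistic, and the parameters `n_perm` and `seed` are all assumptions made for the example.

```python
import random
from statistics import mean


def permutation_test(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign group labels
        stat = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if stat >= observed:
            hits += 1
    # The add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

The p-value is the share of random relabelings whose statistic is at least as extreme as the observed one, so the reference distribution comes from the data rather than from a parametric model.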

Findings

01 Permutation test effectively controls Type I errors across sample sizes

02 Large samples can amplify minor assumption violations into significant errors

03 The method improves robustness of statistical inference in big data contexts
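Claims about Type I error control are usually checked by simulation: generate many datasets under the null hypothesis and count how often a test rejects. The sketch below shows such a harness under stated assumptions. It is not the authors' simulation design; `z_test_pvalue`, `null_sampler`, and `type_one_error_rate` are hypothetical names, and the sampler here satisfies the test's assumptions, so it does not reproduce the violations the paper studies.

```python
import random
from statistics import NormalDist, fmean, variance


def z_test_pvalue(x, y):
    """Two-sided large-sample z-test p-value for a difference in means."""
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    z = (fmean(x) - fmean(y)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def null_sampler(rng, n):
    """Assumed null scenario: both groups are i.i.d. standard normal."""
    return ([rng.gauss(0, 1) for _ in range(n)],
            [rng.gauss(0, 1) for _ in range(n)])


def type_one_error_rate(test, sampler, n, reps=1000, alpha=0.05, seed=0):
    """Fraction of simulated null datasets on which `test` rejects."""
    rng = random.Random(seed)
    rejections = sum(test(*sampler(rng, n)) < alpha for _ in range(reps))
    return rejections / reps


if __name__ == "__main__":
    for n in (50, 200, 800):
        rate = type_one_error_rate(z_test_pvalue, null_sampler, n)
        print(n, rate)
```

Swapping `null_sampler` for one that violates an assumption of the test, while keeping the null hypothesis true, would show whether the rejection rate drifts away from `alpha` as `n` grows.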

Abstract

In the big data era, the need to reevaluate traditional statistical methods is paramount due to the challenges posed by vast datasets. While larger samples theoretically enhance accuracy and hypothesis testing power without increasing false positives, practical concerns about inflated Type-I errors persist. The prevalent belief is that larger samples can uncover subtle effects, necessitating dual consideration of p-value and effect size. Yet, the reliability of p-values from large samples remains debated. This paper warns that larger samples can exacerbate minor issues into significant errors, leading to false conclusions. Through our simulation study, we demonstrate how growing sample sizes amplify issues arising from two commonly encountered violations of model assumptions in real-world data and lead to incorrect decisions. This underscores the need for vigilant analytical…

Code & Models

Repositories

ubcxzhang/bigdataissue (Official)

Taxonomy

Topics: Evaluation and Performance Assessment