This is not normal! (Re-) Evaluating the lower $n$ guidelines for regression analysis

David Randahl

arXiv:2409.06413·stat.ME·October 17, 2024

TL;DR

This study challenges the traditional rule of thumb that a sample size of $n \geq 30$ is sufficient for valid regression inference, showing that the sample size required for t-values to converge depends strongly on the distributional characteristics of the variables.

Contribution

The paper provides new, simulation-based guidelines indicating that smaller sample sizes may suffice under certain distributional conditions, revising the conventional $n \geq 30$ rule.

Findings

01

Symmetric or platykurtic variables allow smaller sample sizes for convergence.

02

Highly skewed variables require larger sample sizes for reliable t-values.

03

The traditional $n \geq 30$ rule can be either overly conservative or insufficient, depending on the distributions of the variables.

Abstract

The commonly cited rule of thumb for regression analysis, which suggests that a sample size of $n \geq 30$ is sufficient to ensure valid inferences, is frequently referenced but rarely scrutinized. This research note evaluates the lower bound for the number of observations required for regression analysis by exploring how different distributional characteristics, such as skewness and kurtosis, influence the convergence of t-values to the t-distribution in linear regression models. Through an extensive simulation study involving over 22 billion regression models, this paper examines a range of symmetric, platykurtic, and skewed distributions, testing sample sizes from 4 to 10,000. The results show that it is sufficient that either the dependent or independent variable follow a symmetric distribution for the t-values to converge at much smaller sample sizes than $n = 30$, unless the other…
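The kind of simulation the abstract describes can be illustrated with a small sketch. This is not the paper's code: the function names, the choice of a standard normal and a lognormal distribution, the sample size of 10, and the use of a two-sided 5% rejection rate as the convergence check are all assumptions made for illustration. The idea is to regress $y$ on an independent $x$ many times, collect the slope t-values, and see how often they exceed the critical value of a t-distribution with $n - 2$ degrees of freedom.

```python
import numpy as np


def simulate_t_values(n, n_sims, x_sampler, y_sampler, rng):
    """Slope t-values from n_sims simple regressions of y on x.

    x and y are drawn independently, so the true slope is 0 and the
    t-values should follow a t-distribution with n - 2 degrees of
    freedom if the usual assumptions hold well enough.
    """
    x = x_sampler(rng, (n_sims, n))
    y = y_sampler(rng, (n_sims, n))
    xc = x - x.mean(axis=1, keepdims=True)  # centered predictor
    yc = y - y.mean(axis=1, keepdims=True)  # centered outcome
    sxx = (xc ** 2).sum(axis=1)
    slope = (xc * yc).sum(axis=1) / sxx
    resid = yc - slope[:, None] * xc
    sigma2 = (resid ** 2).sum(axis=1) / (n - 2)  # residual variance
    return slope / np.sqrt(sigma2 / sxx)


def rejection_rate(t_values, df, rng, alpha=0.05, n_ref=200_000):
    """Share of |t| above a two-sided critical value.

    The critical value is estimated from draws of a t-distribution
    (an illustrative shortcut that avoids a SciPy dependency).
    """
    ref = np.abs(rng.standard_t(df, size=n_ref))
    crit = np.quantile(ref, 1 - alpha)
    return float(np.mean(np.abs(t_values) > crit))


# Hypothetical samplers: one symmetric, one strongly right-skewed.
def normal(rng, size):
    return rng.standard_normal(size)


def lognormal(rng, size):
    return rng.lognormal(mean=0.0, sigma=1.0, size=size)


rng = np.random.default_rng(0)
n = 10
t_normal = simulate_t_values(n, 20_000, normal, normal, rng)
t_skewed = simulate_t_values(n, 20_000, lognormal, lognormal, rng)
rate_normal = rejection_rate(t_normal, n - 2, rng)
rate_skewed = rejection_rate(t_skewed, n - 2, rng)
print(f"normal/normal rejection rate:       {rate_normal:.3f}")
print(f"lognormal/lognormal rejection rate: {rate_skewed:.3f}")
```

A rejection rate close to the nominal 0.05 suggests the t-values are behaving like the reference t-distribution at that sample size; repeating the run over a grid of `n` values and distribution pairs would give a rough, small-scale analogue of the study design.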

Taxonomy

Topics: Statistical Methods and Inference