Comments on the Neyman-Fisher Controversy and Its Consequences
Arman Sabbaghi, Donald B. Rubin

TL;DR
This paper critically examines the historical Neyman-Fisher controversy, clarifying misconceptions about the validity of ANOVA F-tests for randomized complete block and Latin square designs and highlighting the controversy's negative influence on the development of statistical theory.
Contribution
It provides a detailed analysis of the controversy, corrects Neyman's misconceptions, and discusses the controversy's impact on the evolution of statistical methods, including the neglect of potential outcomes.
Findings
Neyman's expressions for the expected mean residual sum of squares, for both randomized complete block designs and Latin squares, are generally incorrect.
Neyman's belief that the Type I error is higher than desired whenever the expected mean treatment sum of squares exceeds the expected mean residual sum of squares is incorrect.
The controversy hindered the development of potential outcomes in statistical theory.
Abstract
The Neyman-Fisher controversy considered here originated with the 1935 presentation of Jerzy Neyman's Statistical Problems in Agricultural Experimentation to the Royal Statistical Society. Neyman asserted that the standard ANOVA F-test for randomized complete block designs is valid, whereas the analogous test for Latin squares is invalid in the sense of detecting differentiation among the treatments, when none existed on average, more often than desired (i.e., having a higher Type I error than advertised). However, Neyman's expressions for the expected mean residual sum of squares, for both designs, are generally incorrect. Furthermore, Neyman's belief that the Type I error (when testing the null hypothesis of zero average treatment effects) is higher than desired, whenever the expected mean treatment sum of squares is greater than the expected mean residual sum of squares, is generally…
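The abstract contrasts two different questions: how the expected mean treatment and residual sums of squares compare under randomization, and how often the F-test actually rejects. The sketch below estimates both by Monte Carlo for a randomized complete block design. It is a minimal illustration under stated assumptions, not the authors' derivation: the potential outcomes are synthetic, "zero average treatment effects" is taken to mean that each treatment's potential outcomes have the same average over all units, and all function names are hypothetical. The closed-form F critical value is valid only for a numerator with 2 degrees of freedom (that is, 3 treatments).

```python
import random


def rcbd_mean_squares(y):
    """Return (MS_treatment, MS_residual) for a b x t table y[block][treatment].

    Standard two-way ANOVA without interaction, one observation per cell.
    """
    b, t = len(y), len(y[0])
    grand = sum(sum(row) for row in y) / (b * t)
    block_means = [sum(row) / t for row in y]
    trt_means = [sum(y[i][k] for i in range(b)) / b for k in range(t)]
    ss_t = b * sum((m - grand) ** 2 for m in trt_means)
    ss_r = sum(
        (y[i][k] - block_means[i] - trt_means[k] + grand) ** 2
        for i in range(b)
        for k in range(t)
    )
    return ss_t / (t - 1), ss_r / ((b - 1) * (t - 1))


def f_crit_df1_2(alpha, df2):
    """Upper-alpha point of F(2, df2), using P(F > f) = (1 + 2f/df2)^(-df2/2)."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


def make_potential_outcomes(b, t, effect_sd, rng):
    """Hypothetical potential outcomes Y[i][j][k]: block i, unit j, treatment k.

    Unit-level effects are nonzero, but they are centered so that every
    treatment has the same average potential outcome over all b*t units
    (the assumed form of the null of zero average treatment effects).
    """
    base = [[rng.gauss(10.0 + i, 1.0) for _ in range(t)] for i in range(b)]
    eff = [[[rng.gauss(0.0, effect_sd) for _ in range(t)] for _ in range(t)]
           for _ in range(b)]
    n = b * t
    for k in range(t):
        avg = sum(eff[i][j][k] for i in range(b) for j in range(t)) / n
        for i in range(b):
            for j in range(t):
                eff[i][j][k] -= avg
    return [[[base[i][j] + eff[i][j][k] for k in range(t)] for j in range(t)]
            for i in range(b)]


def simulate(Y, f_crit, n_draws, rng):
    """Monte Carlo estimates of E[MS_T], E[MS_R], and P(F > f_crit)
    over independent random within-block assignments of treatments to units."""
    b, t = len(Y), len(Y[0])
    sum_ms_t = sum_ms_r = 0.0
    rejections = 0
    for _ in range(n_draws):
        y = []
        for i in range(b):
            units = list(range(t))
            rng.shuffle(units)  # units[k] = unit in block i that receives treatment k
            y.append([Y[i][units[k]][k] for k in range(t)])
        ms_t, ms_r = rcbd_mean_squares(y)
        sum_ms_t += ms_t
        sum_ms_r += ms_r
        if ms_r > 0 and ms_t / ms_r > f_crit:
            rejections += 1
    return sum_ms_t / n_draws, sum_ms_r / n_draws, rejections / n_draws


if __name__ == "__main__":
    rng = random.Random(0)
    b, t = 5, 3
    Y = make_potential_outcomes(b, t, effect_sd=2.0, rng=rng)
    crit = f_crit_df1_2(0.05, (b - 1) * (t - 1))  # about 4.46 for F(2, 8)
    e_ms_t, e_ms_r, rate = simulate(Y, crit, n_draws=10000, rng=rng)
    print(f"E[MS_T] ~ {e_ms_t:.3f}  E[MS_R] ~ {e_ms_r:.3f}  rejection rate ~ {rate:.3f}")
```

Reporting the two expected mean squares next to the estimated rejection rate makes it possible to check, for any particular set of hypothetical potential outcomes, whether "E[MS_T] > E[MS_R]" and "rejection rate above the nominal 5%" actually go together; the output depends on the synthetic outcomes and is not a result from the paper.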
