The evidence contained in the P-value is context dependent
Florian Hartig, Frédéric Barraquand

TL;DR
This paper discusses the context-dependent nature of p-values in statistical testing, emphasizing that p-values should be interpreted with other indicators and criticizing the idea of viewing them as a gradual measure of evidence.
Contribution
It critically examines the interpretation of p-values as a measure of evidence, advocating for context-aware interpretation rather than fixed thresholds or gradual evidence scales.
Findings
P-values are context-dependent and should be interpreted together with other statistical indicators, in particular effect sizes and their uncertainty.
Dichotomizing p-values leads to information loss.
Interpreting p-values as a gradual scale of evidence is problematic, because the evidence a given p-value carries depends on context.
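One way to see why a p-value alone is incomplete is to ask which effect estimates are compatible with the same p-value at different sample sizes. The sketch below is an illustration only, not the authors' own argument or code: it assumes a two-sided one-sample z-test with a known standard deviation of 1, and the function name `effect_for_p` and the chosen values (p = 0.04, n = 10, 100, 10000) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def effect_for_p(p_value, n, sd=1.0):
    """Return the positive observed mean, and its 95% CI, that would produce
    the given two-sided p-value in a one-sample z-test with known sd.

    Illustrative assumption: the null hypothesis is a mean of zero.
    """
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - p_value / 2)   # test statistic implied by p
    se = sd / sqrt(n)                         # standard error of the mean
    estimate = z * se                         # observed mean implied by z
    half_width = std_normal.inv_cdf(0.975) * se
    return estimate, (estimate - half_width, estimate + half_width)


# Same p-value, very different effect estimates and uncertainties.
for n in (10, 100, 10_000):
    est, (lo, hi) = effect_for_p(0.04, n)
    print(f"n={n:>6}: estimate={est:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

Under these assumptions the implied estimate shrinks by a factor of sqrt(n) as the sample grows, so an identical p-value can accompany a large, imprecisely estimated effect or a tiny, precisely estimated one. Reporting the estimate and its interval alongside the p-value makes that difference visible.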
Abstract
In a recent opinion article, Muff et al. recapitulate well-known objections to the Neyman-Pearson Null-Hypothesis Significance Testing (NHST) framework and call for reforming our practices in statistical reporting. We agree with them on several important points: the significance threshold P<0.05 is only a convention, chosen as a compromise between type I and II error rates; transforming the p-value into a dichotomous statement leads to a loss of information; and p-values should be interpreted together with other statistical indicators, in particular effect sizes and their uncertainty. In our view, a lot of progress in reporting results can already be achieved by keeping these three points in mind. We were surprised and worried, however, by Muff et al.'s suggestion to interpret the p-value as a "gradual notion of evidence". Muff et al. recommend, for example, that a P-value > 0.1 should…
