Statistical methods for linguistic research: Foundational Ideas - Part I
Shravan Vasishth, Bruno Nicenboim

TL;DR
This paper explains the fundamental concepts of statistical hypothesis testing within the frequentist framework, emphasizing proper study design, interpretation, and replication to improve the validity of linguistic research.
Contribution
It provides a clear, detailed explanation of hypothesis testing principles and addresses common pitfalls in linguistic experiments, promoting rigorous statistical practices.
Findings
Importance of adequately powered studies
Misconceptions about p-values clarified
Recommendations for best practices in linguistic research
Abstract
We present the fundamental ideas underlying statistical hypothesis testing using the frequentist framework. We begin with a simple example that builds up the one-sample t-test from the beginning, explaining important concepts such as the sampling distribution of the sample mean and the iid assumption. Then we examine the p-value in detail, and discuss several important misconceptions about what a p-value does and does not tell us. This leads to a discussion of Type I and Type II error, power, and Type S and Type M error. An important conclusion from this discussion is that one should aim to carry out appropriately powered studies. Next, we discuss two common issues we have encountered in psycholinguistics and linguistics: running experiments until significance is reached, and the "garden-of-forking-paths" problem discussed by Gelman and others, whereby the researcher attempts to find statistical…
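The one-sample t-test mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `one_sample_t`, the null value `mu0`, and the data in `diffs` are hypothetical and chosen only for illustration. The sketch computes the t statistic as the distance between the sample mean and the null value, measured in units of the estimated standard error, that is, the estimated standard deviation of the sampling distribution of the sample mean under the iid assumption.

```python
import math
import statistics


def one_sample_t(sample, mu0=0.0):
    """Return (t, df) for the null hypothesis that the population mean is mu0.

    Assumes the observations are independent and identically distributed.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)   # sample SD, using the n - 1 denominator
    se = sd / math.sqrt(n)          # estimated standard error of the sample mean
    t = (mean - mu0) / se           # how many standard errors the mean lies from mu0
    return t, n - 1


# Hypothetical data: five made-up difference scores, not taken from the paper.
diffs = [2.0, 4.0, 6.0, 8.0, 10.0]
t, df = one_sample_t(diffs, mu0=0.0)
```

With these made-up values the sample mean is 6 and the standard error is the square root of 2, so t is about 4.24 on 4 degrees of freedom. Turning t into a p-value additionally requires the t-distribution with `df` degrees of freedom, which this sketch leaves out.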
