Beyond Bonferroni: Hierarchical Multiple Testing in Empirical Research
Sebastian Calonico, Sebastian Galiani

TL;DR
This paper reviews hierarchical multiple testing methods that improve statistical power and interpretability in empirical research by incorporating logical or causal relationships among hypotheses, beyond traditional Bonferroni corrections.
Contribution
It provides a comprehensive review and comparison of hierarchical testing procedures, highlighting their advantages over standard methods in structured hypothesis testing.
Findings
Hierarchical methods achieve higher statistical power than the Bonferroni correction by exploiting logical or causal relationships among hypotheses.
They maintain error control under specific dependence structures.
Hierarchical procedures improve interpretability of results.
Abstract
Empirical research in the social and medical sciences frequently involves testing multiple hypotheses simultaneously, increasing the risk of false positives due to chance. Classical multiple testing procedures, such as the Bonferroni correction, control the family-wise error rate (FWER) but tend to be overly conservative, reducing statistical power. Stepwise alternatives like the Holm and Hochberg procedures offer improved power while maintaining error control under certain dependence structures. However, these standard approaches typically ignore hierarchical relationships among hypotheses -- structures that are common in settings such as clinical trials and program evaluations, where outcomes are often logically or causally linked. Hierarchical multiple testing procedures -- including fixed sequence, fallback, and gatekeeping methods -- explicitly incorporate these relationships,…
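The procedures named in the abstract can be contrasted with a minimal sketch. This is not the authors' code: the function names, the p-values, and the equal fallback weights are assumptions made for illustration, and the rules follow the standard textbook definitions of Bonferroni, Holm, fixed-sequence, and fallback testing. The fixed-sequence and fallback functions assume the hypotheses are listed in a pre-specified order of importance.

```python
from typing import List, Sequence


def bonferroni(pvals: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Reject H_i when p_i <= alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def holm(pvals: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Step-down Holm: compare the k-th smallest p-value with alpha / (m - k)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for k, i in enumerate(order):  # k = 0 is the smallest p-value
        if pvals[i] <= alpha / (m - k):
            reject[i] = True
        else:
            break  # stop at the first non-rejection
    return reject


def fixed_sequence(pvals: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Test in the given order at full alpha; stop at the first non-rejection."""
    reject = [False] * len(pvals)
    for i, p in enumerate(pvals):
        if p <= alpha:
            reject[i] = True
        else:
            break
    return reject


def fallback(pvals: Sequence[float], weights: Sequence[float],
             alpha: float = 0.05) -> List[bool]:
    """Fallback: each hypothesis has its own share of alpha (weights sum to 1);
    a rejection passes the level used so far on to the next hypothesis."""
    reject = []
    carried = 0.0  # alpha carried over from the previous rejection
    for p, w in zip(pvals, weights):
        level = carried + w * alpha
        rejected = p <= level
        reject.append(rejected)
        carried = level if rejected else 0.0
    return reject


# Hypothetical p-values, listed in a pre-specified order of importance.
p = [0.012, 0.020, 0.030, 0.200]
print("Bonferroni:    ", bonferroni(p))
print("Holm:          ", holm(p))
print("Fixed sequence:", fixed_sequence(p))
print("Fallback:      ", fallback(p, [0.25, 0.25, 0.25, 0.25]))
```

With these made-up p-values, Bonferroni and Holm reject only the first hypothesis, while the two ordered procedures reject the first three. The gain depends on the ordering being chosen before seeing the data: in the fixed-sequence procedure, a large p-value early in the sequence blocks every later hypothesis.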
