The Early Roots of Statistical Learning in the Psychometric Literature: A review and two new results
Mark de Rooij, Bunga Citra Pratiwi, Marjolein Fokkema, Elise Dusseldorp, Henk Kelderman

TL;DR
This paper reviews the historical roots of key statistical learning concepts in psychometric literature, introduces two new ideas about reliability and regularization, and provides empirical evidence on their effects.
Contribution
It uncovers early psychometric origins of cross-validation and regularization, and investigates the impact of reliability and regularization strategies on prediction accuracy.
Findings
Reliability has a limited effect on predictive validity.
Regularization towards equal coefficients reduces prediction error.
Historical psychometric methods influenced modern machine learning techniques.
Abstract
Machine and statistical learning techniques are becoming increasingly important for the analysis of psychological data. Four core concepts of machine learning are the bias-variance trade-off, cross-validation, regularization, and basis expansion. We present some early psychometric papers, from almost a century ago, that dealt with cross-validation and regularization. From this review it is safe to conclude that the origins of these concepts lie partly in the field of psychometrics. From our historical review, two new ideas arose which we investigated further: the first concerns the relationship between reliability and predictive validity; the second is whether optimal regression weights should be estimated by regularizing their values towards equality or shrinking their values towards zero. In a simulation study we show that the reliability of a test score does not influence the predictive validity…
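The contrast between shrinking regression weights towards zero and regularizing them towards equality can be illustrated with a minimal sketch. It assumes a ridge-type quadratic penalty in both cases; the exact penalty used in the paper is not stated in this summary. The function names, the simulated data, and the penalty strength `lam` are hypothetical and chosen only for illustration.

```python
import numpy as np

def ridge_towards_zero(X, y, lam):
    """Minimize ||y - Xb||^2 + lam * sum(b_j^2) (ordinary ridge)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def ridge_towards_equality(X, y, lam):
    """Minimize ||y - Xb||^2 + lam * sum((b_j - mean(b))^2).

    Assumed penalty: it is zero when all weights are equal, so large
    lam pulls the weights towards a common value, not towards zero.
    """
    p = X.shape[1]
    C = np.eye(p) - np.ones((p, p)) / p  # centering matrix: b'Cb = sum((b_j - mean(b))^2)
    return np.linalg.solve(X.T @ X + lam * C, X.T @ y)

# Simulated data (hypothetical): five predictors with similar true weights.
rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.normal(size=(n, p))
true_beta = np.array([0.5, 0.4, 0.6, 0.5, 0.45])
y = X @ true_beta + rng.normal(size=n)

beta_ols = ridge_towards_zero(X, y, 0.0)        # no penalty: least squares
beta_zero = ridge_towards_zero(X, y, 1e6)       # heavy penalty: weights near zero
beta_equal = ridge_towards_equality(X, y, 1e6)  # heavy penalty: weights near equal

print("OLS:             ", np.round(beta_ols, 3))
print("towards zero:    ", np.round(beta_zero, 3))
print("towards equality:", np.round(beta_equal, 3))
```

With a very large penalty, the first estimator collapses to (nearly) zero weights, while the second collapses to (nearly) unit-weighted sum scores with a single common coefficient. In practice the penalty strength would be chosen by cross-validation rather than fixed in advance.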
Taxonomy
Topics: Statistics Education and Methodologies · Advanced Statistical Methods and Models · Data Analysis with R
