Galton's Family Heights Data Revisited
Hao Han, Yeming Ma, Wei Zhu

TL;DR
This paper revisits Galton's family heights data to critically evaluate his classic regression model, and uses the data to benchmark several regression methods, including a newly developed robust method (least sine squares regression), arguing that least squares is strongly biased on these data.
Contribution
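The study challenges the validity of Galton's original model and his regression-towards-the-mean interpretation, using his data as a benchmark to compare least squares, orthogonal regression, geometric mean regression, and a new nonparametric robust method, least sine squares regression.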
Findings
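Least squares regression shows a strong bias on Galton's data, according to the authors.
Galton's model may misrepresent the relationship between parent and child heights, in part because it "transmutes" women into men in order to fit a simple linear model.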
The new robust regression approach provides more reliable insights.
Abstract
Galton's family heights data has been a preeminent historical dataset in regression analysis, on which the original model and basic results have survived the close scrutiny of statisticians for 125 years. However, by revisiting Galton's family data, we challenge whether Galton's classic model and his regression-towards-the-mean interpretation are proper. Using Galton's data as a benchmark for different regression methods, such as least squares, orthogonal regression, geometric mean regression, and least sine squares regression (a newly developed nonparametric robust regression approach), we elucidate that his regression model has fundamental drawbacks: not only in variable and model selection, by "transmuting" women into men and thus adopting the simple linear model, but also in a strong bias in least squares regression, leading to otherwise alternative conclusions on the true relationships between the heights…
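To make the comparison named in the abstract concrete, here is a minimal sketch of how three of the listed methods estimate the slope of a simple linear fit from the same summary statistics. It is an illustration and not the authors' code. The function name `slopes` and the height values are made up for this example and are not Galton's data. Orthogonal regression is taken here in its equal-error-variance form, and least sine squares regression is omitted because the abstract does not define it.

```python
import math


def slopes(x, y):
    """Return slope estimates of y on x under three classical criteria."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centered sums of squares and cross-products.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or sxy == 0:
        raise ValueError("slopes are undefined when sxx or sxy is zero")

    # Ordinary least squares: minimizes vertical distances to the line.
    ols = sxy / sxx
    # Orthogonal regression: minimizes perpendicular distances,
    # assuming equal error variances in x and y.
    orthogonal = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    # Geometric mean regression: ratio of standard deviations,
    # carrying the sign of the covariance.
    geometric_mean = math.copysign(math.sqrt(syy / sxx), sxy)

    return {"ols": ols, "orthogonal": orthogonal, "geometric_mean": geometric_mean}


# Hypothetical heights in inches, invented for illustration only.
parent = [64.0, 66.0, 67.0, 68.0, 69.0, 70.0, 72.0]
child = [66.0, 66.5, 68.0, 67.5, 68.5, 69.0, 70.0]

result = slopes(parent, child)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

For positively correlated data the least squares slope equals the geometric mean slope multiplied by the correlation coefficient, so it can never exceed it. That is one simple way to see why the choice of method affects how strongly the fitted line appears to regress towards the mean.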
Taxonomy
Topics: Advanced Statistical Methods and Models · Economics of Agriculture and Food Markets · Genetic and phenotypic traits in livestock
