Six textbook mistakes in data analysis

Alexandros Gezerlis; Martin Williams

arXiv:2209.09073 · physics.data-an · January 13, 2023

Open Access

TL;DR

This paper identifies and corrects six common misconceptions in data analysis textbooks, emphasizing the importance of accurate statistical understanding for scientific and engineering data interpretation.

Contribution

It highlights widespread textbook errors in statistical methods and provides clear corrections, improving foundational understanding in data analysis education.

Findings

01 Six common textbook mistakes identified and corrected

02 Corrections applicable to a wide range of scientific and engineering data analysis

03 Enhances accuracy of statistical teaching and practice

Abstract

This article discusses a number of incorrect statements appearing in textbooks on data analysis, machine learning, or computational methods; the common theme in all these cases is the relevance and application of statistics to the study of scientific or engineering data; these mistakes are also quite prevalent in the research literature. Crucially, we do not address errors made by an individual author, focusing instead on mistakes that are widespread in the introductory literature. After some background on frequentist and Bayesian linear regression, we turn to our six paradigmatic cases, providing in each instance a specific example of the textbook mistake, pointers to the specialist literature where the topic is handled properly, along with a correction that summarizes the salient points. The mistakes (and corrections) are broadly relevant to any technical setting where statistical…

Taxonomy

Topics: Statistics Education and Methodologies · Multidisciplinary Science and Engineering Research · Machine Learning and Data Classification