A few statistical principles for data science

Noel Cressie

arXiv:2102.01892 · stat.OT · February 4, 2021

TL;DR

This paper sets out nine statistical principles to guide data scientists through complex, interdisciplinary data analyses in the fast-changing field of Data Science, with an emphasis on uncertainty quantification and methodological clarity.

Contribution

It introduces nine fundamental statistical principles intended to help data scientists carry out complex, interdisciplinary data analyses effectively.

Findings

1. Clarifies different approaches to uncertainty quantification.
2. Highlights the importance of statistical principles in complex data analysis.
3. Provides practical guidance for data scientists in interdisciplinary projects.

Abstract

In any other circumstance, it might make sense to define the extent of the terrain (Data Science) first, and then locate and describe the landmarks (Principles). But this data revolution we are experiencing defies a cadastral survey. Areas are continually being annexed into Data Science. For example, biometrics was traditionally statistics for agriculture in all its forms, but now, in Data Science, it means the study of characteristics that can be used to identify an individual. Examples of non-intrusive measurements include height, weight, fingerprints, retina scan, voice, photograph/video (facial landmarks and facial expressions), and gait. A multivariate analysis of such data would be a complex project for a statistician, but a software engineer might appear to have no trouble with it at all. In any applied-statistics project, the statistician worries about uncertainty and quantifies…
