Data Science vs. Statistics: Two Cultures?
Iain Carmichael, J.S. Marron

TL;DR
This paper explores the relationship between data science and statistics, emphasizing their shared origins, evolving practices, and implications for the future of statistical education, communication, and research.
Contribution
It advocates for a broad, inclusive view of data analysis, integrating modern approaches like machine learning and computation with traditional statistics.
Findings
Data science broadens traditional statistics with new methods.
Evolving data analysis practices impact statistical education.
Future directions include improved communication and interdisciplinary research.
Abstract
Data science is the business of learning from data, which is traditionally the business of statistics. Data science, however, is often understood as a broader, task-driven and computationally oriented version of statistics. Both the term data science and the broader idea it conveys have origins in statistics and are a reaction to a narrower view of data analysis. Expanding upon the views of a number of statisticians, this paper encourages a big-tent view of data analysis. We examine how evolving approaches to modern data analysis relate to the existing discipline of statistics (e.g., exploratory analysis, machine learning, reproducibility, computation, communication and the role of theory). Finally, we discuss what these trends mean for the future of statistics by highlighting promising directions for communication, education and research.
