Setting the stage for data science: integration of data management skills in introductory and second courses in statistics
Nicholas J. Horton, Benjamin S. Baumer, Hadley Wickham

TL;DR
This paper advocates integrating data management, visualization, and reproducible analysis skills into introductory and second courses in statistics to better prepare students for data science and big data challenges.
Contribution
It introduces a curriculum approach that embeds data science tools early in statistics education to enhance students' practical data skills and statistical thinking.
Findings
Students gain practical data management skills
Enhanced understanding of statistical concepts through real-world data applications
Preparation for big data challenges in future statistical work
Abstract
Many have argued that statistics students need additional facility to express statistical computations. By introducing students to commonplace tools for data management, visualization, and reproducible analysis in data science and applying these to real-world scenarios, we prepare them to think statistically. In an era of increasingly big data, it is imperative that students develop data-related capacities, beginning with the introductory course. We believe that the integration of these precursors to data science into our curricula, early and often, will help statisticians be part of the dialogue regarding "Big Data" and "Big Questions".
