# A First Course in Data Science

**Authors:** Donghui Yan, Gary E. Davis

arXiv: 1905.03121 · 2020-05-27

## TL;DR

This paper describes the design and structure of a first undergraduate course in data science, organized around the data science life cycle and the use of practical tools to analyze real data.

## Contribution

The paper introduces a principled, practical course framework centered on the data science life cycle, tailored for undergraduate education and industry relevance.

## Key findings

- Course design based on the data science life cycle
- Integration of industry-relevant tools and techniques
- Focus on real data analysis and motivated questions

## Abstract

Data science is a discipline that provides principles, methodology and guidelines for the analysis of data for tools, values, or insights. Driven by a huge workforce demand, many academic institutions have started to offer degrees in data science, many at the graduate level and a few at the undergraduate level. Curricula may differ across institutions because of varying levels of faculty expertise and the different disciplines (such as mathematics, computer science, and business) involved in developing the curriculum. The University of Massachusetts Dartmouth started offering degree programs in data science in Fall 2015, at both the undergraduate and the graduate level. Quite a few articles have been published that deal with graduate data science courses, but far fewer deal with undergraduate ones. Our discussion will focus on undergraduate course structure and function, and specifically on a first course in data science. Our design of this course centers around a concept called the data science life cycle. That is, we view the tasks or steps in the practice of data science as forming a process, consisting of states that indicate how it comes to life and how different tasks in data science depend on or interact with one another, until a data product is born or a conclusion is reached. Naturally, different pieces of the data science life cycle then form individual parts of the course. The details of each piece are filled in with concepts, techniques, or skills that are popular in industry. Consequently, the design of our course is both "principled" and practical. A significant feature of our course philosophy is that, in line with activity theory, the course is based on the use of tools to transform real data in order to answer strongly motivated questions related to the data.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.03121/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/1905.03121/full.md

## References

58 references — full list in the complete paper: https://tomesphere.com/paper/1905.03121/full.md

---
Source: https://tomesphere.com/paper/1905.03121