Characterizing Data Scientists in the Real World
Paula Pereira, Jácome Cunha, João P. Fernandes

TL;DR
This paper surveys data scientists to understand their skills, tools, and practices, aiming to better support their productivity and address potential training gaps in the field.
Contribution
It provides a comprehensive characterization of current data scientists through a public survey, highlighting their skills, tools, and work practices.
Findings
Data scientists have diverse skill sets and backgrounds.
There are gaps in skills and training among data scientists.
Tools used vary widely across different sectors.
Abstract
Data collection is pervasively bound to our digital lifestyle. A recent IDC study reports that, due to pandemic-related confinements, the volume of data created and replicated grew even faster in 2020 than in previous years, reaching an astonishing global total of 64.2 zettabytes. While not all of the data produced is meant to be analyzed, numerous companies offer services and products that rely heavily on data analysis. In other words, mining this data has already proven highly valuable for businesses in different sectors. But to fully realize this value, companies need to hire professionals who are capable of gleaning insights and extracting value from the available data. We hypothesize that people currently conducting data-science-related tasks in practice may not have adequate training or education. So in order to be able to fully support…
Taxonomy
Topics: Big Data and Business Intelligence
