A large-scale comparative analysis of Coding Standard conformance in Open-Source Data Science projects
Andrew J. Simmons, Scott Barnett, Jessica Rivera-Villicana, Akshat Bajaj, Rajesh Vasa

TL;DR
This study compares coding standard adherence between Data Science and traditional software projects, revealing significant differences in coding practices and suggesting that conventional standards may not suit Data Science codebases.
Contribution
It provides the first large-scale empirical analysis of coding standards in Data Science projects, highlighting key differences from traditional software engineering practices.
Findings
Data Science projects have more functions with excessive parameters and local variables.
Data Science projects use different variable naming conventions than traditional software projects.
Data Science codebases deviate from traditional software engineering standards.
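To make the first finding concrete, the sketch below shows one way a checker could flag functions with too many parameters or local variables, using Python's standard `ast` module. It is only an illustration and not the tooling used in the study: the function name `check_functions`, the thresholds `MAX_PARAMS = 5` and `MAX_LOCALS = 15`, and the rule for counting locals are all assumptions chosen for the example.

```python
import ast

# Hypothetical limits for illustration; the thresholds used in the
# study are not stated in this summary.
MAX_PARAMS = 5
MAX_LOCALS = 15


def check_functions(source, max_params=MAX_PARAMS, max_locals=MAX_LOCALS):
    """Return one report dict per function definition found in `source`."""
    reports = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue

        # Collect every declared parameter, including *args and **kwargs.
        args = node.args
        params = [a.arg for a in args.posonlyargs + args.args + args.kwonlyargs]
        if args.vararg:
            params.append(args.vararg.arg)
        if args.kwarg:
            params.append(args.kwarg.arg)

        # Simplification (an assumption of this sketch): a "local" is any
        # name assigned inside the function that is not a parameter.
        # Names assigned in nested functions are counted as well.
        assigned = {
            n.id
            for n in ast.walk(node)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)
        }
        local_names = assigned - set(params)

        reports.append({
            "name": node.name,
            "params": len(params),
            "locals": len(local_names),
            "too_many_params": len(params) > max_params,
            "too_many_locals": len(local_names) > max_locals,
        })
    return reports


if __name__ == "__main__":
    example = (
        "def train(x, y, lr, epochs, batch, seed, verbose):\n"
        "    a = 1\n"
        "    b = 2\n"
        "    return a + b\n"
    )
    for report in check_functions(example):
        print(report)
```

Running such a check over every file in a project and dividing the number of flagged functions by project size would give a per-project violation rate that can be compared across the two corpora.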
Abstract
Background: Meeting the growing industry demand for Data Science requires cross-disciplinary teams that can translate machine learning research into production-ready code. Software engineering teams value adherence to coding standards as an indication of code readability, maintainability, and developer expertise. However, there are no large-scale empirical studies of coding standards focused specifically on Data Science projects. Aims: This study investigates the extent to which Data Science projects follow coding standards. In particular, which standards are followed, which are ignored, and how does this differ from traditional software projects? Method: We compare a corpus of 1048 Open-Source Data Science projects to a reference group of 1099 non-Data Science projects with a similar level of quality and maturity. Results: Data Science projects suffer from a significantly higher rate of…
