Unboxing Default Argument Breaking Changes in 1 + 2 Data Science Libraries
João Eduardo Montandon, Luciana Lourdes Silva, Cristiano Politowski, Daniel Prates, Arthur de Brito Bonifácio, Ghizlane El Boussaidi

TL;DR
This paper investigates Default Argument Breaking Changes (DABCs) in popular Data Science libraries, revealing their prevalence and their impact on client applications, and discusses strategies for managing these changes to improve API stability.
Contribution
It identifies and analyzes 93 DABCs across three major Python libraries, quantifies their impact on over 500,000 applications, and offers insights for mitigating their effects.
Findings
35% of Scikit Learn clients affected by DABCs
Only 0.13% of NumPy clients impacted
DABCs often change function behavior, affecting API stability
Abstract
Data Science (DS) has become a cornerstone for modern software, enabling data-driven decisions to improve companies' services. Following modern software development practices, data scientists use third-party libraries to support their tasks. As the APIs provided by these tools often require an extensive list of arguments to be set up, data scientists rely on default values to simplify their usage. These default values can change over time, leading to a specific type of breaking change, defined as Default Argument Breaking Change (DABC). This work reveals 93 DABCs in three Python libraries frequently used in Data Science tasks (Scikit Learn, NumPy, and Pandas) and studies their potential impact on more than 500K client applications. We find that the occurrence of DABCs varies significantly depending on the library; 35% of Scikit Learn clients are affected, while…
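To make the notion of a DABC concrete, here is a minimal Python sketch. Everything in it is hypothetical: `scale_v1` and `scale_v2` stand in for two releases of the same library function, and `changed_defaults` is an illustrative helper that compares signatures with the standard `inspect` module. It is not taken from any of the studied libraries and is not the authors' detection method.

```python
import inspect


def _scale(values, normalize):
    """Shared body: optionally divide each value by the sum of all values."""
    if normalize:
        total = sum(values)
        return [v / total for v in values]
    return list(values)


# Hypothetical "old release": normalization is off by default.
def scale_v1(values, normalize=False):
    return _scale(values, normalize)


# Hypothetical "new release": same signature, but the default flipped.
def scale_v2(values, normalize=True):
    return _scale(values, normalize)


def changed_defaults(old_func, new_func):
    """Return {param: (old_default, new_default)} for defaults that differ."""
    old_params = inspect.signature(old_func).parameters
    new_params = inspect.signature(new_func).parameters
    changes = {}
    for name, old_param in old_params.items():
        new_param = new_params.get(name)
        if new_param is None:
            continue  # parameter was removed; not a default-value change
        if inspect.Parameter.empty in (old_param.default, new_param.default):
            continue  # no default on one side; nothing to compare
        if old_param.default != new_param.default:
            changes[name] = (old_param.default, new_param.default)
    return changes


data = [1, 2, 5]
# A client that relies on the default gets different results after upgrading,
# even though its own code did not change.
print(scale_v1(data))
print(scale_v2(data))
print(changed_defaults(scale_v1, scale_v2))
```

A client that passes `normalize` explicitly is unaffected by the change, which is why relying on defaults is what exposes an application to a DABC.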
