SOAR: A Synthesis Approach for Data Science API Refactoring
Ansong Ni, Daniel Ramos, Aidan Yang, Inês Lynce, Vasco Manquinho, Ruben Martins, and Claire Le Goues

TL;DR
SOAR is a novel, training-data-free approach for automatic API refactoring in data science libraries, leveraging documentation and program synthesis to efficiently migrate APIs across versions and libraries.
Contribution
It introduces a new synthesis-based method that requires no training data, unlike prior statistical learning approaches, to automate API refactoring and migration.
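To make the idea of synthesis without training data concrete, here is a toy sketch of enumerate-and-test synthesis. It is not the authors' implementation. Every name in it (`old_scale`, `new_multiply`, `new_add`, `new_power`, `TARGET_API`, `synthesize`) is hypothetical, and it ignores the documentation-based guidance the summary mentions. It only shows how a target API call might be found by checking candidates against the behavior of the source call on a few test inputs.

```python
from itertools import product

# Hypothetical "source" API call that we want to migrate away from.
def old_scale(values, factor):
    return [v * factor for v in values]

# Hypothetical "target" library: candidate functions the search may pick from.
def new_multiply(values, by=1):
    return [v * by for v in values]

def new_add(values, by=0):
    return [v + by for v in values]

def new_power(values, by=1):
    return [v ** by for v in values]

TARGET_API = {
    "new_multiply": new_multiply,
    "new_add": new_add,
    "new_power": new_power,
}

def synthesize(source_call, test_inputs, arg_candidates):
    """Enumerate (function, argument) pairs from the target API and return
    the first pair that matches source_call on every test input, or None."""
    expected = [source_call(xs) for xs in test_inputs]
    for (name, fn), arg in product(TARGET_API.items(), arg_candidates):
        if [fn(xs, by=arg) for xs in test_inputs] == expected:
            return name, arg
    return None

tests = [[1, 2, 3], [0, 5], [-2, 4]]
result = synthesize(lambda xs: old_scale(xs, 3), tests, arg_candidates=[0, 1, 2, 3])
print(result)
```

In this sketch the search needs only the source call and a handful of inputs, with no paired migration examples. A realistic system would have to prune a far larger candidate space than this brute-force loop can handle.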
Findings
Successfully refactors 80% of deep learning benchmarks
Achieves 90% success on data wrangling benchmarks
Operates with an average runtime of under 2 minutes
Abstract
With the growth of the open-source data science community, both the number of data science libraries and the number of versions of each library are increasing rapidly. To keep up with the evolving APIs of those libraries, open-source organizations often have to manually refactor the API calls in their code bases. Moreover, because so many similar open-source libraries exist, data scientists working on a given application may have many libraries to choose from, maintain, and migrate between. Manual refactoring between APIs is tedious and error-prone. Although recent research has addressed automatic API refactoring between different languages, previous work relies on statistical learning with collected pairwise training data for API matching and migration. Using large statistical data for refactoring is not ideal because such training…
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Software System Performance and Reliability
