NLP-Guided Synthesis: Transitioning from Sequential Programs to Distributed Programs
Arun Sanjel, Bikram Khanal, Greg Speegle, Pablo Rivas

TL;DR
This paper presents ROOP, an NLP-based tool that automates converting sequential Python code into distributed PySpark code with high accuracy and efficiency, simplifying large-scale data processing for developers.
Contribution
ROOP is the first tool to use a BERT-based NLP model for automated Python-to-PySpark code translation, achieving high accuracy and fast translation times.
Findings
ROOP correctly translated 25 of the 26 loop fragments in its evaluation set, which was drawn from 14 Python programs.
Simple operations were translated in as little as 44 seconds.
ROOP includes a testing mechanism to verify code equivalence.
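To make the idea of translating a loop fragment and then checking equivalence concrete, here is a minimal sketch. It is not ROOP's implementation: the function names, the example loop, and the randomized comparison are all assumptions chosen for illustration. The "translated" version is written with plain Python `filter`/`map`/`reduce` so that it runs without a Spark installation; the docstring notes the analogous PySpark RDD pipeline.

```python
import random
from functools import reduce


def sequential_sum_of_even_squares(values):
    """Sequential loop fragment, the kind of input a translator would receive."""
    total = 0
    for v in values:
        if v % 2 == 0:
            total += v * v
    return total


def functional_sum_of_even_squares(values):
    """The same computation expressed as filter -> map -> reduce.

    With PySpark, an analogous RDD pipeline would be roughly:
        sc.parallelize(values)
          .filter(lambda v: v % 2 == 0)
          .map(lambda v: v * v)
          .fold(0, lambda a, b: a + b)
    It is not executed here, to keep the sketch free of a Spark dependency.
    """
    evens = filter(lambda v: v % 2 == 0, values)
    squares = map(lambda v: v * v, evens)
    return reduce(lambda a, b: a + b, squares, 0)


def outputs_agree(original, candidate, trials=100, seed=0):
    """Hypothetical equivalence check: compare both versions on random inputs.

    Agreement on sampled inputs is evidence of equivalence, not a proof.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        size = rng.randint(0, 50)
        data = [rng.randint(-1000, 1000) for _ in range(size)]
        if original(data) != candidate(data):
            return False
    return True


if __name__ == "__main__":
    print(outputs_agree(sequential_sum_of_even_squares,
                        functional_sum_of_even_squares))
```

Randomized comparison of this kind can only catch disagreements on the inputs it happens to sample, so a real verification step would likely also need edge cases such as empty inputs and non-integer data.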
Abstract
As the need for large-scale data processing grows, distributed programming frameworks like PySpark have become increasingly popular. However, the task of converting traditional, sequential code to distributed code remains a significant hurdle, often requiring specialized knowledge and substantial time investment. While existing tools have made strides in automating this conversion, they often fall short in terms of speed, flexibility, and overall applicability. In this paper, we introduce ROOP, a groundbreaking tool designed to address these challenges. Utilizing a BERT-based Natural Language Processing (NLP) model, ROOP automates the translation of Python code to its PySpark equivalent, offering a streamlined solution for leveraging distributed computing resources. We evaluated ROOP using a diverse set of 14 Python programs comprising 26 loop fragments. Our results are promising: ROOP…
Taxonomy
Topics: Software Engineering Techniques and Practices · Advanced Software Engineering Methodologies · Service-Oriented Architecture and Web Services
