A Data Set of Generalizable Python Code Change Patterns
Akalanka Galappaththi, Sarah Nadi

TL;DR
This paper introduces a curated dataset of 72 manually verified, generalizable Python code change patterns mined from version control history, intended to support automated tooling such as code recommender systems and automated program repair.
Contribution
The paper presents a new dataset of Python change patterns, each manually reviewed against a coding guideline, filling a gap in available resources for Python code change analysis.
Findings
72 verified Python change patterns identified
Patterns are applicable across multiple projects
Dataset supports development of automated tooling
Abstract
Mining repetitive code changes from version control history is a common way of discovering unknown change patterns. Such change patterns can be used in code recommender systems or automated program repair techniques. While such tools and datasets exist for Java, there is little work on finding and recommending such changes in Python. In this paper, we present a data set of manually vetted, generalizable, repetitive Python code change patterns. We create a coding guideline to identify generalizable change patterns that can be used in automated tooling. We leverage the change patterns mined by recent work on repetitive changes in Python projects and use our coding guideline to manually review them. For each change, we also record a description of the change and why it is applied, along with other characteristics such as the number of projects it occurs in. This…
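The abstract says each pattern is recorded with a description, the reason it is applied, and characteristics such as the number of projects it occurs in. A minimal sketch of how one such record might be represented and filtered is shown here. The class name `ChangePattern`, its field names, the helper `generalizable_patterns`, the `min_projects` threshold, and the placeholder entries are all hypothetical and are not taken from the published data set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangePattern:
    """Hypothetical record layout; field names are assumptions, not the data set's schema."""
    pattern_id: str
    description: str     # what the change does
    rationale: str       # why the change is applied
    project_count: int   # number of projects the change occurs in
    generalizable: bool  # outcome of the manual review


def generalizable_patterns(patterns, min_projects=2):
    """Keep patterns marked generalizable that occur in at least `min_projects` projects."""
    return [
        p for p in patterns
        if p.generalizable and p.project_count >= min_projects
    ]


# Placeholder entries for illustration only; not real data from the paper.
patterns = [
    ChangePattern("P1", "placeholder description A", "placeholder rationale A", 5, True),
    ChangePattern("P2", "placeholder description B", "placeholder rationale B", 1, True),
    ChangePattern("P3", "placeholder description C", "placeholder rationale C", 4, False),
]

selected = generalizable_patterns(patterns)
print([p.pattern_id for p in selected])
```

The `min_projects` cutoff is an illustrative choice; the abstract only states that the number of projects is recorded, not how it is used to decide generalizability.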
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Computational Physics and Python Applications
