Exploring Security Commits in Python
Shiyu Sun, Shu Wang, Xinda Wang, Yunlong Xing, Elisa Zhang, Kun Sun

TL;DR
This paper introduces PySecDB, the first security commit dataset for Python, along with SCOPY, a graph learning model for detecting silent security fixes that are not indexed by CVE.
Contribution
It constructs PySecDB, a new dataset for Python security commits, and proposes SCOPY, a graph learning model for identifying security-related code changes.
Findings
PySecDB contains 1,258 security commits verified by experts.
SCOPY improves security commit detection efficiency by up to 40%.
Four common security fix patterns cover over 85% of security commits.
Abstract
Python has become the most popular programming language as it is friendly to work with for beginners. However, a recent study has found that most security issues in Python have not been indexed by CVE and may only be fixed by 'silent' security commits, which pose a threat to software security and hinder the security fixes to downstream software. It is critical to identify the hidden security commits; however, the existing datasets and methods are insufficient for security commit detection in Python, due to the limited data variety, non-comprehensive code semantics, and uninterpretable learned features. In this paper, we construct the first security commit dataset in Python, namely PySecDB, which consists of three subsets including a base dataset, a pilot dataset, and an augmented dataset. The base dataset contains the security commits associated with CVE records provided by MITRE. To…
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Computational Physics and Python Applications
