BUGSPHP: A dataset for Automated Program Repair in PHP
K.D. Pramod, W.T.N. De Silva, W.U.K. Thabrew, Ridwan Shariffdeen, Sandareka Wickramanayake

TL;DR
BUGSPHP is a new benchmark dataset of real-world PHP bugs, enabling research in automated program repair for PHP, a widely used but underexplored language in this domain.
Contribution
The paper introduces BUGSPHP, the first comprehensive PHP bug dataset, comprising a training set of more than 600,000 bug-fixing commits and a test set of 513 manually validated bug-fixing commits, to facilitate APR research in PHP.
Findings
The training dataset includes more than 600,000 bug-fixing commits curated from GitHub.
The test dataset contains 513 manually validated bug-fixing commits.
The dataset enables research on analysis, testing, and repair of PHP programs.
Abstract
Automated Program Repair (APR) improves developer productivity by saving debugging and bug-fixing time. While APR has been extensively explored for C/C++ and Java programs, there is little research on bugs in PHP programs due to the lack of a benchmark PHP bug dataset. This is surprising given that PHP has been one of the most widely used server-side languages for over two decades, being used in a variety of contexts such as e-commerce, social networking, and content management. This paper presents a benchmark dataset of PHP bugs on real-world applications called BUGSPHP, which can enable research on analysis, testing, and repair for PHP programs. The dataset consists of training and test datasets, separately curated from GitHub and processed locally. The training dataset includes more than 600,000 bug-fixing commits. The test dataset contains 513 manually validated bug-fixing commits…
Taxonomy
Topics: Software System Performance and Reliability · Software Engineering Research · Cloud Computing and Resource Management
