BioDefect: The First Dataset for Defect Detection in Bioinformatics Software
Tianxiang Xu, Xiaoyan Zhu, Xin Lai, Xin Lian, Hangyu Cheng, Jiayin Wang

TL;DR
BioDefect is a new dataset built specifically for defect detection in bioinformatics software. It includes complete source code repositories and significantly improves detection performance.
Contribution
Introduces BioDefect, the first dataset tailored to defect detection in bioinformatics software. It addresses limitations of existing datasets and improves model performance.
Findings
BioDefect improves defect detection F1-score by 29.61% to 38.04%.
BioDefect includes complete source code repositories for realistic defect scenarios.
Systematic assessment on nine language models demonstrates BioDefect's effectiveness.
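The F1 figures in the findings can be read as either an absolute gain (percentage points) or a relative gain; this summary does not say which. As a minimal sketch of the difference, the snippet below computes F1 from confusion-matrix counts and reports both kinds of gain. The function names and all counts are hypothetical, chosen only for illustration; they are not taken from the paper and this is not the authors' evaluation code.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_gains(baseline: float, improved: float) -> tuple[float, float]:
    """Return (absolute gain, relative gain) of `improved` over `baseline`."""
    absolute = improved - baseline
    relative = absolute / baseline if baseline else float("inf")
    return absolute, relative


# Hypothetical counts for one model trained on two different datasets.
baseline_f1 = f1_score(tp=30, fp=30, fn=30)   # model trained on an existing dataset
improved_f1 = f1_score(tp=40, fp=10, fn=10)   # same model trained on a new dataset

absolute, relative = f1_gains(baseline_f1, improved_f1)
print(f"baseline F1 = {baseline_f1:.2f}, improved F1 = {improved_f1:.2f}")
print(f"absolute gain = {absolute * 100:.1f} points, relative gain = {relative * 100:.1f}%")
```

With these toy counts the same change is a 30-point absolute gain but a 60% relative gain, which is why the convention matters when comparing reported improvements.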
Abstract
Software defect detection is a critical task in software engineering. However, no prior studies have specifically addressed defect detection in bioinformatics software. Given that the performance of defect detection tasks is primarily influenced by both models and datasets, our experiments controlled for model-related factors and confirmed the limitations of existing datasets in bioinformatics software. To address this issue, we introduce BioDefect, the first dataset specifically designed for defect detection in bioinformatics software, aiming to overcome the limitations of existing datasets in this context. Unlike prior datasets, BioDefect includes complete source code repositories, preserving the actual contextual information of defective code, thereby more accurately reflecting real-world defect scenarios in bioinformatics software. Additionally, BioDefect mitigates issues related to…
