Few-shot learning for security bug report identification
Muhammad Laiq

TL;DR
This paper introduces a few-shot learning approach using SetFit to identify security bug reports effectively with limited labeled data, outperforming traditional methods and reducing annotation effort.
Contribution
The study demonstrates that SetFit-based few-shot learning significantly improves security bug report classification with minimal labeled data, addressing data scarcity issues.
Findings
Achieved an AUC of 0.865 in classification tasks
Outperformed traditional machine learning techniques
Proved effective with small labeled datasets
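For readers unfamiliar with the metric in the findings above, AUC can be read as the probability that a randomly chosen security bug report receives a higher score than a randomly chosen non-security report. The following is a minimal sketch of that pairwise definition. The function name `roc_auc` and the labels and scores are made up for illustration; they are not the paper's data or evaluation code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = security bug report, 0 = non-security bug report.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
auc = roc_auc(labels, scores)
```

The quadratic loop is fine for small evaluation sets; a rank-based formulation would be preferable for large ones.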
Abstract
Security bug reports require prompt identification to minimize the window of vulnerability in software systems. Traditional machine learning (ML) techniques for classifying bug reports to identify security bug reports rely heavily on large amounts of labeled data. However, datasets for security bug reports are often scarce in practice, leading to poor model performance and limited applicability in real-world settings. In this study, we propose a few-shot learning-based technique to effectively identify security bug reports using limited labeled data. We employ SetFit, a state-of-the-art few-shot learning framework that combines sentence transformers with contrastive learning and parameter-efficient fine-tuning. The model is trained on a small labeled dataset of bug reports and is evaluated on its ability to classify these reports as either security-related or non-security-related. Our…
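The contrastive step mentioned in the abstract can be illustrated with a small sketch. SetFit fine-tunes a sentence transformer on pairs of labeled examples, treating same-class pairs as similar and different-class pairs as dissimilar, which is how a handful of labels yields many training signals. The code below only builds such pairs in plain Python; the function name `make_contrastive_pairs` and the example bug reports are hypothetical, and exhaustively enumerating all pairs is a simplification (the real framework samples pairs), so this should not be read as the paper's pipeline.

```python
from itertools import combinations


def make_contrastive_pairs(examples):
    """Build (text_a, text_b, target) triples from labeled examples.

    target is 1.0 when both texts share a label (similar pair)
    and 0.0 when their labels differ (dissimilar pair).
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(examples, 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


# Hypothetical few-shot set: 1 = security-related, 0 = non-security-related.
few_shot = [
    ("Buffer overflow when parsing crafted header", 1),
    ("Session token leaked in debug log", 1),
    ("Button misaligned on settings page", 0),
    ("Typo in installation guide", 0),
]
pairs = make_contrastive_pairs(few_shot)
```

With n labeled reports this enumeration produces n(n-1)/2 pairs, so even a few examples per class give the embedding model a much larger set of comparisons to learn from.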
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Information and Cyber Security
