SeRe: A Security-Related Code Review Dataset Aligned with Real-World Review Activities
Zixiao Zhao, Yanjie Jiang, Hui Liu, Kui Liu, Lu Zhang

TL;DR
SeRe is a large, security-focused code review dataset created using active learning, enabling better research and tools for automated security feedback in software development.
Contribution
The paper introduces SeRe, a novel large-scale security-related code review dataset, and demonstrates its utility through benchmarking security feedback generation methods.
Findings
SeRe contains 6,732 security-related reviews extracted from 373,824 code review instances (about 1.8%).
SeRe aligns well with real-world security review distributions.
Benchmarking shows current methods have room for improvement in security feedback generation.
Abstract
Software security vulnerabilities can lead to severe consequences, making early detection essential. Although code review serves as a critical defense mechanism against security flaws, relevant feedback remains scarce due to limited attention to security issues or a lack of expertise among reviewers. Existing datasets and studies primarily focus on general-purpose code review comments, either lacking security-specific annotations or being too limited in scale to support large-scale research. To bridge this gap, we introduce SeRe, a security-related code review dataset, constructed using an active learning-based ensemble classification approach. The proposed approach iteratively refines model predictions through human annotations, achieving high precision while maintaining reasonable recall. Using the fine-tuned ensemble classifier, we extracted 6,732 security-related…
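The abstract describes an active learning loop in which an ensemble classifier is repeatedly refined with human annotations. The sketch below is one way such a loop could look, not the authors' implementation. Everything in it is an assumption made for illustration: the toy comments, the keyword-based `oracle` standing in for a human annotator, the three count-based ensemble members, the disagreement-based query rule, and the unanimous-vote rule used to favor precision.

```python
from collections import Counter

# Hypothetical toy data (not taken from SeRe): (comment, is_security_related).
SEED = [
    ("possible sql injection in this query", 1),
    ("rename this variable for clarity", 0),
]
POOL = [
    "sanitize user input to avoid xss",
    "add a unit test for this branch",
    "this buffer can overflow on long input",
    "fix the typo in the comment",
]

def oracle(comment):
    """Stand-in for a human annotator (hypothetical keyword rule)."""
    keywords = ("xss", "overflow", "injection", "sanitize")
    return int(any(k in comment for k in keywords))

# Three feature views; each one backs a separate ensemble member.
def words(text):
    return text.lower().split()

def char_trigrams(text):
    t = text.lower()
    return [t[i:i + 3] for i in range(len(t) - 2)]

def prefixes(text):
    return [w[:4] for w in text.lower().split()]

FEATURIZERS = [words, char_trigrams, prefixes]

def train_member(featurize, labeled):
    """Weight each feature by (positive count - negative count)."""
    weights = Counter()
    for text, label in labeled:
        for feat in set(featurize(text)):
            weights[feat] += 1 if label else -1
    return weights

def vote_fraction(models, text):
    """Fraction of ensemble members that vote 'security-related'."""
    votes = []
    for featurize, weights in models:
        score = sum(weights[f] for f in set(featurize(text)))
        votes.append(1 if score > 0 else 0)
    return sum(votes) / len(votes)

def fit_ensemble(labeled):
    return [(fz, train_member(fz, labeled)) for fz in FEATURIZERS]

def active_learning(seed, pool, annotate, rounds=2, batch=1):
    labeled, pool = list(seed), list(pool)
    for _ in range(rounds):
        models = fit_ensemble(labeled)
        # Query the comments whose vote is closest to an even split.
        pool.sort(key=lambda t: abs(vote_fraction(models, t) - 0.5))
        queried, pool = pool[:batch], pool[batch:]
        labeled += [(t, annotate(t)) for t in queried]
    return fit_ensemble(labeled), labeled, pool

def is_security(models, text):
    # Requiring a unanimous vote trades recall for precision.
    return vote_fraction(models, text) == 1.0

models, labeled, remaining = active_learning(SEED, POOL, oracle)
print(len(labeled), "labeled;", len(remaining), "still unlabeled")
```

In a real pipeline the count-based members would be replaced by trained classifiers and `oracle` by actual annotators; the loop structure (train, query uncertain items, annotate, retrain) is the part this sketch is meant to show.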
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Web Application Security Vulnerabilities
