DRS-OSS: Practical Diff Risk Scoring with LLMs
Ali Sayedsalehi, Peter C. Rigby, and Audris Mockus

TL;DR
This paper introduces DRS-OSS, an efficient LLM-based tool for diff risk scoring in open-source projects, enabling better prioritization and defect prevention with real-time feedback and integration into developer workflows.
Contribution
It presents a deployable, open-source diff risk scoring system using a fine-tuned Llama 3.1 model with state-of-the-art performance and practical deployment features.
Findings
Achieves an F1 score of 0.64 and a ROC-AUC of 0.89 on the ApacheJIT benchmark.
Gating the top 30% riskiest commits can prevent up to 86.4% of defect-inducing changes.
Trains with 22k-token contexts on a single 20 GB GPU using parameter-efficient adaptation (4-bit QLoRA) and DeepSpeed ZeRO-3 CPU offloading.
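The gating figure above can be read as a recall-at-top-k measurement: rank commits by predicted risk, gate the highest-scoring fraction, and count how many defect-inducing commits fall inside the gate. The following is a minimal sketch of that calculation, not the authors' evaluation code. The function name, the toy scores, and the toy labels are all assumptions made for illustration.

```python
from typing import Sequence


def defects_caught_by_gating(
    scores: Sequence[float],
    labels: Sequence[int],
    gate_fraction: float,
) -> float:
    """Return the share of defect-inducing commits inside the gated set.

    scores: predicted risk per commit (higher means riskier).
    labels: 1 if the commit was defect-inducing, else 0.
    gate_fraction: fraction of commits to gate, e.g. 0.30 for the top 30%.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0.0 <= gate_fraction <= 1.0:
        raise ValueError("gate_fraction must be between 0 and 1")

    total_defects = sum(labels)
    if total_defects == 0:
        return 0.0  # nothing to catch

    # Rank commits from riskiest to safest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    # Number of commits that fall inside the gate.
    n_gated = round(gate_fraction * len(ranked))

    caught = sum(label for _, label in ranked[:n_gated])
    return caught / total_defects


# Toy data (hypothetical): 10 commits, 3 of them defect-inducing.
toy_scores = [0.90, 0.80, 0.70, 0.40, 0.30, 0.20, 0.10, 0.05, 0.02, 0.01]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
toy_recall = defects_caught_by_gating(toy_scores, toy_labels, 0.30)
```

On the toy data, the gate covers three commits and captures two of the three defect-inducing ones. The reported 86.4% is the same kind of ratio, measured by the authors on their own evaluation data.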
Abstract
In large-scale open-source projects, hundreds of pull requests land daily, each a potential source of regressions. Diff risk scoring (DRS) estimates how likely an individual code change is to introduce a defect. This score can help prioritize reviews and tests, gate high-risk changes, and manage CI/CD capacity. Building on this idea, we present DRS-OSS, an open-source DRS tool equipped with a public API, web UI, and GitHub plugin. DRS-OSS is a deployable, LLM-based diff risk scoring system for open-source projects built around a fine-tuned Llama 3.1 8B sequence classifier. The model consumes long-context representations that combine commit messages, structured diffs, and change metrics, and is trained on the ApacheJIT dataset. Using parameter-efficient adaptation, 4-bit QLoRA, and DeepSpeed ZeRO-3 CPU offloading, we train the model with 22k-token contexts on a single 20 GB GPU,…
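To make the idea of a combined input concrete, here is a hedged sketch of how a commit message, a diff, and change metrics might be joined into one text sequence for a sequence classifier. The section markers, the function name, and the metric names are hypothetical; the summary does not specify the actual format DRS-OSS uses.

```python
from typing import Mapping


def build_model_input(message: str, diff: str, metrics: Mapping[str, float]) -> str:
    """Join the three signals into one text block (illustrative layout only)."""
    # Sort metric names so the same commit always yields the same text.
    metric_lines = "\n".join(
        f"{name}: {value}" for name, value in sorted(metrics.items())
    )
    return (
        "[COMMIT MESSAGE]\n" + message.strip() + "\n\n"
        "[CHANGE METRICS]\n" + metric_lines + "\n\n"
        "[DIFF]\n" + diff.strip()
    )


# Hypothetical commit used only to show the output shape.
example_input = build_model_input(
    message="Fix null check in parser",
    diff="--- a/parser.py\n+++ b/parser.py\n@@ -1 +1 @@\n-x = y\n+x = y or 0",
    metrics={"lines_added": 1, "lines_deleted": 1, "files_changed": 1},
)
```

A real pipeline would also need to tokenize this text and truncate it to the model's context window, which this sketch leaves out.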
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Software Testing and Debugging Techniques
