Moving Faster and Reducing Risk: Using LLMs in Release Deployment
Rui Abreu, Vijayaraghavan Murali, Peter C Rigby, Chandra Maddila, Weiyan Sun, Jun Ge, Kaavya Chinniah, Audris Mockus, Megh Mehta, Nachiappan Nagappan

TL;DR
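This paper develops diff risk score (DRS) models that predict how likely a code diff is to cause a SEV (a severe fault that impacts end-users), enabling safer release gating at scale. It compares logistic regression, BERT-based, and generative LLM models, with the generative LLMs capturing more SEVs than the regression baseline.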
Contribution
Introduces diff risk score models leveraging LLMs to improve release gating accuracy and reduce fault risk in large-scale software deployment.
Findings
Baseline regression captures up to 84.6% of SEVs when gating 50% of diffs.
StarBERT captures fewer SEVs than regression models.
Generative LLMs outperform regression models in SEV capture rate.
Abstract
Release engineering has traditionally focused on continuously delivering features and bug fixes to users, but at a certain scale, it becomes impossible for a release engineering team to determine what should be released. At Meta's scale, the responsibility appropriately and necessarily falls back on the engineer writing and reviewing the code. To address this challenge, we developed models of diff risk scores (DRS) to determine how likely a diff is to cause a SEV, i.e., a severe fault that impacts end-users. Assuming that SEVs are only caused by diffs, a naive model could randomly gate X% of diffs from landing, which would automatically catch X% of SEVs on average. However, we aimed to build a model that can capture Y% of SEVs by gating X% of diffs, where Y >> X. By training the model on historical data on diffs that have caused SEVs in the past, we can predict the riskiness of an…
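The gating metric described in the abstract (capture Y% of SEVs by gating X% of diffs) can be illustrated with a minimal sketch. It assumes each diff already has a risk score and a label recording whether it caused a SEV; the function name `sev_capture_rate` and the toy data are hypothetical and are not taken from the paper.

```python
def sev_capture_rate(scores, caused_sev, gate_fraction):
    """Return the share of SEV-causing diffs that fall inside the gated set.

    scores        -- risk score per diff (higher means riskier); hypothetical input
    caused_sev    -- 1 if the diff caused a SEV, else 0 (same order as scores)
    gate_fraction -- X, the fraction of diffs to gate, between 0 and 1
    """
    if len(scores) != len(caused_sev):
        raise ValueError("scores and caused_sev must have the same length")
    if not 0.0 <= gate_fraction <= 1.0:
        raise ValueError("gate_fraction must be between 0 and 1")
    total_sevs = sum(caused_sev)
    if total_sevs == 0:
        raise ValueError("need at least one SEV-causing diff")

    # Number of diffs to gate, truncated to a whole diff.
    n_gated = int(len(scores) * gate_fraction)

    # Rank diff indices from riskiest to least risky and gate the top n_gated.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    gated = ranked[:n_gated]

    # Y: fraction of all SEVs that the gated set would have caught.
    return sum(caused_sev[i] for i in gated) / total_sevs


# Toy data: ten diffs, two of which caused a SEV.
toy_scores = [0.9, 0.1, 0.8, 0.3, 0.2, 0.7, 0.05, 0.4, 0.6, 0.15]
toy_labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]

for x in (0.2, 0.5):
    y = sev_capture_rate(toy_scores, toy_labels, x)
    print(f"gating {x:.0%} of diffs captures {y:.0%} of SEVs")
```

A random gate would capture about X% of SEVs on average, so a useful model is one where the returned value is well above `gate_fraction`.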
Taxonomy
Topics: Service-Oriented Architecture and Web Services · Software System Performance and Reliability
