Site Reliability Engineering: Application of Item Response Theory to Application Deployment Practices and Controls
Kiran Mahesh ND

TL;DR
This paper introduces new objective metrics, Application Deployment Score and Deployment Index, using Item Response Theory to evaluate and improve application deployment practices and reliability in production environments.
Contribution
It proposes novel metrics based on Item Response Theory to objectively assess deployment quality and reliability, advancing SRE and DevOps measurement methods.
Findings
Application Deployment Score effectively tracks deployment improvement trends.
Deployment Index assesses the effectiveness of deployment guidelines.
Metrics enable balancing reliability and product velocity.
Abstract
The reliability of an application or solution in a production environment is one of the fundamental properties on which every SRE team is critically focused. At the same time, achieving extreme reliability comes at a cost, including but not limited to a slower pace of new feature deployments, operations cost, and opportunity cost. One earlier effort to provide an objective metric for striking a fine balance between acceptable reliability and product velocity is the error budget and its associated policy. Organizations also maintain their own deployment guidelines and controls to ascertain the reliability of an application version being deployed into customer-facing or production environments. This work proposes a new objective metric, the Application Deployment Score, estimated using a dichotomous Item Response Theory model. This score is used to assess the improvement trend of each…
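As an illustration of what a dichotomous Item Response Theory model computes, the sketch below uses the one-parameter (Rasch) model. The abstract does not say which dichotomous model the paper uses, so the choice of Rasch is an assumption, as are all names here (`rasch_probability`, `estimate_theta`), the treatment of each deployment control as an "item" with a known difficulty, and the sample numbers. The latent trait `theta` stands in for a deployment score; this is not the paper's estimation procedure.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability that a control is satisfied (response = 1) under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def estimate_theta(responses, difficulties, iterations=25):
    """Maximum-likelihood estimate of theta for known item difficulties.

    responses: list of 0/1 outcomes, one per control (hypothetical data).
    difficulties: list of item difficulty parameters, assumed already calibrated.
    """
    if len(responses) != len(difficulties):
        raise ValueError("responses and difficulties must have the same length")
    if all(r == 1 for r in responses) or all(r == 0 for r in responses):
        # The likelihood has no finite maximum for all-pass or all-fail patterns.
        raise ValueError("MLE is undefined for all-pass or all-fail responses")

    theta = 0.0
    for _ in range(iterations):
        probs = [rasch_probability(theta, b) for b in difficulties]
        gradient = sum(r - p for r, p in zip(responses, probs))
        information = sum(p * (1.0 - p) for p in probs)
        # Newton-Raphson step, clamped to keep the iteration stable.
        step = max(-1.0, min(1.0, gradient / information))
        theta += step
    return theta


# Hypothetical example: five controls, three satisfied by this deployment.
difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
responses = [1, 1, 0, 1, 0]
score = estimate_theta(responses, difficulties)
print(f"estimated deployment ability: {score:.3f}")
```

Under these assumptions, a deployment that satisfies more controls, or harder ones, receives a higher `theta`, which is the kind of quantity that could be tracked across releases to observe a trend.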
Taxonomy
Topics: Software Reliability and Analysis Research · Software Engineering Techniques and Practices · Reliability and Maintenance Optimization
