PACE-LM: Prompting and Augmentation for Calibrated Confidence Estimation with GPT-4 in Cloud Incident Root Cause Analysis
Dylan Zhang, Xuchao Zhang, Chetan Bansal, Pedro Las-Casas, Rodrigo, Fonseca, Saravan Rajmohan

TL;DR
This paper introduces PACE-LM, a prompting and augmentation framework using GPT-4 for calibrated confidence estimation in cloud incident root cause analysis, enhancing reliability and decision-making accuracy.
Contribution
The paper presents a novel confidence estimation method for LLM-based root cause prediction that requires minimal information and is applicable to black-box models.
Findings
Calibrated confidence estimates improve decision reliability.
Retrieval-augmented prompting enhances confidence calibration.
Method generalizes across different root cause prediction models.
Abstract
Major cloud providers have employed advanced AI-based solutions like large language models to aid humans in identifying the root causes of cloud incidents. Despite the growing prevalence of AI-driven assistants in the root cause analysis process, their effectiveness in assisting on-call engineers is constrained by low accuracy due to the intrinsic difficulty of the task, a propensity for LLM-based approaches to hallucinate, and difficulties in distinguishing these well-disguised hallucinations. To address this challenge, we propose to perform confidence estimation for the predictions to help on-call engineers make decisions on whether to adopt the model prediction. Considering the black-box nature of many LLM-based root cause predictors, fine-tuning or temperature-scaling-based approaches are inapplicable. We therefore design an innovative confidence estimation framework based on…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling
