PACE-LM: Prompting and Augmentation for Calibrated Confidence Estimation with GPT-4 in Cloud Incident Root Cause Analysis

Dylan Zhang; Xuchao Zhang; Chetan Bansal; Pedro Las-Casas; Rodrigo Fonseca; Saravan Rajmohan

arXiv:2309.05833·cs.CL·October 2, 2023·2 cites

Open Access

TL;DR

This paper introduces PACE-LM, a prompting and retrieval-augmentation framework that uses GPT-4 to produce calibrated confidence estimates for root cause predictions in cloud incidents, helping on-call engineers decide whether to adopt a prediction.

Contribution

The paper presents a novel confidence estimation method for LLM-based root cause prediction that requires minimal information and is applicable to black-box models.

Findings

01. Calibrated confidence estimates improve decision reliability.

02. Retrieval-augmented prompting enhances confidence calibration.

03. Method generalizes across different root cause prediction models.
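
The findings above turn on what "calibrated" means: among predictions given a confidence of about 0.8, roughly 80% should turn out to be correct. One common way to quantify this is the expected calibration error (ECE). The sketch below is a generic, equal-width-bin ECE in plain Python. It is an illustration only: this page does not state which metric the paper reports, and the function name `expected_calibration_error`, the `n_bins` parameter, and the sample numbers are assumptions introduced here, not the authors' code.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Equal-width binned ECE (hypothetical helper, not from the paper).

    Returns the sample-weighted mean of |accuracy - mean confidence| per bin.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if len(confidences) == 0:
        raise ValueError("need at least one prediction")
    if any(c < 0.0 or c > 1.0 for c in confidences):
        raise ValueError("confidences must lie in [0, 1]")

    # bins[i] collects (confidence, is_correct) pairs whose confidence
    # falls in [i / n_bins, (i + 1) / n_bins); 1.0 goes into the last bin.
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - avg_conf)
    return ece


# Made-up example: four root cause predictions, all scored 0.9,
# of which only one was actually correct (an overconfident predictor).
overconfident = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9], [True, False, False, False]
)
```

A lower value means stated confidence tracks observed accuracy more closely; a perfectly calibrated predictor scores 0. In the made-up example the gap between mean confidence (0.9) and accuracy (0.25) dominates the score.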

Abstract

Major cloud providers have employed advanced AI-based solutions like large language models to aid humans in identifying the root causes of cloud incidents. Despite the growing prevalence of AI-driven assistants in the root cause analysis process, their effectiveness in assisting on-call engineers is constrained by low accuracy due to the intrinsic difficulty of the task, a propensity for LLM-based approaches to hallucinate, and difficulties in distinguishing these well-disguised hallucinations. To address this challenge, we propose to perform confidence estimation for the predictions to help on-call engineers make decisions on whether to adopt the model prediction. Considering the black-box nature of many LLM-based root cause predictors, fine-tuning or temperature-scaling-based approaches are inapplicable. We therefore design an innovative confidence estimation framework based on…

Taxonomy

Topics: Topic Modeling