Prune 'n Predict: Optimizing LLM Decision-making with Conformal Prediction
Harit Vishwakarma, Alan Mishler, Thomas Cook, Niccolò Dalmasso, Natraj Raman, Sumitra Ganesh

TL;DR
This paper introduces CROQ, which uses conformal prediction to narrow a multiple-choice question's answer options to a prediction set and then asks the LLM the revised question. Accuracy improves over standard inference, especially when CROQ is combined with CP-OPT, an optimization framework that reduces prediction set sizes.
Contribution
The paper proposes CROQ, a question revision technique built on conformal prediction, and CP-OPT, an optimization framework that minimizes prediction set sizes while maintaining coverage. Together they aim to improve LLM decision-making.
Findings
CROQ improves LLM accuracy over standard inference.
Combining CROQ with CP-OPT yields larger gains in accuracy.
The methods are effective across multiple datasets and LLMs.
Abstract
Large language models (LLMs) are empowering decision-making in several applications, including tool or API usage and answering multiple-choice questions (MCQs). However, incorrect outputs pose significant risks in high-stakes domains like healthcare and finance. To quantify LLM uncertainty and thereby mitigate these risks, recent works employ conformal prediction (CP), a model- and distribution-agnostic framework that uses LLM outputs to generate a prediction set containing the true answer with high probability. Leveraging CP, we propose conformal revision of questions (CROQ), which revises the question by narrowing down the available choices to those in the prediction set and asking the LLM the revised question. We expect LLMs to be more accurate on revised questions with fewer choices. Furthermore, we expect CROQ to be effective when the prediction sets from CP are…
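The revision step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses standard split conformal prediction with the nonconformity score 1 minus the LLM's probability for an option (the abstract says only that CP uses LLM outputs), and the function names, the re-lettering of surviving options, and the toy numbers are all hypothetical.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Split-conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest
    calibration score. Returns infinity when the calibration set is too small,
    which makes every option fall inside the prediction set."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")
    return sorted(cal_scores)[k - 1]


def prediction_set(option_probs, threshold):
    """Keep the options whose score (assumed here: 1 - probability) is at most
    the threshold. `option_probs` maps option labels to LLM probabilities."""
    return [label for label, p in option_probs.items() if 1.0 - p <= threshold]


def revise_question(stem, options, kept):
    """Rebuild the question with only the kept options, re-lettered from A.
    Re-lettering is an illustrative choice, not something the source specifies."""
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    lines = [stem]
    for letter, label in zip(letters, kept):
        lines.append(f"{letter}. {options[label]}")
    return "\n".join(lines)


# Toy calibration scores: 1 - probability assigned to the true answer on
# held-out questions (made-up numbers for illustration only).
cal_scores = [0.05, 0.10, 0.20, 0.35, 0.50, 0.75, 0.90]
threshold = conformal_threshold(cal_scores, alpha=0.25)

options = {"A": "Paris", "B": "Lyon", "C": "Marseille", "D": "Nice"}
option_probs = {"A": 0.05, "B": 0.55, "C": 0.30, "D": 0.10}

kept = prediction_set(option_probs, threshold)
revised = revise_question("Which city is meant?", options, kept)
print(revised)
```

The revised question would then be sent back to the LLM in place of the original. Smaller prediction sets leave fewer distractors in the revised question, which is why the summary above pairs CROQ with a method for reducing set sizes.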
Taxonomy
Topics: Big Data and Business Intelligence
Methods: Sparse Evolutionary Training
