A Formalism and Approach for Improving Robustness of Large Language Models Using Risk-Adjusted Confidence Scores
Ke Shen, Mayank Kejriwal

TL;DR
This paper introduces a formal framework and novel metrics for assessing and reducing risks in large language models, demonstrating improved decision-making and risk management in natural language inference tasks.
Contribution
It formalizes decision and composite risks in LLMs, proposes a risk-centric evaluation framework, and introduces DwD, a calibration method to minimize risks in LLM-based NLP applications.
Findings
DwD reduces decision risk by 20.1% in low-risk tasks.
DwD skips 19.8% of high-risk tasks to prevent errors.
Evaluation framework effectively measures risks in LLMs.
Abstract
Large Language Models (LLMs), such as ChatGPT, have achieved impressive milestones in natural language processing (NLP). Despite their impressive performance, the models are known to pose important risks. As these models are deployed in real-world applications, a systematic understanding of the different risks they pose on tasks such as natural language inference (NLI) is much needed. In this paper, we define and formalize two distinct types of risk: decision risk and composite risk. We also propose a risk-centric evaluation framework, and four novel metrics, for assessing LLMs on these risks in both in-domain and out-of-domain settings. Finally, we propose a risk-adjusted calibration method called DwD for helping LLMs minimize these risks in an overall NLI architecture. Detailed experiments, using four NLI benchmarks, three baselines and two LLMs, including ChatGPT, show both…
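Illustrative sketch
The findings mention skipping high-risk tasks to prevent errors. The snippet below is a minimal sketch of that general idea (confidence-thresholded abstention), not the paper's DwD method or its risk metrics, which this summary does not specify. The names Prediction, answer_or_abstain, summarize, and the threshold value are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Prediction:
    label: int         # answer chosen by the model (hypothetical encoding)
    confidence: float  # model confidence in that answer, assumed to lie in [0, 1]
    gold: int          # reference answer


def answer_or_abstain(pred: Prediction, threshold: float) -> Optional[int]:
    """Return the model's label, or None to abstain when confidence is too low."""
    return pred.label if pred.confidence >= threshold else None


def summarize(preds: List[Prediction], threshold: float) -> Dict[str, float]:
    """Report how often the system abstains and how often it errs when it answers."""
    decisions = [answer_or_abstain(p, threshold) for p in preds]
    answered = [(d, p) for d, p in zip(decisions, preds) if d is not None]
    skipped = len(preds) - len(answered)
    wrong = sum(1 for d, p in answered if d != p.gold)
    return {
        "abstain_rate": skipped / len(preds) if preds else 0.0,
        "error_rate_on_answered": wrong / len(answered) if answered else 0.0,
    }


if __name__ == "__main__":
    # Toy data, invented for illustration only (not results from the paper).
    toy = [
        Prediction(label=1, confidence=0.9, gold=1),
        Prediction(label=0, confidence=0.4, gold=1),
        Prediction(label=2, confidence=0.8, gold=1),
        Prediction(label=1, confidence=0.3, gold=1),
    ]
    print(summarize(toy, threshold=0.5))
```

Raising the threshold trades coverage for fewer wrong answers. A risk-adjusted method would presumably set this trade-off in a more principled way than a fixed cutoff, but that is an assumption here.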
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
