A Hierarchical Imprecise Probability Approach to Reliability Assessment of Large Language Models
Robab Aghazadeh-Chakherlou, Qing Guo, Siddartha Khastgir, Peter Popov, Xiaoge Zhang, Xingyu Zhao

TL;DR
This paper presents HIP-LLM, a hierarchical imprecise probability framework that models and infers the reliability of large language models, capturing uncertainty and dependencies across domains for more accurate reliability assessment.
Contribution
Introduces HIP-LLM, a novel hierarchical imprecise probability approach for reliability assessment of LLMs, incorporating epistemic uncertainty and operational profiles for comprehensive evaluation.
Findings
HIP-LLM provides more accurate reliability estimates than existing methods.
The framework captures uncertainty across priors and data effectively.
Experiments validate the approach's robustness across multiple datasets.
Abstract
Large Language Models (LLMs) are increasingly deployed across diverse domains, raising the need for rigorous reliability assessment methods. Existing benchmark-based evaluations primarily offer descriptive statistics of model accuracy over datasets, providing limited insight into the probabilistic behavior of LLMs under real operational conditions. This paper introduces HIP-LLM, a Hierarchical Imprecise Probability framework for modeling and inferring LLM reliability. Building upon the foundations of software reliability engineering, HIP-LLM defines LLM reliability as the probability of failure-free operation over a specified number of future tasks under a given Operational Profile (OP). HIP-LLM represents dependencies across (sub-)domains hierarchically, enabling multi-level inference from subdomain to system-level reliability. HIP-LLM embeds imprecise priors to capture epistemic…
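To make the reliability definition in the abstract concrete, the following is a minimal single-level sketch, not the HIP-LLM model itself. It assumes a set of Beta priors on a domain's per-task failure probability, indexed by a prior mean in [t_lo, t_hi] with a fixed prior strength s, and reports lower and upper posterior probabilities of m failure-free future tasks. The function names, the Beta-prior choice, and the parameter s are illustrative assumptions. The hierarchical dependencies across (sub-)domains described in the abstract are not modeled here.

```python
from math import prod

def failure_free_prob(a, b, m):
    """E[(1 - p)^m] for p ~ Beta(a, b): chance of m failure-free future tasks."""
    return prod((b + i) / (a + b + i) for i in range(m))

def reliability_bounds(failures, trials, m, t_lo, t_hi, s=2.0):
    """Lower and upper posterior probability of m failure-free tasks.

    Assumed prior set (illustrative): Beta(s*t, s*(1-t)) on the failure
    probability, for every prior mean t in [t_lo, t_hi]. The posterior
    probability is decreasing in t, so the bounds sit at the endpoints.
    """
    def posterior(t):
        a = s * t + failures                    # pseudo-count of failures
        b = s * (1 - t) + (trials - failures)   # pseudo-count of successes
        return failure_free_prob(a, b, m)
    return posterior(t_hi), posterior(t_lo)

def op_weighted_bounds(per_domain_bounds, op_weights):
    """Bounds on the reliability of ONE next task drawn from the operational
    profile. Valid only for m = 1 and only if each domain's prior set is
    chosen independently of the others (a simplifying assumption)."""
    lo = sum(w * b[0] for w, b in zip(op_weights, per_domain_bounds))
    hi = sum(w * b[1] for w, b in zip(op_weights, per_domain_bounds))
    return lo, hi

# Hypothetical data: two domains, 0 failures in 8 tasks and 1 failure in 18.
d1 = reliability_bounds(failures=0, trials=8, m=1, t_lo=0.0, t_hi=1.0)
d2 = reliability_bounds(failures=1, trials=18, m=1, t_lo=0.0, t_hi=1.0)
print(d1, d2, op_weighted_bounds([d1, d2], [0.5, 0.5]))
```

The width of each interval reflects how strongly the conclusion depends on the prior: it narrows as more tasks are observed and widens as s or the range [t_lo, t_hi] grows.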
Taxonomy
Topics: Software Reliability and Analysis Research · Software Engineering Research · Formal Methods in Verification
