Sampling Preferences Yields Simple Trustworthiness Scores
Sean Steinle

TL;DR
This paper introduces preference sampling, a method to derive a single trustworthiness score from multi-dimensional LLM evaluations, improving decision-making by incorporating user preferences and outperforming other aggregation methods.
Contribution
The work presents preference sampling as a novel approach for aggregating multi-dimensional evaluation metrics into a scalar score, enhancing interpretability and user control.
Findings
Preference sampling reduces the candidate set to a single model in 100% of cases.
It is consistently sensitive to user priors and preferences.
It outperforms Pareto optimality and averaging methods in trustworthiness evaluation.
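To make the two baseline aggregation methods named above concrete, here is a minimal sketch of a Pareto-front filter and a plain average over dimensions. The model names, the three dimensions, and every score are hypothetical values made up for illustration; they are not taken from TrustLLM, DecodingTrust, or the paper.

```python
# Hypothetical scores (higher is better) on three made-up trust dimensions.
# None of these numbers come from the paper or its benchmarks.
scores = {
    "model_a": [0.90, 0.60, 0.70],
    "model_b": [0.70, 0.80, 0.75],
    "model_c": [0.65, 0.55, 0.60],
}


def dominates(u, v):
    """True if u is at least as good as v on every dimension and strictly better on one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_front(scores):
    """Names of models that no other model dominates, in sorted order."""
    return sorted(
        name
        for name, s in scores.items()
        if not any(dominates(other, s) for o, other in scores.items() if o != name)
    )


def mean_ranking(scores):
    """Model names ordered by unweighted mean score, best first."""
    return sorted(scores, key=lambda n: sum(scores[n]) / len(scores[n]), reverse=True)


print(pareto_front(scores))
print(mean_ranking(scores))
```

In this toy data the Pareto front still contains two models, so it does not pick a single winner, while the average picks one but gives the user no way to say which dimensions matter more.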
Abstract
With the onset of large language models (LLMs), the performance of artificial intelligence (AI) models is becoming increasingly multi-dimensional. Accordingly, there have been several large, multi-dimensional evaluation frameworks put forward to evaluate LLMs. Though these frameworks are much more realistic than previous attempts which only used a single score like accuracy, multi-dimensional evaluations can complicate decision-making since there is no obvious way to select an optimal model. This work introduces preference sampling, a method to extract a scalar trustworthiness score from multi-dimensional evaluation results by considering the many characteristics of model performance which users value. We show that preference sampling improves upon alternate aggregation methods by using multi-dimensional trustworthiness evaluations of LLMs from TrustLLM and DecodingTrust. We find that…
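The abstract does not spell out the algorithm, so the following is only one plausible reading of "preference sampling", not the paper's method. It assumes that user preferences are drawn as weight vectors from a Dirichlet prior, that each draw picks the model with the highest weighted score, and that a model's scalar score is the share of draws it wins. The function names, the prior parameters, and all scores are hypothetical.

```python
import random

# Hypothetical scores (higher is better) on three made-up trust dimensions.
scores = {
    "model_a": [0.90, 0.60, 0.70],
    "model_b": [0.70, 0.80, 0.75],
    "model_c": [0.65, 0.55, 0.60],
}


def sample_weights(alpha, rng):
    """Draw one weight vector from Dirichlet(alpha) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def preference_sampling(scores, alpha, n_samples=2000, seed=0):
    """Assumed scheme: score = fraction of sampled preferences under which a model wins."""
    rng = random.Random(seed)
    wins = {name: 0 for name in scores}
    for _ in range(n_samples):
        w = sample_weights(alpha, rng)
        # Winner for this sampled preference: highest weighted sum of scores.
        best = max(
            scores,
            key=lambda name: sum(wi * si for wi, si in zip(w, scores[name])),
        )
        wins[best] += 1
    return {name: count / n_samples for name, count in wins.items()}


# A prior that emphasizes the first dimension, then one that emphasizes the second.
print(preference_sampling(scores, alpha=[8.0, 1.0, 1.0]))
print(preference_sampling(scores, alpha=[1.0, 8.0, 1.0]))
```

Under these assumptions each model gets one number, and shifting the prior toward a different dimension changes which model scores highest, which is the kind of sensitivity to user priors the findings describe.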
Taxonomy
Topics: Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education
Methods: Sparse Evolutionary Training
