Sampling Preferences Yields Simple Trustworthiness Scores

Sean Steinle

arXiv:2506.03399·cs.HC·June 5, 2025

Open Access

TL;DR

This paper introduces preference sampling, a method to derive a single trustworthiness score from multi-dimensional LLM evaluations, improving decision-making by incorporating user preferences and outperforming other aggregation methods.

Contribution

The work presents preference sampling as a novel approach for aggregating multi-dimensional evaluation metrics into a scalar score, enhancing interpretability and user control.

Findings

01. Preference sampling fully reduces the set of candidate models 100% of the time.

02. It is consistently sensitive to user priors and preferences.

03. It outperforms Pareto optimality and score averaging as a way to aggregate trustworthiness evaluations.

Abstract

With the onset of large language models (LLMs), the performance of artificial intelligence (AI) models is becoming increasingly multi-dimensional. Accordingly, there have been several large, multi-dimensional evaluation frameworks put forward to evaluate LLMs. Though these frameworks are much more realistic than previous attempts which only used a single score like accuracy, multi-dimensional evaluations can complicate decision-making since there is no obvious way to select an optimal model. This work introduces preference sampling, a method to extract a scalar trustworthiness score from multi-dimensional evaluation results by considering the many characteristics of model performance which users value. We show that preference sampling improves upon alternate aggregation methods by using multi-dimensional trustworthiness evaluations of LLMs from TrustLLM and DecodingTrust. We find that…
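The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading, not the paper's implementation. It assumes user preferences are expressed as a Dirichlet prior over dimension weights (an assumption made here for illustration), samples weight vectors from that prior, and scores each model by the fraction of samples under which it ranks first. A Pareto-front helper is included for contrast. All model names and score values are hypothetical.

```python
import random


def sample_weights(alpha, rng):
    """Draw one weight vector from Dirichlet(alpha) using normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def preference_sampling_scores(scores, alpha, n_samples=10_000, seed=0):
    """Return, per model, the fraction of sampled weightings under which it ranks first.

    scores: dict mapping model name -> list of per-dimension scores (higher is better).
    alpha:  Dirichlet concentration per dimension (assumed form of the user prior);
            larger values relative to the others mean the dimension matters more,
            and a larger overall magnitude means more confidence.
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in scores}
    for _ in range(n_samples):
        w = sample_weights(alpha, rng)
        best = max(
            scores,
            key=lambda m: sum(wi * si for wi, si in zip(w, scores[m])),
        )
        wins[best] += 1
    return {name: count / n_samples for name, count in wins.items()}


def pareto_front(scores):
    """Return the models that no other model dominates on every dimension."""
    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    return [
        m for m in scores
        if not any(dominates(scores[o], scores[m]) for o in scores if o != m)
    ]


# Hypothetical evaluation results on three trustworthiness dimensions.
scores = {
    "model_a": [0.90, 0.60, 0.70],
    "model_b": [0.70, 0.85, 0.65],
    "model_c": [0.60, 0.60, 0.60],
}

if __name__ == "__main__":
    print("Pareto front:", pareto_front(scores))
    print("Prior favoring dimension 0:", preference_sampling_scores(scores, [8, 1, 1]))
    print("Prior favoring dimension 1:", preference_sampling_scores(scores, [1, 8, 1]))
```

In this toy data the Pareto front still contains two models and their plain averages are nearly identical, so neither method singles out a winner. The sampled scores, by contrast, shift with the prior, which is the kind of sensitivity to user preferences the summary describes.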

Taxonomy

Topics: Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education

Methods: Sparse Evolutionary Training