Algorithmically Establishing Trust in Evaluators

Adrian de Wynter

arXiv:2506.03083 · cs.DS · February 12, 2026

TL;DR

This paper introduces the No-Data Algorithm, which establishes trust in evaluators such as LLMs-as-judges without any labelled data by repeatedly posing challenges to them and analysing their responses.

Contribution

It presents a provably correct algorithm that assesses evaluator trustworthiness without relying on reference data, suitable for low-resource language labeling.

Findings

01

After $r$ challenge rounds, the algorithm accepts an evaluator that knows the correct labels with probability $\geq 1 - (1/4)^{r}$.

02

It effectively flags untrustworthy evaluators.

03

Empirical tests confirm theoretical guarantees.

Abstract

An evaluator, such as an LLM-as-a-judge, is trustworthy when there exists some agreed-upon way to measure its performance as a labeller. Traditional approaches either rely on testing the evaluator against references or assume that it 'knows' somehow the correct labelling. Both approaches fail when references are unavailable: the former requires data, and the latter is an assumption, not evidence. To address this, we introduce the 'No-Data Algorithm', which provably establishes trust in an evaluator without requiring any labelled data. Our algorithm works by successively posing challenges to said evaluator. We prove that after $r$ challenge rounds, it accepts an evaluator which knows the correct labels with probability $\geq 1 - (1/4)^{r}$, and reliably flags untrustworthy ones. We present formal proofs of correctness, empirical tests, and applications to assessing trust in LLMs-as-judges…
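To give a feel for how quickly the stated bound tightens, the short sketch below evaluates $1 - (1/4)^{r}$ for a given number of rounds and finds the smallest $r$ that reaches a target confidence. It only does arithmetic on the bound quoted in the abstract; it is not the No-Data Algorithm itself, and the function names `acceptance_lower_bound` and `min_rounds` are hypothetical names chosen for illustration.

```python
def acceptance_lower_bound(r: int) -> float:
    """Lower bound 1 - (1/4)^r on the probability of accepting an
    evaluator that knows the correct labels, after r challenge rounds
    (the bound quoted in the abstract)."""
    if r < 0:
        raise ValueError("r must be a non-negative integer")
    return 1.0 - 0.25 ** r


def min_rounds(target: float) -> int:
    """Smallest r whose lower bound is at least `target`.

    `target` is an illustrative parameter (not from the paper) and
    must lie in [0, 1), since the bound never reaches 1 exactly.
    """
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    r = 0
    while acceptance_lower_bound(r) < target:
        r += 1
    return r


if __name__ == "__main__":
    for rounds in (1, 2, 3, 4, 5):
        print(rounds, acceptance_lower_bound(rounds))
    print("rounds needed for 0.99:", min_rounds(0.99))
```

Because the failure term shrinks by a factor of four per round, a handful of rounds is already enough to push the bound above 0.99 under this formula.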

Code & Models

Repositories

adewynter/no_data_algorithm
Official

Taxonomy

Topics: Cryptography and Data Security · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI