Let's Measure Information Step-by-Step: AI-Based Evaluation Beyond Vibes

Zachary Robertson; Sanmi Koyejo

arXiv:2508.05469 · cs.LG · May 1, 2026

TL;DR

This paper introduces a method for evaluating AI systems without ground truth. It exploits a link between strategic gaming and information loss to identify evaluation mechanisms that resist adversarial manipulation.

Contribution

It proposes mutual evaluation, in which the overseer is treated as a strategic player that estimates mutual information by prompting, so that truthful reporting becomes an optimal strategy for the agents being evaluated.

Findings

01. TVD-MI maintains effectiveness under attack, with AUC 0.70–0.77.

02. Prompting for information relationships improves robustness over prompting for quality judgments.

03. Decomposition into item-level detection scores addresses peer prediction limitations.

Abstract

We evaluate artificial intelligence (AI) systems without ground truth by exploiting a link between strategic gaming and information loss. Building on established information theory, we analyze which mechanisms resist adversarial manipulation. This motivates mutual evaluation, where the overseer is treated as a strategic player estimating mutual information by prompting, making truthful agent reporting an optimal strategy. We show that certain f-divergences, such as total variation distance (TVD), maintain polynomial guarantees under attack, building on an established exponential barrier for estimating mutual information (MI) in worst-case certification settings. Under adversarial attacks, TVD-MI maintains effectiveness (area under the curve 0.70–0.77) while other approaches can decay toward chance, demonstrating that prompting the same system for information relationships rather than…
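
To make the quantity named in the abstract concrete, here is a minimal sketch of TVD-based mutual information for a small discrete joint distribution. It assumes the standard f-divergence definition, namely the total variation distance between the joint distribution and the product of its marginals. The function name `tvd_mutual_information` and the toy tables are hypothetical illustrations. The paper estimates this quantity by prompting a model, which this sketch does not attempt to reproduce.

```python
def tvd_mutual_information(joint):
    """TVD between a joint distribution and the product of its marginals.

    `joint` is a list of rows of non-negative weights (hypothetical input
    format); it is normalized so that the entries sum to 1.
    """
    total = sum(sum(row) for row in joint)
    p = [[v / total for v in row] for row in joint]

    # Marginals: row sums give P(X), column sums give P(Y).
    px = [sum(row) for row in p]
    py = [sum(col) for col in zip(*p)]

    # Total variation distance: half the L1 distance between P(X, Y)
    # and P(X) * P(Y).
    return 0.5 * sum(
        abs(p[i][j] - px[i] * py[j])
        for i in range(len(px))
        for j in range(len(py))
    )


# Independent variables: the joint equals the product of marginals.
independent = [[0.15, 0.35], [0.15, 0.35]]

# Perfectly correlated binary variables.
correlated = [[0.5, 0.0], [0.0, 0.5]]

print(tvd_mutual_information(independent))  # approximately 0.0
print(tvd_mutual_information(correlated))   # 0.5
```

Unlike Shannon mutual information, this quantity is bounded between 0 and 1, since it is a total variation distance.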
