Let's Measure Information Step-by-Step: AI-Based Evaluation Beyond Vibes
Zachary Robertson, Sanmi Koyejo

TL;DR
This paper introduces a method for evaluating AI systems without ground truth that links strategic gaming to information loss, making the evaluation more robust to adversarial manipulation.
Contribution
It proposes mutual evaluation, in which the overseer estimates mutual information by prompting, so that truthful reporting becomes an optimal strategy for agents. This improves the robustness and reliability of AI evaluation without ground truth.
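To make "estimating mutual information by prompting" concrete, here is a minimal sketch under stated assumptions. It assumes a hypothetical judge (standing in for a prompted overseer) that returns 1 when it believes two responses carry information about the same item. Matched pairs (xs[i], ys[i]) are treated as samples from the joint distribution, and mismatched pairs as samples from the product of marginals. The difference in acceptance rates estimates a lower bound on the total variation distance between the two, i.e. on TVD-MI. The names `Judge` and `tvd_mi_estimate` are illustrative and do not come from the paper's code.

```python
import random
from typing import Callable, Sequence

# Hypothetical judge interface: returns 1 if the overseer, when prompted,
# says the two responses share information about the same item, else 0.
Judge = Callable[[str, str], int]


def tvd_mi_estimate(xs: Sequence[str], ys: Sequence[str], judge: Judge, seed: int = 0) -> float:
    """Estimate a lower bound on TVD-MI from aligned response pairs (xs[i], ys[i])."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two aligned response pairs")
    # Acceptance rate on matched pairs: samples from the joint distribution.
    joint_rate = sum(judge(x, y) for x, y in zip(xs, ys)) / n
    # Acceptance rate on mismatched pairs: a nonzero cyclic shift guarantees
    # xs[i] is paired with a different item's response, approximating the
    # product of marginals.
    shift = random.Random(seed).randrange(1, n)
    product_rate = sum(judge(xs[i], ys[(i + shift) % n]) for i in range(n)) / n
    # This estimates P_joint(accept) - P_product(accept), which for any fixed
    # 0/1 judge is at most the TVD between joint and product.
    return joint_rate - product_rate


def same_first_letter(x: str, y: str) -> int:
    """Toy stand-in for a prompted judge (hypothetical)."""
    return int(x[0] == y[0])


if __name__ == "__main__":
    xs = ["apple", "banana", "cherry"]
    ys = ["avocado", "blueberry", "cranberry"]
    print(tvd_mi_estimate(xs, ys, same_first_letter))  # 1.0
```

A judge that accepts everything scores 0, because it cannot tell matched from mismatched pairs. This is the sense in which the estimate rewards responses that actually carry information about the item.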
Findings
TVD-MI remains effective under adversarial attack, with an AUC of 0.70–0.77.
Prompting for information relationships, rather than for quality judgments, improves robustness.
Decomposing the evaluation into item-level detection scores addresses limitations of peer prediction.
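The AUC figures above are presumably the standard area under the ROC curve, where 0.5 is chance level. The following minimal sketch shows how such a number is computed from item-level detection scores. It assumes each item has a scalar score and a binary label (for example, manipulated versus honest response). The function name and the example scores are illustrative, not data from the paper.

```python
from typing import Sequence


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the probability that a randomly chosen positive item scores higher
    than a randomly chosen negative item, counting ties as one half.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical item-level detection scores (not results from the paper).
    flagged = [0.9, 0.4]  # e.g. items labeled manipulated
    clean = [0.5, 0.1]    # e.g. items labeled honest
    print(auc(flagged, clean))  # 0.75
```

The quadratic pairwise loop keeps the definition transparent. For large item sets, a rank-based computation or a library routine such as scikit-learn's `roc_auc_score` gives the same value more efficiently.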
Abstract
We evaluate artificial intelligence (AI) systems without ground truth by exploiting a link between strategic gaming and information loss. Drawing on established information theory, we analyze which mechanisms resist adversarial manipulation. This motivates mutual evaluation, in which the overseer is treated as a strategic player that estimates mutual information by prompting, making truthful reporting an optimal strategy for agents. We show that certain f-divergences, such as total variation distance (TVD), maintain polynomial guarantees under attack, building on an established exponential barrier for estimating mutual information (MI) in worst-case certification settings. Under adversarial attacks, TVD-MI maintains effectiveness (area under the curve 0.70–0.77) while other approaches can decay toward chance, demonstrating that prompting the same system for information relationships rather than…
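As a reference point for the f-divergence comparison in the abstract, this sketch computes TVD-MI and Shannon (KL-based) MI exactly for a small discrete joint distribution. It assumes the standard definition TVD-MI(X;Y) = ½ Σ_{x,y} |P(x,y) − P(x)P(y)|. It only illustrates the two quantities. It does not reproduce the paper's estimators or its polynomial-versus-exponential guarantees, and the function names are our own.

```python
import math
from typing import List, Sequence, Tuple

Joint = Sequence[Sequence[float]]  # joint[i][j] = P(X = i, Y = j)


def marginals(joint: Joint) -> Tuple[List[float], List[float]]:
    """Row sums give P(X); column sums give P(Y)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return px, py


def tvd_mi(joint: Joint) -> float:
    """TVD between P(X, Y) and P(X)P(Y); always in [0, 1]."""
    px, py = marginals(joint)
    return 0.5 * sum(
        abs(joint[i][j] - px[i] * py[j])
        for i in range(len(px))
        for j in range(len(py))
    )


def kl_mi(joint: Joint) -> float:
    """Shannon mutual information in nats (KL divergence from P(X)P(Y))."""
    px, py = marginals(joint)
    return sum(
        p * math.log(p / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0  # zero-probability cells contribute nothing
    )


if __name__ == "__main__":
    correlated = [[0.5, 0.0], [0.0, 0.5]]     # X and Y are identical fair bits
    independent = [[0.25, 0.25], [0.25, 0.25]]
    print(tvd_mi(correlated), kl_mi(correlated))    # 0.5 and log 2 (about 0.693)
    print(tvd_mi(independent), kl_mi(independent))  # 0.0 0.0
```

TVD-MI never exceeds 1, whereas KL-based MI grows with alphabet size (up to the logarithm of the smaller alphabet). This boundedness is one plausible intuition for the contrast the abstract draws, not the paper's formal argument.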
