A Theoretical Framework for Adaptive Utility-Weighted Benchmarking

Philip Waggoner

arXiv:2602.12356·cs.AI·February 16, 2026

Open Access

TL;DR

This paper proposes a theoretical framework for adaptive, stakeholder-aware benchmarking in AI, enabling dynamic, context-sensitive evaluation that incorporates human preferences and evolves over time.

Contribution

It introduces a multilayer, adaptive network model for benchmarking that integrates stakeholder priorities and allows for dynamic evolution of evaluation metrics.

Findings

01. Formalizes human-in-the-loop benchmark updates

02. Generalizes classical leaderboards as a special case

03. Provides tools for analyzing benchmark structural properties
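
To make the "special case" finding concrete, here is a minimal sketch of utility-weighted ranking. It is an illustration only, not the paper's formalism: the weighted-average form, the function name `rank_models`, the metric names, and all scores are hypothetical assumptions. Under this assumed form, a classical leaderboard corresponds to weights that are fixed once and shared by everyone, while a stakeholder-aware benchmark lets the weights vary.

```python
from typing import Dict, List


def rank_models(
    scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[str]:
    """Rank models by a weighted average of their metric scores.

    Assumption: utility is a normalized weighted sum of metrics.
    This is an illustrative choice, not taken from the paper.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    def utility(model: str) -> float:
        # Weighted average over the metrics named in `weights`.
        return sum(w * scores[model][m] for m, w in weights.items()) / total

    # Highest utility first.
    return sorted(scores, key=utility, reverse=True)


# Hypothetical scores for two models on two metrics.
scores = {
    "model_a": {"accuracy": 0.90, "fairness": 0.60},
    "model_b": {"accuracy": 0.80, "fairness": 0.85},
}

# Fixed weights on a single metric: a classical accuracy leaderboard.
accuracy_only = {"accuracy": 1.0, "fairness": 0.0}

# A stakeholder who values both metrics equally.
balanced = {"accuracy": 1.0, "fairness": 1.0}

print(rank_models(scores, accuracy_only))
print(rank_models(scores, balanced))
```

With these made-up numbers, the ranking flips when the weights change, which is the kind of context sensitivity the framework is meant to capture.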

Abstract

Benchmarking has long served as a foundational practice in machine learning and, increasingly, in modern AI systems such as large language models, where shared tasks, metrics, and leaderboards offer a common basis for measuring progress and comparing approaches. As AI systems are deployed in more varied and consequential settings, though, there is growing value in complementing these established practices with a more holistic conceptualization of what evaluation should represent. Of note, recognizing the sociotechnical contexts in which these systems operate invites an opportunity for a deeper view of how multiple stakeholders and their unique priorities might inform what we consider meaningful or desirable model behavior. This paper introduces a theoretical framework that reconceptualizes benchmarking as a multilayer, adaptive network linking evaluation metrics, model components, and…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning