Towards Outcome-Oriented, Task-Agnostic Evaluation of AI Agents

Waseem AlShikh; Muayad Sayed Ali; Brian Kennedy; Dmytro Mozolevskyi

arXiv:2511.08242·cs.AI·November 12, 2025

Open Access

TL;DR

This paper introduces a comprehensive, outcome-oriented evaluation framework with eleven metrics for AI agents, enabling assessment of decision quality, autonomy, and business value across diverse domains and architectures.

Contribution

It proposes a novel, standardized set of outcome-based, task-agnostic metrics and demonstrates their effectiveness through large-scale simulated experiments across multiple agent types and domains.

Findings

01. Hybrid agents outperform others on most metrics.

02. Goal Completion Rate averaged 88.8%.

03. Significant performance trade-offs observed between agent designs.

Abstract

As AI agents proliferate across industries and applications, evaluating their performance based solely on infrastructural metrics such as latency, time-to-first-token, or token throughput is proving insufficient. These metrics fail to capture the quality of an agent's decisions, its operational autonomy, or its ultimate business value. This white paper proposes a novel, comprehensive framework of eleven outcome-based, task-agnostic performance metrics for AI agents that transcend domain boundaries. These metrics are designed to enable organizations to evaluate agents based on the quality of their decisions, their degree of autonomy, their adaptability to new challenges, and the tangible business value they deliver, regardless of the underlying model architecture or specific use case. We introduce metrics such as Goal Completion Rate (GCR), Autonomy Index (AIx), Multi-Step Task…
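
The excerpt above names the metrics but does not give their formulas. As a rough illustration only, the sketch below assumes Goal Completion Rate is the fraction of episodes in which the agent reached its goal, and that an autonomy score can be approximated as the share of steps completed without human intervention. Both definitions, the `Episode` record, and its field names are hypothetical stand-ins and not the paper's formulations.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One agent run. Hypothetical record; fields are assumptions for illustration."""
    goal_reached: bool
    total_steps: int
    human_interventions: int


def goal_completion_rate(episodes: list[Episode]) -> float:
    """Fraction of episodes whose goal was reached (assumed definition of GCR)."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(e.goal_reached for e in episodes) / len(episodes)


def autonomy_index(episodes: list[Episode]) -> float:
    """Share of steps taken without human help (assumed proxy, not the paper's AIx)."""
    total_steps = sum(e.total_steps for e in episodes)
    if total_steps == 0:
        raise ValueError("episodes contain no steps")
    assisted = sum(e.human_interventions for e in episodes)
    return 1 - assisted / total_steps


# Toy data, invented purely to exercise the functions.
episodes = [
    Episode(goal_reached=True, total_steps=4, human_interventions=0),
    Episode(goal_reached=True, total_steps=4, human_interventions=1),
    Episode(goal_reached=False, total_steps=2, human_interventions=1),
    Episode(goal_reached=True, total_steps=10, human_interventions=0),
]
gcr = goal_completion_rate(episodes)
aix = autonomy_index(episodes)
```

Neither function looks at what the task was, only at its recorded outcome, which is the sense in which such metrics can be task-agnostic.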

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI · Multi-Agent Systems and Negotiation