AI-NativeBench: An Open-Source White-Box Agentic Benchmark Suite for AI-Native Systems
Zirui Wang, Guangba Yu, Michael R. Lyu

TL;DR
AI-NativeBench introduces a white-box, application-centric benchmark suite for AI-Native systems, enabling detailed analysis of system-level behaviors and engineering challenges beyond traditional model capability metrics.
Contribution
This work presents the first open-source, white-box benchmark suite for AI-Native systems based on Model Context Protocol and Agent-to-Agent standards, enabling granular system analysis.
Findings
Lightweight models often outperform flagship models in protocol adherence.
Inference dominance makes protocol overhead less significant.
Self-healing mechanisms can increase costs on unviable workflows.
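The cost finding can be illustrated with a toy retry loop. This is a minimal sketch, assuming a self-healing policy that simply re-runs a failed workflow up to a fixed number of extra times and that every attempt consumes the same token budget. The names `run_with_self_healing`, `AttemptResult`, and the 1500-token cost are hypothetical and are not taken from AI-NativeBench. When a workflow can never succeed, each retry adds cost without changing the outcome, so spend grows linearly with the retry budget.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class AttemptResult:
    succeeded: bool
    tokens_used: int  # hypothetical per-attempt cost, measured in tokens


def run_with_self_healing(
    attempt: Callable[[], AttemptResult], max_retries: int
) -> Tuple[bool, int]:
    """Hypothetical self-healing loop: retry up to max_retries extra times,
    summing the tokens spent by every attempt, successful or not."""
    total_tokens = 0
    for _ in range(max_retries + 1):
        result = attempt()
        total_tokens += result.tokens_used
        if result.succeeded:
            return True, total_tokens
    return False, total_tokens


def unviable_attempt() -> AttemptResult:
    # Models a workflow that can never succeed (e.g. it depends on a missing tool).
    return AttemptResult(succeeded=False, tokens_used=1500)


fail_fast = run_with_self_healing(unviable_attempt, max_retries=0)
self_healing = run_with_self_healing(unviable_attempt, max_retries=3)
print(fail_fast)     # (False, 1500)
print(self_healing)  # (False, 6000)
```

Both runs fail, but the self-healing run spends four times as many tokens. Retries still pay off for transient failures; the sketch only shows why they act as a cost multiplier when the failure is structural.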
Abstract
The transition from Cloud-Native to AI-Native architectures is fundamentally reshaping software engineering, replacing deterministic microservices with probabilistic agentic services. However, this shift renders traditional black-box evaluation paradigms insufficient: existing benchmarks measure raw model capabilities while remaining blind to system-level execution dynamics. To bridge this gap, we introduce AI-NativeBench, the first application-centric and white-box AI-Native benchmark suite grounded in Model Context Protocol (MCP) and Agent-to-Agent (A2A) standards. By treating agentic spans as first-class citizens within distributed traces, our methodology enables granular analysis of engineering characteristics beyond simple capabilities. Leveraging this benchmark across 21 system variants, we uncover critical engineering realities invisible to traditional metrics: a parameter…
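To make "agentic spans as first-class citizens within distributed traces" concrete, here is a minimal sketch, assuming a simplified span record tagged with a hypothetical `kind` field (`llm_inference`, `mcp_tool_call`, `a2a_message`). The span names, kinds, and timings are invented for illustration and are not AI-NativeBench's trace schema or measured data. Summing durations per kind assumes spans do not overlap, which holds for the sequential trace below but not in general.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Span:
    name: str
    kind: str  # hypothetical categories: "llm_inference", "mcp_tool_call", "a2a_message"
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def latency_share_by_kind(spans: List[Span]) -> Dict[str, float]:
    """Fraction of total span time attributable to each span kind.
    Assumes spans are non-overlapping, so durations can be summed."""
    totals: Dict[str, int] = defaultdict(int)
    for span in spans:
        totals[span.kind] += span.duration_ms
    grand_total = sum(totals.values())
    return {kind: t / grand_total for kind, t in totals.items()}


# Illustrative, made-up trace of a two-agent workflow.
trace = [
    Span("planner.generate", "llm_inference", 0, 1800),
    Span("search tool call", "mcp_tool_call", 1800, 1950),
    Span("handoff to writer", "a2a_message", 1950, 2000),
    Span("writer.generate", "llm_inference", 2000, 4000),
]
shares = latency_share_by_kind(trace)
print(shares)
```

With these numbers, inference spans account for 95% of the traced time, which is the kind of breakdown a white-box trace exposes and a black-box capability score cannot.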
Taxonomy
Topics: Software System Performance and Reliability · Advanced Software Engineering Methodologies · Scientific Computing and Data Management
