AgentArcEval: An Architecture Evaluation Method for Foundation Model based Agents

Qinghua Lu; Dehai Zhao; Yue Liu; Hao Zhang; Liming Zhu; Xiwei Xu; Angela Shi; Tristan Tan; Rick Kazman

arXiv:2510.21031·cs.SE·October 27, 2025

TL;DR

AgentArcEval is an architecture evaluation method tailored to foundation model-based agents. It incorporates a catalogue of agent-specific general scenarios and is demonstrated through a real-world case study.

Contribution

The paper introduces AgentArcEval, a novel evaluation framework specifically designed for foundation model-based agent architectures, addressing their unique complexities.

Findings

01. AgentArcEval effectively evaluates agent architectures in a case study.

02. The scenario catalogue aids in systematic scenario generation for evaluation.

03. The method improves understanding of architecture impacts on agent performance.
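
To make the idea of generating concrete scenarios from a catalogue of general scenarios more tangible, here is a minimal Python sketch. It assumes the six-part quality attribute scenario form (source, stimulus, artifact, environment, response, response measure) familiar from scenario-based evaluation methods such as ATAM. This page does not describe the format of the paper's catalogue, so the class, the field names, the `instantiate` helper, and the example entry are all hypothetical illustrations rather than the authors' catalogue.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class GeneralScenario:
    """Hypothetical six-part quality attribute scenario (ATAM-style fields)."""
    quality_attribute: str
    source: str
    stimulus: str
    artifact: str
    environment: str
    response: str
    response_measure: str


def instantiate(template: GeneralScenario, **concrete: str) -> GeneralScenario:
    """Fill the {placeholders} of a general scenario with system-specific values."""
    filled = {
        name: value.format(**concrete)
        for name, value in asdict(template).items()
    }
    return GeneralScenario(**filled)


# Illustrative catalogue entry (an assumption, not taken from the paper).
CATALOGUE = [
    GeneralScenario(
        quality_attribute="reliability",
        source="{user}",
        stimulus="submits a request whose FM output varies between runs",
        artifact="{component}",
        environment="normal operation",
        response="the agent validates the output before acting on it",
        response_measure="invalid outputs acted on: {threshold}",
    ),
]

concrete = instantiate(
    CATALOGUE[0],
    user="a support analyst",
    component="the planning module",
    threshold="fewer than 1 in 100",
)
print(concrete)
```

Keeping general scenarios as templates with placeholders means one catalogue entry can be reused across systems, with only the system-specific values changing per evaluation.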

Abstract

The emergence of foundation models (FMs) has enabled the development of highly capable and autonomous agents, unlocking new application opportunities across a wide range of domains. Evaluating the architecture of agents is particularly important because architectural decisions significantly impact the quality attributes of agents given their unique characteristics, including compound architecture, autonomous and non-deterministic behaviour, and continuous evolution. However, traditional architecture evaluation methods fall short in addressing the evaluation needs of agent architecture because of these characteristics. Therefore, in this paper, we present AgentArcEval, a novel agent architecture evaluation method designed specifically to address the complexities of FM-based agent architecture and its evaluation. Moreover, we present a catalogue of agent-specific general scenarios, which…
