Beyond Task Success: An Evidence-Synthesis Framework for Evaluating, Governing, and Orchestrating Agentic AI

Christopher Koch; Joshua Andreas Wellbrock

arXiv:2604.19818 · cs.SE · April 23, 2026

TL;DR

This paper proposes an integrated framework, with supporting artifacts, for evaluating, governing, and orchestrating agentic AI systems, addressing the gap between governance policies and the concrete actions agents take.

Contribution

It introduces a four-layer framework, a runtime-placement test, and an action-evidence bundle to improve trustworthiness and compliance in agentic AI deployment.

Findings

01. Evaluation highlights safety and robustness gaps.
02. Governance frameworks lack execution-time control logic.
03. Runtime behavior cannot be governed solely through static permissions.

Abstract

Agentic AI systems plan, use tools, maintain state, and act across multi-step workflows with external effects, meaning trustworthy deployment can no longer be judged by task completion alone. The current literature remains fragmented across benchmark-centered evaluation, standards-based governance, orchestration architectures, and runtime assurance mechanisms. This paper contributes a bounded evidence synthesis across a manually coded corpus of twenty-four recent sources. The core finding is a governance-to-action closure gap: evaluation tells us whether outcomes were good, governance defines what should be allowed, but neither identifies where obligations bind to concrete actions or how compliance can later be proven. To close that gap, the paper introduces three linked artifacts: (1) a four-layer framework spanning evaluation, governance, orchestration, and assurance; (2) an ODTA…
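To make the governance-to-action closure gap concrete, here is a minimal Python sketch. It assumes a single tool-call gate and a hash-chained log. The names Obligation, EvidenceLog, and gate_action are hypothetical, and the sketch is not the paper's four-layer framework, runtime-placement test, or action-evidence bundle, whose details are not given in this summary. It shows an obligation being checked at the moment a concrete action is attempted, with each decision recorded so that compliance can be re-verified afterwards.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Hypothetical illustration only: these names are not taken from the paper.

GENESIS = "0" * 64  # placeholder "previous hash" for the first log entry


@dataclass(frozen=True)
class Obligation:
    """A governance rule bound to one specific tool (where the obligation binds)."""
    rule_id: str
    tool: str
    max_amount: float  # assumed example constraint on an "amount" argument


@dataclass
class EvidenceLog:
    """Append-only, hash-chained record of gate decisions (how compliance is proven later)."""
    entries: list[dict] = field(default_factory=list)

    @staticmethod
    def _digest(prev: str, record: dict) -> str:
        body = json.dumps(record, sort_keys=True)
        return hashlib.sha256((prev + body).encode()).hexdigest()

    def append(self, record: dict) -> None:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        self.entries.append(
            {"record": record, "prev": prev, "hash": self._digest(prev, record)}
        )

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or self._digest(prev, entry["record"]) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


def gate_action(tool: str, args: dict, obligations: list[Obligation], log: EvidenceLog) -> bool:
    """Check the obligations bound to this tool, log the decision, and return allow/deny."""
    bound = [o for o in obligations if o.tool == tool]
    violated = [o.rule_id for o in bound if args.get("amount", 0) > o.max_amount]
    allowed = not violated
    log.append({
        "tool": tool,
        "args": args,
        "checked": [o.rule_id for o in bound],
        "violated": violated,
        "allowed": allowed,
    })
    return allowed


if __name__ == "__main__":
    rules = [Obligation("PAY-LIMIT", "send_payment", 500.0)]
    log = EvidenceLog()
    print(gate_action("send_payment", {"amount": 120.0}, rules, log))  # True
    print(gate_action("send_payment", {"amount": 900.0}, rules, log))  # False
    print(log.verify())  # True
```

The design choice here is to place the check at execution time, per concrete action, rather than relying on static permissions, and to make every decision, including denials, part of a tamper-evident record that an auditor can replay with `verify()`.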