Four-Axis Decision Alignment for Long-Horizon Enterprise AI Agents
Vasundra Srininvasan

TL;DR
This paper introduces a four-axis framework for evaluating long-horizon enterprise AI agents. It separates failure modes that a single task-success metric conflates, which makes evaluation results easier to interpret.
Contribution
It proposes a four-axis decomposition of decision alignment, including a new regulatory-grounded axis (compliance reconstruction, CRR), and exercises the decomposition on a controlled benchmark (LongHorizon-Bench).
Findings
Retrieval collapses on factual precision across architectures.
Plain summarization performs well on multiple axes.
All architectures share a common decisional failure mode.
Abstract
Long-horizon enterprise agents make high-stakes decisions (loan underwriting, claims adjudication, clinical review, prior authorization) under lossy memory, multi-step reasoning, and binding regulatory constraints. Current evaluation reports a single task-success scalar that conflates distinct failure modes and hides whether an agent is aligned with the standards its deployment environment requires. We propose that long-horizon decision behavior decomposes into four orthogonal alignment axes, each independently measurable and failable: factual precision (FRP), reasoning coherence (RCS), compliance reconstruction (CRR), and calibrated abstention (CAR). CRR is a novel regulatory-grounded axis; CAR is a measurement axis separating coverage from accuracy. We exercise the decomposition on a controlled benchmark (LongHorizon-Bench) covering loan qualification and insurance claims adjudication…
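The abstract's point that a single task-success scalar conflates failure modes, and that CAR separates coverage from accuracy, can be illustrated with a small sketch. This is not the paper's definition of CAR, which the excerpt does not give. The `Decision` record, the function names, and the toy data are all hypothetical, and the sketch assumes an abstention is recorded as a missing prediction.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Decision:
    """Hypothetical record: `predicted` is None when the agent abstains."""
    predicted: Optional[str]
    expected: str


def single_scalar(decisions: Sequence[Decision]) -> float:
    """Conflated task success: an abstention counts the same as a wrong answer."""
    return sum(d.predicted == d.expected for d in decisions) / len(decisions)


def coverage_and_selective_accuracy(
    decisions: Sequence[Decision],
) -> Tuple[float, float]:
    """Report coverage and accuracy-on-answered-cases as two separate numbers."""
    if not decisions:
        raise ValueError("need at least one decision")
    answered = [d for d in decisions if d.predicted is not None]
    coverage = len(answered) / len(decisions)
    if not answered:
        return coverage, float("nan")  # accuracy is undefined with no answers
    correct = sum(d.predicted == d.expected for d in answered)
    return coverage, correct / len(answered)


# Toy data (made up): agent A answers everything and is right half the time;
# agent B abstains on half the cases and is right whenever it answers.
agent_a = [
    Decision("approve", "approve"),
    Decision("deny", "deny"),
    Decision("approve", "deny"),
    Decision("deny", "approve"),
]
agent_b = [
    Decision("approve", "approve"),
    Decision("deny", "deny"),
    Decision(None, "deny"),
    Decision(None, "approve"),
]
```

On this toy data both agents get the same single scalar (0.5), but the two-number report tells them apart: agent A has coverage 1.0 with accuracy 0.5, while agent B has coverage 0.5 with accuracy 1.0.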
