Replayable Financial Agents: A Determinism-Faithfulness Assurance Harness for Tool-Using LLM Agents
Raffi Khatchadourian

TL;DR
This paper introduces DFAH, a framework for measuring determinism and faithfulness in tool-using LLM agents in finance, revealing that models can be deterministic without being accurate and vice versa.
Contribution
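The paper presents DFAH, a multi-dimensional evaluation framework that separately measures trajectory determinism, decision determinism, and evidence-conditioned faithfulness in financial LLM agents. It is supported by an empirical study of more than 4,700 agentic runs across 7 models and 3 financial benchmarks.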
Findings
Decision determinism and task accuracy show no detectable correlation in financial LLM agents (r = -0.11, p = 0.63, n = 21 configurations).
Small models achieve high determinism but low accuracy.
Frontier models show moderate determinism with variable accuracy.
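The correlation reported in the abstract (r = -0.11 over n = 21 configurations) can be sanity-checked from those two summary numbers alone. The sketch below is a standard-library illustration, assuming a Pearson t-test for the p-value and a Fisher z-transform for the confidence interval; the paper's exact interval procedure is not stated in this summary, so the interval produced here need not match the reported one exactly. The function names `pearson_r` and `correlation_summary` are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between paired per-configuration scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def _t_two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value of Student's t, via Simpson integration of the pdf on [0, |t|]."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x: float) -> float:
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps  # steps must be even for Simpson's rule
    s = pdf(0.0) + pdf(abs(t))
    s += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = s * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)


def correlation_summary(r: float, n: int, confidence: float = 0.95) -> Tuple[float, float, Tuple[float, float]]:
    """t statistic, two-sided p-value, and Fisher-z interval for a sample r (|r| < 1)."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    p = _t_two_sided_p(t, df)
    z = math.atanh(r)                 # Fisher transform
    se = 1 / math.sqrt(n - 3)         # approximate standard error of z
    zc = NormalDist().inv_cdf(0.5 + confidence / 2)
    return t, p, (math.tanh(z - zc * se), math.tanh(z + zc * se))


t, p, ci = correlation_summary(-0.11, 21)
print(f"t = {t:.3f}, p = {p:.3f}, CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
```

With the reported r and n, the t-test yields a two-sided p of roughly 0.63, in line with the abstract. The Fisher interval comes out near [-0.52, 0.34], slightly wider than the reported [-0.49, 0.31], which is consistent with the authors having used a different interval method or an unrounded r. Either way, the interval spans zero comfortably, which is what "not detectably correlated" means here.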
Abstract
LLM agents struggle with regulatory audit replay: when asked to reproduce a flagged transaction decision with identical inputs, many deployments fail to return consistent results. We introduce the Determinism-Faithfulness Assurance Harness (DFAH), a framework for measuring trajectory determinism, decision determinism, and evidence-conditioned faithfulness in tool-using agents deployed in financial services. Across 4,700+ agentic runs (7 models, 4 providers, 3 financial benchmarks with 50 cases each at T=0.0), we find that decision determinism and task accuracy are not detectably correlated (r = -0.11, 95% CI [-0.49, 0.31], p = 0.63, n = 21 configurations): models can be deterministic without being accurate, and accurate without being deterministic. Because neither metric predicts the other in our sample, both must be measured independently, which is precisely what DFAH provides. Small…
Taxonomy
Topics: Auditing, Earnings Management, Governance · Software System Performance and Reliability · Financial Distress and Bankruptcy Prediction
