Automating Financial Statement Audits with Large Language Models
Rushi Wang, Jiateng Liu, Weijie Zhao, Shenglan Li, Denghui Zhang

TL;DR
This paper evaluates the capabilities and limitations of large language models in automating financial statement audits, highlighting their potential and current gaps in error detection, explanation, and compliance with standards.
Contribution
It introduces a comprehensive benchmark and evaluation framework for assessing LLMs in financial auditing, revealing their strengths and weaknesses in real-world scenarios.
Findings
LLMs successfully identify financial statement errors from transaction data.
Models struggle to explain errors and cite relevant standards.
Current LLMs have significant limitations in executing complete audits.
Abstract
Financial statement auditing is essential for stakeholders to understand a company's financial health, yet current manual processes are inefficient and error-prone. Even with extensive verification procedures, auditors frequently miss errors, leading to inaccurate financial statements that fail to meet stakeholder expectations for transparency and reliability. To address this, we use large language models (LLMs) to automate financial statement auditing and rigorously assess their capabilities, providing insights into their performance boundaries in automated auditing. Our work introduces a comprehensive benchmark built on a curated dataset that combines real-world financial tables with synthesized transaction data. Within the benchmark, we develop a rigorous five-stage evaluation framework to assess LLMs' auditing capabilities. The benchmark also challenges models to map specific…
