Evaluating Accounting Reasoning Capabilities of Large Language Models

Jie Zhou; Xin Chen; Jie Zhang; Hai Li; Jie Wang; Zhe Li

arXiv:2601.06707 · cs.CL · January 13, 2026

Open Access

TL;DR

This paper assesses the accounting reasoning abilities of large language models. It proposes evaluation criteria and benchmarks, and finds that although GPT-4 performs best among the models tested (GLM-6B, GLM-130B, GLM-4, and GPT-4), current models still need improvement before real-world enterprise use.

Contribution

The paper introduces a systematic framework and benchmarks for evaluating accounting reasoning in large language models, intended to guide future improvements.

Findings

01. GPT-4 shows the strongest accounting reasoning performance.

02. Prompt design significantly impacts model performance.

03. Current models are inadequate for real-world enterprise accounting.

Abstract

Large language models are transforming learning, cognition, and research across many fields. Effectively integrating them into professional domains, such as accounting, is a key challenge for enterprise digital transformation. To address this, we define vertical domain accounting reasoning and propose evaluation criteria derived from an analysis of the training data characteristics of representative GLM models. These criteria support systematic study of accounting reasoning and provide benchmarks for performance improvement. Using this framework, we evaluate GLM-6B, GLM-130B, GLM-4, and OpenAI GPT-4 on accounting reasoning tasks. Results show that prompt design significantly affects performance, with GPT-4 demonstrating the strongest capability. Despite these gains, current models remain insufficient for real-world enterprise accounting, indicating the need for further optimization to…

Taxonomy

Topics: Accounting Education and Careers · Auditing, Earnings Management, Governance · Financial Reporting and XBRL