Accounting Reasoning in Large Language Models: Concepts, Evaluation, and Empirical Analysis
Jie Zhou, Xin Chen, Jie Zhang, Zhe Li

TL;DR
This paper introduces a framework for evaluating accounting reasoning in large language models, assesses several models including GPT-4, and finds that current LLMs still need improvement for real-world accounting tasks.
Contribution
The paper proposes a set of evaluation criteria for accounting reasoning in LLMs and benchmarks multiple models against them, highlighting current limitations and room for future improvement.
Findings
GPT-4 shows the strongest accounting reasoning performance among tested models.
Prompt engineering improves model performance, but to varying degrees.
Current LLMs do not yet meet the requirements of real-world accounting applications.
Abstract
Large language models (LLMs) are increasingly reshaping learning paradigms, cognitive processes, and research methodologies across diverse domains. As their adoption expands, effectively integrating LLMs into professional fields and clarifying their role in domain-specific applications has become a key challenge for enterprise digital transformation and broader societal development. In the accounting domain, successful integration requires a systematic understanding of LLMs' domain-specific reasoning capabilities. In this study, we introduce the concept of accounting reasoning and propose a set of evaluation criteria grounded in an analysis of the training data characteristics of representative GLM-series models. These criteria establish a foundation for studying accounting-oriented reasoning paradigms and provide benchmarks for assessing and improving model performance. Building on…
Taxonomy
Topics: Topic Modeling · Machine Learning in Materials Science · Computational and Text Analysis Methods
