Large Language Models Acing Chartered Accountancy

Jatin Gupta; Akhil Sharma; Saransh Singhania; Mohammad Adnan; Sakshi Deo; Ali Imam Abidi; Keshav Gupta

arXiv:2506.21031 · cs.CL · June 27, 2025

Open Access

TL;DR

This paper evaluates the financial, legal, and quantitative reasoning abilities of large language models using a new Chartered Accountancy benchmark based on Indian CA exams, revealing where current models fall short and where they could improve.

Contribution

Introduces CA-Ben, a novel benchmark for assessing LLMs on domain-specific CA knowledge, and evaluates six prominent models on this challenging financial reasoning task.

Findings

01. Claude 3.5 Sonnet and GPT-4o outperform the other evaluated models.

02. Models struggle with numerical computations and legal interpretation.

03. Performance varies across reasoning tasks.

Abstract

Advanced intelligent systems, particularly Large Language Models (LLMs), are significantly reshaping financial practices through advancements in Natural Language Processing (NLP). However, the extent to which these models effectively capture and apply domain-specific financial knowledge remains uncertain. Addressing a critical gap in the expansive Indian financial context, this paper introduces CA-Ben, a Chartered Accountancy benchmark specifically designed to evaluate the financial, legal, and quantitative reasoning capabilities of LLMs. CA-Ben comprises structured question-answer datasets derived from the rigorous examinations conducted by the Institute of Chartered Accountants of India (ICAI), spanning foundational, intermediate, and advanced CA curriculum stages. Six prominent LLMs, i.e., GPT-4o, LLAMA 3.3 70B, LLAMA 3.1 405B, MISTRAL Large, Claude 3.5 Sonnet, and Microsoft Phi 4, were…
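Because CA-Ben is organized by curriculum stage, per-stage accuracy is a natural way to see where a model falls off. The sketch below is a minimal illustration under stated assumptions, not the paper's evaluation code. The `Item` record layout, the stage labels, and the `normalize` and `accuracy_by_stage` helpers are all hypothetical. Exact-match scoring is also an assumption and may not reflect how CA-Ben grades descriptive or numerical answers.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record layout; CA-Ben's real schema may differ.
@dataclass
class Item:
    stage: str      # assumed labels: "foundational", "intermediate", "advanced"
    gold: str       # reference answer, e.g. an option letter
    predicted: str  # model answer after extraction from its raw output


def normalize(answer: str) -> str:
    """Assumed normalization: ignore surrounding whitespace and letter case."""
    return answer.strip().lower()


def accuracy_by_stage(items: list[Item]) -> dict[str, float]:
    """Exact-match accuracy per curriculum stage (stages with no items are omitted)."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for item in items:
        total[item.stage] += 1
        if normalize(item.predicted) == normalize(item.gold):
            correct[item.stage] += 1
    return {stage: correct[stage] / total[stage] for stage in total}


if __name__ == "__main__":
    # Toy records for illustration only; these are not results from the paper.
    demo = [
        Item("foundational", "B", "b"),
        Item("foundational", "C", "A"),
        Item("intermediate", "D", "D "),
        Item("advanced", "A", "C"),
    ]
    print(accuracy_by_stage(demo))
```

Running the toy example prints `{'foundational': 0.5, 'intermediate': 1.0, 'advanced': 0.0}`. A real evaluation would also need an answer-extraction step and a more lenient comparison for numerical and free-text answers.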

Taxonomy

Topics: Financial Distress and Bankruptcy Prediction

Methods: Layer Normalization · Dropout · Cosine Annealing · Discriminative Fine-Tuning · Dense Connections · Byte Pair Encoding · Softmax · Linear Warmup With Cosine Annealing · Attention Dropout