Large Language Models Acing Chartered Accountancy
Jatin Gupta, Akhil Sharma, Saransh Singhania, Mohammad Adnan, Sakshi Deo, Ali Imam Abidi, Keshav Gupta

TL;DR
This paper evaluates the financial, legal, and quantitative reasoning abilities of large language models using a new Chartered Accountancy benchmark based on Indian CA exams, revealing performance gaps and potential improvements.
Contribution
Introduces CA-Ben, a novel benchmark for assessing LLMs on domain-specific CA knowledge, and evaluates six prominent models on this challenging financial reasoning task.
Findings
Claude 3.5 Sonnet and GPT-4o outperform others
Models struggle with numerical computations and legal interpretations
Performance varies across different reasoning tasks
Abstract
Advanced intelligent systems, particularly Large Language Models (LLMs), are significantly reshaping financial practices through advancements in Natural Language Processing (NLP). However, the extent to which these models effectively capture and apply domain-specific financial knowledge remains uncertain. Addressing a critical gap in the expansive Indian financial context, this paper introduces CA-Ben, a Chartered Accountancy benchmark specifically designed to evaluate the financial, legal, and quantitative reasoning capabilities of LLMs. CA-Ben comprises structured question-answer datasets derived from the rigorous examinations conducted by the Institute of Chartered Accountants of India (ICAI), spanning foundational, intermediate, and advanced CA curriculum stages. Six prominent LLMs, namely GPT-4o, LLAMA 3.3 70B, LLAMA 3.1 405B, MISTRAL Large, Claude 3.5 Sonnet, and Microsoft Phi 4, were…
Taxonomy
Topics: Financial Distress and Bankruptcy Prediction
Methods: Layer Normalization · Dropout · Cosine Annealing · Discriminative Fine-Tuning · Dense Connections · Byte Pair Encoding · Softmax · Linear Warmup With Cosine Annealing · Attention Dropout
