# Large language models as tax attorneys: a case study in legal capabilities emergence

**Authors:** John J. Nay, David Karamardian, Sarah B. Lawsky, Wenting Tao, Meghana Bhat, Raghav Jain, Aaron Travis Lee, Jonathan H. Choi, Jungo Kasai

PMC · DOI: 10.1098/rsta.2023.0159 · 2024-02-26

## TL;DR

This paper evaluates how well large language models apply tax law. Accuracy improves with each successive OpenAI model release and rises further when the model is given the relevant legal text and few-shot examples, though it remains below expert tax lawyer level.

## Contribution

The study uses tax law as a testbed for LLM legal reasoning because its structure permits automated validation pipelines across thousands of examples while still requiring logical reasoning and maths skills.

## Key findings

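- Performance on tax law questions improves with each subsequent OpenAI model release.
- Few-shot prompting (presenting example question–answer pairs) significantly enhances the performance of GPT-4, the most advanced model tested; supplying the relevant legal authority as context is evaluated as a further aid.
- With prompting enhancements and the correct legal texts, LLMs reach high accuracy but not yet expert tax lawyer levels.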

## Abstract

Better understanding of Large Language Models' (LLMs) legal analysis abilities can contribute to improving the efficiency of legal services, governing artificial intelligence and leveraging LLMs to identify inconsistencies in law. This paper explores LLM capabilities in applying tax law. We choose this area of law because it has a structure that allows us to set up automated validation pipelines across thousands of examples, requires logical reasoning and maths skills, and enables us to test LLM capabilities in a manner relevant to real-world economic lives of citizens and companies. Our experiments demonstrate emerging legal understanding capabilities, with improved performance in each subsequent OpenAI model release. We experiment with retrieving and using the relevant legal authority to assess the impact of providing additional legal context to LLMs. Few-shot prompting, presenting examples of question–answer pairs, is also found to significantly enhance the performance of the most advanced model, GPT-4. The findings indicate that LLMs, particularly when combined with prompting enhancements and the correct legal texts, can perform at high levels of accuracy but not yet at expert tax lawyer levels. As LLMs continue to advance, their ability to reason about law autonomously could have significant implications for the legal profession and AI governance.

This article is part of the theme issue ‘A complexity science approach to law and governance’.
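The abstract mentions three ingredients: automatically validated examples, optional legal context, and few-shot question–answer pairs. The Python sketch below shows one way these could fit together. It is an illustration under stated assumptions, not the authors' pipeline: the names `Example`, `make_toy_example`, `build_prompt`, and `accuracy` are hypothetical, the flat-rate rule is a made-up stand-in for real tax law, and the model is any callable that maps a prompt string to a reply string.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Example:
    question: str
    options: tuple[str, ...]  # answer choices, shown as A, B, C
    answer: str               # letter of the correct choice


def make_toy_example(income: int, rate_percent: int, slot: int = 0) -> Example:
    """Build a question whose correct answer is computed, not hand-labelled.

    Assumption: a made-up flat-rate rule, used only so the answer can be
    checked automatically. `slot` (0-2) is where the correct option goes.
    """
    tax = income * rate_percent // 100
    options = [f"${tax + 100}", f"${tax + 200}"]  # two wrong answers
    options.insert(slot, f"${tax}")               # place the right one
    question = (
        f"Under a hypothetical {rate_percent}% flat tax, how much tax is "
        f"owed on ${income} of income?"
    )
    return Example(question, tuple(options), "ABC"[slot])


def format_example(ex: Example, with_answer: bool) -> str:
    lines = [f"Question: {ex.question}"]
    lines += [f"{letter}. {opt}" for letter, opt in zip("ABC", ex.options)]
    lines.append(f"Answer: {ex.answer}" if with_answer else "Answer:")
    return "\n".join(lines)


def build_prompt(target: Example, shots: Sequence[Example] = (),
                 context: str = "") -> str:
    """Optional legal context, then solved few-shot examples, then the target."""
    parts = []
    if context:
        parts.append(f"Relevant legal text:\n{context}")
    parts += [format_example(s, with_answer=True) for s in shots]
    parts.append(format_example(target, with_answer=False))
    return "\n\n".join(parts)


def accuracy(model: Callable[[str], str], examples: Sequence[Example],
             shots: Sequence[Example] = (), context: str = "") -> float:
    """Fraction of examples where the reply starts with the correct letter."""
    correct = sum(
        model(build_prompt(ex, shots, context)).strip().upper().startswith(ex.answer)
        for ex in examples
    )
    return correct / len(examples)
```

Calling `accuracy` with and without `shots` or `context` on the same set of generated examples gives a simple way to compare prompting conditions. Because each answer is computed from the question's parameters, no manual grading is needed.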

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC10894689/full.md

---
Source: https://tomesphere.com/paper/PMC10894689