# Language Models and Logic Programs for Trustworthy Tax Reasoning

**Authors:** William Jurayj, Nils Holzenberger, Benjamin Van Durme

arXiv: 2508.21051 · 2026-02-06

## TL;DR

This paper presents a neuro-symbolic system that combines language models with logic programming to make automated tax reasoning more accurate and cheaper to deploy. It shows promising results on the challenging SARA statutory reasoning dataset.

## Contribution

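It introduces a novel integration of LLMs with a symbolic solver for trustworthy tax reasoning. Plain-text rules are translated up front into formal logic programs, and the solver uses them to calculate tax obligations. It also introduces a method for estimating deployment cost from real-world penalties for tax errors.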
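In this design, the language model produces formal representations and a deterministic solver executes them. A minimal forward-chaining evaluator in Python illustrates that division of labor. This is a sketch under stated assumptions. The rule format and the predicates (`married`, `elects_joint`, `joint_return`, `filing_status`) are hypothetical and do not encode real tax law. The paper's solver and program representation may differ. The point is that every derived fact records the rule that produced it, which is the kind of audit trail the abstract asks for.

```python
# Minimal forward-chaining (Datalog-style) evaluator with provenance.
# Hypothetical sketch: terms starting with an uppercase letter are variables,
# everything else is a constant.

def is_var(term):
    return isinstance(term, str) and term[:1].isupper()

def unify(pattern, fact, binding):
    """Match one body atom against a ground fact; return an extended binding or None."""
    if len(pattern) != len(fact):
        return None
    new = dict(binding)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if p in new and new[p] != f:
                return None
            new[p] = f
        elif p != f:
            return None
    return new

def match_body(body, facts, binding):
    """Yield every binding that satisfies all atoms in the rule body."""
    if not body:
        yield binding
        return
    for fact in facts:
        extended = unify(body[0], fact, binding)
        if extended is not None:
            yield from match_body(body[1:], facts, extended)

def forward_chain(rules, facts):
    """Apply rules until no new facts appear; map each fact to the rule that derived it."""
    known = {fact: "given" for fact in facts}
    changed = True
    while changed:
        changed = False
        for name, head, body in rules:
            snapshot = list(known)  # iterate over a copy so we can add facts safely
            for binding in match_body(body, snapshot, {}):
                # Assumes every head variable also appears in the body (safe rules).
                derived = tuple(binding[t] if is_var(t) else t for t in head)
                if derived not in known:
                    known[derived] = name
                    changed = True
    return known

# Hypothetical toy rules (NOT real tax law): (name, head atom, body atoms).
RULES = [
    ("R1_joint", ("joint_return", "X", "Y"), [("married", "X", "Y"), ("elects_joint", "X")]),
    ("R2_status", ("filing_status", "X", "joint"), [("joint_return", "X", "Y")]),
]
FACTS = [("married", "alice", "bob"), ("elects_joint", "alice")]

derivation = forward_chain(RULES, FACTS)
for fact, source in derivation.items():
    print(fact, "<-", source)
```

Because the solver is deterministic, a wrong answer can be traced to a specific rule or input fact rather than to opaque model behavior.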

## Key findings

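- Up-front translation of rules into logic programs, combined with retrieved exemplars, dramatically improves performance on the SARA dataset
- Estimated deployment costs, based on real-world penalties for tax errors, fall well below real-world averages
- Semantic parsing methods prove effective for statutory reasoning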
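The cost finding rests on pricing errors by their real-world consequences. One plausible reading is an expected cost per case. It adds three terms:

- the cost of running the system;
- the error rate times a per-error penalty;
- if the system may abstain, the cost of sending those cases to a human.

This is a sketch under explicit assumptions. The function name, the abstention term, and every number are hypothetical, and the paper's actual formula may differ. The 270-dollar fallback figure reuses the IRS average quoted in the abstract.

```python
from dataclasses import dataclass

@dataclass
class EvalCounts:
    """Outcome counts from running the system on an evaluation set (hypothetical)."""
    correct: int
    wrong: int
    abstained: int

def expected_cost_per_case(counts, inference_cost, penalty_per_error, fallback_cost):
    """Hypothetical cost model: every case pays for inference, wrong answers incur
    an expected penalty, and abstentions are handed to a human preparer."""
    total = counts.correct + counts.wrong + counts.abstained
    if total == 0:
        raise ValueError("need at least one evaluated case")
    error_rate = counts.wrong / total
    abstain_rate = counts.abstained / total
    return inference_cost + error_rate * penalty_per_error + abstain_rate * fallback_cost

# Illustrative numbers only; none of these come from the paper.
counts = EvalCounts(correct=80, wrong=5, abstained=15)
cost = expected_cost_per_case(counts, inference_cost=0.50,
                              penalty_per_error=1000.0, fallback_cost=270.0)
print(f"expected cost per case: {cost:.2f} dollars")
```

With these made-up inputs the estimate is 0.50 + 0.05 × 1000 + 0.15 × 270 = 91 dollars per case. That figure would then be compared against the roughly 270 dollars an average filer spends.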

## Abstract

According to the United States Internal Revenue Service, "the average American spends $270 and 13 hours filing their taxes." Even beyond the U.S., tax filing requires complex reasoning, combining application of overlapping rules with numerical calculations. Because errors can incur costly penalties, any automated system must deliver high accuracy and auditability, making modern large language models (LLMs) poorly suited for this task. We propose an approach that integrates LLMs with a symbolic solver to calculate tax obligations. We evaluate variants of this system on the challenging StAtutory Reasoning Assessment (SARA) dataset, and include a novel method for estimating the cost of deploying such a system based on real-world penalties for tax errors. We further show that up-front translation of plain-text rules into formal logic programs, combined with intelligently retrieved exemplars for formal case representations, can dramatically improve performance on this task and reduce costs to well below real-world averages. Our results demonstrate the effectiveness of applying semantic parsing methods to statutory reasoning, and show promising economic feasibility of neuro-symbolic architectures for increasing access to reliable tax assistance.
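The abstract credits part of the gain to "intelligently retrieved exemplars" for formal case representations. One way to picture this is few-shot prompt construction by similarity search. Given a new case description, the system picks the most similar already-translated cases and shows them to the LLM as examples. This sketch rests on several assumptions:

- it uses plain bag-of-words cosine similarity;
- the exemplar pool, the Prolog-style programs, and the prompt format are all hypothetical;
- the paper's actual retrieval method is not described in this summary and may differ.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercased alphanumeric token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve_exemplars(query, pool, k=2):
    """Return the k pool entries whose case text is most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(pool, key=lambda ex: cosine(q, bag_of_words(ex["case"])), reverse=True)
    return ranked[:k]

def build_prompt(query, exemplars):
    """Hypothetical few-shot format: solved (case, program) pairs, then the new case."""
    parts = [f"Case: {ex['case']}\nProgram:\n{ex['program']}" for ex in exemplars]
    parts.append(f"Case: {query}\nProgram:")
    return "\n\n".join(parts)

# Hypothetical exemplar pool of already-translated cases.
POOL = [
    {"case": "Alice and Bob are married and file a joint return.",
     "program": "married(alice, bob).\njoint_return(alice, bob)."},
    {"case": "Carol earned 50000 dollars in wages in 2017.",
     "program": "income(carol, wages, 50000, 2017)."},
    {"case": "Dan paid 3000 dollars in alimony to Erin.",
     "program": "payment(dan, erin, alimony, 3000)."},
]

query = "Frank and Grace are married and file a joint return in 2017."
exemplars = retrieve_exemplars(query, POOL, k=2)
prompt = build_prompt(query, exemplars)
print(prompt)
```

A dense embedding model could replace `bag_of_words` and `cosine` without changing the rest of the pipeline. The design point is that exemplars are chosen per case rather than fixed.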

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.21051/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/2508.21051/full.md

## References

61 references — full list in the complete paper: https://tomesphere.com/paper/2508.21051/full.md

---
Source: https://tomesphere.com/paper/2508.21051