Evaluating Semantic and Syntactic Understanding in Large Language Models for Payroll Systems

Hendrika Maclean; Mert Can Cakmak; Muzakkiruddin Ahmed Mohammed; Shames Al Mandalawi; John Talburt

arXiv:2601.18012 · cs.CL · February 10, 2026

Open Access

TL;DR

This paper assesses large language models' ability to understand and accurately perform payroll calculations, highlighting their strengths and limitations in high-stakes, precise tasks.

Contribution

It provides a systematic evaluation framework for LLMs on payroll tasks, revealing when careful prompting suffices and when explicit computation is necessary.
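This page does not say how model answers are scored. The sketch below is a hypothetical checker. It assumes each model answer arrives as a free-text string and counts an answer as correct only when it matches the expected amount exactly, to the cent. The names `parse_amount`, `cent_accurate`, and `accuracy` are illustrative and are not the authors' code.

```python
from decimal import Decimal, InvalidOperation
from typing import List, Optional, Tuple


def parse_amount(text: str) -> Optional[Decimal]:
    """Parse an answer such as '$1,234.50' into a Decimal, or None if unparseable."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    try:
        value = Decimal(cleaned)
    except InvalidOperation:  # covers malformed strings such as "about 1234"
        return None
    # Reject NaN and Infinity, which Decimal accepts as valid literals.
    return value if value.is_finite() else None


def cent_accurate(model_answer: str, expected: str) -> bool:
    """True only if the answer equals the expected amount exactly.

    The model's answer is deliberately NOT rounded, so 1234.504 does not
    count as a match for 1234.50.
    """
    got = parse_amount(model_answer)
    return got is not None and got == Decimal(expected)


def accuracy(pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (model_answer, expected) pairs that are cent-accurate."""
    if not pairs:
        return 0.0
    hits = sum(cent_accurate(answer, expected) for answer, expected in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    cases = [
        ("$1,234.50", "1234.50"),   # currency formatting is tolerated
        ("1234.5", "1234.50"),      # same value, different trailing zeros
        ("1234.49", "1234.50"),     # off by one cent: wrong
        ("about 1234", "1234.50"),  # unparseable: wrong
    ]
    print(accuracy(cases))
```

Parsing into `Decimal` rather than `float` keeps the comparison exact. Because the parsed answer is never rounded, an almost-right figure cannot slip through as a match.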

Findings

1. Models perform well with careful prompting on simple tasks.
2. Explicit computation is needed for complex, high-accuracy requirements.
3. The study offers practical guidance for deploying LLMs in sensitive domains.

Abstract

Large language models are now used daily for writing, search, and analysis, and their natural language understanding continues to improve. However, they remain unreliable at exact numerical calculation and at producing outputs that are straightforward to audit. We study a synthetic payroll system as a focused, high-stakes example and evaluate whether models can understand a payroll schema, apply rules in the right order, and deliver cent-accurate results. Our experiments span a tiered dataset ranging from basic to complex cases, a spectrum of prompts ranging from minimal baselines to schema-guided and reasoning variants, and multiple model families, including GPT, Claude, Perplexity, Grok, and Gemini. The results reveal clear regimes where careful prompting is sufficient and regimes where explicit computation is required. The work offers a compact, reproducible framework and practical guidance for deploying…
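The sketch below is a deterministic pay calculation that shows what "applying rules in the right order" and "explicit computation" can look like in code. Everything in it is a hypothetical assumption, not the paper's schema. That includes the rules themselves: overtime at 1.5x beyond 40 hours, a percentage pre-tax deduction, a single flat tax rate, and a flat post-tax deduction. The function name `compute_net_pay` is also illustrative. The code uses Python's `decimal` module because binary floats drift (`0.1 + 0.2 == 0.3` is `False`). It rounds each amount to the cent with `ROUND_HALF_UP` and records every intermediate value, so the result can be audited step by step.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import List, Tuple

CENT = Decimal("0.01")


def to_cents(x: Decimal) -> Decimal:
    """Round an amount to whole cents, halves rounded away from zero."""
    return x.quantize(CENT, rounding=ROUND_HALF_UP)


def compute_net_pay(
    hours: str, rate: str, pretax_pct: str, tax_pct: str, posttax_flat: str
) -> Tuple[Decimal, List[Tuple[str, Decimal]]]:
    """Apply hypothetical payroll rules in a fixed order; return (net, audit trail)."""
    hours_d, rate_d = Decimal(hours), Decimal(rate)
    audit: List[Tuple[str, Decimal]] = []

    # Rule 1 (assumed): regular pay up to 40 hours, 1.5x rate for hours beyond 40.
    regular_hours = min(hours_d, Decimal(40))
    overtime_hours = max(hours_d - Decimal(40), Decimal(0))
    gross = to_cents(regular_hours * rate_d + overtime_hours * rate_d * Decimal("1.5"))
    audit.append(("gross", gross))

    # Rule 2 (assumed): the pre-tax deduction is taken from gross BEFORE tax.
    pretax = to_cents(gross * Decimal(pretax_pct) / 100)
    taxable = gross - pretax
    audit.append(("pretax_deduction", pretax))
    audit.append(("taxable", taxable))

    # Rule 3 (assumed): one flat tax rate applied to taxable wages.
    tax = to_cents(taxable * Decimal(tax_pct) / 100)
    audit.append(("tax", tax))

    # Rule 4 (assumed): a flat post-tax deduction comes off last.
    posttax = to_cents(Decimal(posttax_flat))
    net = taxable - tax - posttax
    audit.append(("posttax_deduction", posttax))
    audit.append(("net", net))
    return net, audit


if __name__ == "__main__":
    net, audit = compute_net_pay("45", "22.50", "5", "12", "15.00")
    for step, amount in audit:
        print(f"{step:>18}: {amount}")
    # For these example inputs, net == Decimal("878.47")
```

Suppose the order were swapped, so that tax is computed on gross before the pre-tax deduction is subtracted. The net figure would change. A schema-level ordering error like this is easy to miss in free-form model output but obvious in an explicit audit trail.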

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification