TaxPraBen: A Scalable Benchmark for Structured Evaluation of LLMs in Chinese Real-World Tax Practice

Gang Hu; Yating Chen; Haiyan Ding; Wang Gao; Jiajia Huang; Min Peng; Qianqian Xie; Kun Yue

arXiv:2604.08948·cs.CL·April 23, 2026

TL;DR

TaxPraBen is a benchmark for evaluating LLMs on real-world Chinese tax practice tasks. It highlights performance gaps and is intended to guide future improvements.

Contribution

It introduces the first structured, scalable benchmark for Chinese tax-related tasks, combining traditional and real-world scenarios for end-to-end assessment.

Findings

01

Large closed-source LLMs perform best.

02

Chinese LLMs like Qwen2.5 outperform multilingual models.

03

Fine-tuning with tax data yields limited improvements.

Abstract

While Large Language Models (LLMs) excel in various general domains, they exhibit notable gaps in the highly specialized, knowledge-intensive, and legally regulated Chinese tax domain. Although tax-related benchmarks are gaining attention, many focus on isolated NLP tasks and neglect real-world practical capabilities. To address this issue, we introduce TaxPraBen, the first dedicated benchmark for Chinese taxation practice. It combines 10 traditional application tasks with 3 pioneering real-world scenarios (tax risk prevention, tax inspection analysis, and tax strategy planning), sourced from 14 datasets totaling 7.3K instances. TaxPraBen features a scalable structured evaluation paradigm built on a process of "structured parsing-field alignment extraction-numerical and textual matching", enabling end-to-end tax practice assessment while being extensible to other…
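The three-stage process named in the abstract (structured parsing, field alignment extraction, numerical and textual matching) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the JSON output format, the function names (`parse_structured`, `to_number`, `score_field`, `evaluate`), the 1% numeric tolerance, the character-level similarity measure, and the example field names are all assumptions made for illustration.

```python
import json
import math
import re
from difflib import SequenceMatcher


def parse_structured(text):
    """Stage 1 (structured parsing): extract a JSON object from a model reply.

    Assumption: the model was asked to answer in JSON. Returns {} on failure.
    """
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        return {}
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return {}
    return parsed if isinstance(parsed, dict) else {}


def to_number(value):
    """Return a float for numbers or numeric-looking strings, else None."""
    if isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        try:
            return float(value.replace(",", "").strip())
        except ValueError:
            return None
    return None


def score_field(predicted, reference, rel_tol=1e-2):
    """Stage 3 (numerical and textual matching) for a single field.

    Numeric references score 1.0 or 0.0 within a relative tolerance
    (hypothetical 1%); other references get a 0..1 string similarity.
    """
    ref_num = to_number(reference)
    if ref_num is not None:
        pred_num = to_number(predicted)
        if pred_num is None:
            return 0.0
        return 1.0 if math.isclose(pred_num, ref_num, rel_tol=rel_tol) else 0.0
    return SequenceMatcher(None, str(predicted), str(reference)).ratio()


def evaluate(model_output, reference):
    """Stage 2 (field alignment): score each reference field by name.

    Fields missing from the parsed model output score 0.0.
    """
    parsed = parse_structured(model_output)
    return {
        field: score_field(parsed[field], ref_value) if field in parsed else 0.0
        for field, ref_value in reference.items()
    }


if __name__ == "__main__":
    # Hypothetical reply and gold answer; field names are invented.
    reply = 'Answer: {"tax_payable": "12,500.00", "risk_type": "late filing"}'
    gold = {"tax_payable": 12500, "risk_type": "late filing", "penalty": 300}
    print(evaluate(reply, gold))
```

Scoring per field rather than per answer is one plausible way to make such an evaluation extensible: a new task only needs a reference dictionary with its own field names, while the parsing and matching code stays unchanged.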
