NitiBench: A Comprehensive Study of LLM Framework Capabilities for Thai Legal Question Answering

Pawitsapak Akarajaradwong; Pirat Pothavorn; Chompakorn Chaksangchaichot; Panuthep Tasawong; Thitiwat Nopparatbundit; Keerakiat Pratai; Sarana Nutanong

arXiv:2502.10868·cs.CL·August 25, 2025

TL;DR

This paper introduces NitiBench, a benchmark for Thai legal question answering. It evaluates RAG and long-context LLM approaches, proposes new metrics, and identifies current limitations to guide future research in Thai legal NLP.

Contribution

The paper presents NitiBench, the first comprehensive benchmark for Thai legal QA. It evaluates LLM-based methods, proposes tailored metrics, and analyzes retrieval and reasoning challenges.

Findings

01

Section-based chunking improves retrieval performance.

02

Current retrievers struggle with complex legal queries.

03

Long-context LLMs underperform RAG systems in Thai legal QA.
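The first finding contrasts section-based chunking with naive chunking. The sketch below illustrates the general idea only and is not the authors' implementation: the `Section <number>` header pattern, the English sample text, and the helper names `chunk_by_section` and `chunk_fixed` are all assumptions made for illustration.

```python
import re

# Assumption: each legal section starts on its own line with "Section <number>".
# Real Thai statutes use different markers; this pattern is purely illustrative.
SECTION_HEADER = re.compile(r"^Section\s+(\d+)\b", re.MULTILINE)


def chunk_by_section(text):
    """Return one chunk per section, so each chunk is a complete legal unit."""
    matches = list(SECTION_HEADER.finditer(text))
    chunks = []
    for i, match in enumerate(matches):
        # A section runs until the next header, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunks.append({
            "section": int(match.group(1)),
            "text": text[match.start():end].strip(),
        })
    return chunks


def chunk_fixed(text, size):
    """Baseline: fixed-size character windows that ignore section boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


# Hypothetical sample text, not taken from any real statute.
sample = (
    "Section 1 This Act applies to all companies.\n"
    "Section 2 A company must file a return each year.\n"
    "Section 3 A return under Section 2 must be signed.\n"
)

section_chunks = chunk_by_section(sample)
fixed_chunks = chunk_fixed(sample, 40)
```

Here `section_chunks` keeps every section intact, while `fixed_chunks` can cut a section in half. The in-line mention of "Section 2" inside Section 3 is not treated as a header because the pattern only matches at the start of a line; it is the kind of cross-reference the abstract refers to.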

Abstract

The application of large language models (LLMs) in the legal domain holds significant potential for information retrieval and question answering, yet Thai legal QA systems face challenges due to a lack of standardized evaluation benchmarks and the complexity of Thai legal structures. This paper introduces NitiBench, a benchmark comprising two datasets: the NitiBench-CCL, covering general Thai financial law, and the NitiBench-Tax, which includes real-world tax law cases requiring advanced legal reasoning. We evaluate retrieval-augmented generation (RAG) and long-context LLM-based approaches to address three key research questions: the impact of domain-specific components like section-based chunking and cross-referencing, the comparative performance of different retrievers and LLMs, and the viability of long-context LLMs as an alternative to RAG. Our results show that section-based…

Code & Models

Repositories

vistec-ai/nitibench
Official

Taxonomy

Topics: Artificial Intelligence in Law · Natural Language Processing Techniques · Legal Education and Practice Innovations

Methods: Attention Is All You Need · Byte Pair Encoding · Adam · Softmax · Dropout · Weight Decay · BART · WordPiece · Layer Normalization