# Legal case documents: A comprehensive dataset for Arabic natural language processing research and applications

**Authors:** Soha Zarbah, Arwa Wali, Dimah Alahmadi

PMC · DOI: 10.1016/j.dib.2025.112429 · Data in Brief · 2026-01-02

## TL;DR

This paper introduces a new Arabic legal case dataset to support natural language processing research and applications in the legal field.

## Contribution

The novel contribution is the creation of a comprehensive Arabic legal case dataset with summaries, keywords, and categories.

## Key findings

- The dataset contains 3170 legal cases from Saudi Arabia's Board of Grievances website.
- The dataset spans 47 classes and includes case summaries, keywords, and categories.
- It supports NLP tasks such as text categorization, data extraction, sentiment analysis, and summarization.
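
The record structure listed above (case text, summary, keywords, category) can be sketched as a small Python type, together with the kind of descriptive statistics the paper reports (number of cases, number of classes, word-count range). This is a minimal sketch under stated assumptions: the `LegalCase` field names, the `describe` helper, and the toy records are hypothetical and do not reflect the dataset's actual file format or contents. Word counts use plain whitespace splitting, which is only a rough proxy for Arabic text.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LegalCase:
    """One record; field names are hypothetical, not the dataset's real schema."""
    text: str
    summary: str
    keywords: list[str] = field(default_factory=list)
    category: str = ""


def word_count(text: str) -> int:
    # Whitespace tokenization: a rough proxy, since Arabic clitics attach to words.
    return len(text.split())


def describe(cases: list[LegalCase]) -> dict:
    """Return case count, class count, largest class size, and min/max words per case."""
    if not cases:
        raise ValueError("no cases to describe")
    lengths = [word_count(c.text) for c in cases]
    per_class = Counter(c.category for c in cases)
    return {
        "cases": len(cases),
        "classes": len(per_class),
        "largest_class_size": per_class.most_common(1)[0][1],
        "min_words": min(lengths),
        "max_words": max(lengths),
    }


# Made-up placeholder records, not taken from the dataset.
toy = [
    LegalCase("نص القضية الأولى", "ملخص", ["عقد"], "تجاري"),
    LegalCase("نص القضية الثانية وهو أطول قليلا", "ملخص", ["قرار"], "إداري"),
    LegalCase("نص قصير", "ملخص", ["عقد"], "تجاري"),
]
stats = describe(toy)
print(stats)
```

Run on the real data, the same summary would be expected to report 3170 cases and 47 classes, provided the records are loaded into this hypothetical shape.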

## Abstract

The legal sector remains distinctive due to the complex language structure and specialized terminology of legal data. This complexity carries considerable contextual information, which calls for natural language processing (NLP). The availability of high-quality, well-structured legal datasets is essential for advancing NLP research and applications within the legal field. However, a gap exists in Arabic legal NLP owing to insufficient research and datasets. To address this gap, we propose an Arabic legal case dataset containing cases, case summaries, relevant keywords, and case categories. The legal case data were obtained from the Board of Grievances website in Saudi Arabia and include 3170 cases distributed across 47 classes. The number of words per case varies significantly, ranging from about 100 to nearly 30,000. The number of pages also varies, from one to 80 pages per case. The dataset therefore supports various NLP applications, including text categorization, data extraction, sentiment analysis, and summarization, thereby improving task efficiency and decision accuracy in the legal profession.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12860909/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12860909/full.md

## References

14 references — full list in the complete paper: https://tomesphere.com/paper/PMC12860909/full.md

---
Source: https://tomesphere.com/paper/PMC12860909