Can Large Language Models Predict the Outcome of Judicial Decisions?
Mohamed Bayan Kmainasi, Ali Ezzat Shahroor, Amani Al-Ghraibah

TL;DR
This paper develops an Arabic legal judgment prediction dataset and benchmarks various open-source LLMs, demonstrating that fine-tuned smaller models can perform comparably to larger models in legal decision prediction tasks.
Contribution
It introduces a new Arabic LJP dataset, evaluates multiple LLMs with diverse configurations, and provides insights into resource-efficient fine-tuning for legal NLP applications.
Findings
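Fine-tuned smaller models achieve performance comparable to larger models.
Resource-efficient fine-tuning (LoRA) is effective for Arabic legal judgment prediction.
Combining automatic metrics such as BLEU and ROUGE with LLM-based qualitative assessment (coherence, legal language, clarity) gives a fuller picture of model quality.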
Abstract
Large Language Models (LLMs) have shown exceptional capabilities in Natural Language Processing (NLP) across diverse domains. However, their application in specialized tasks such as Legal Judgment Prediction (LJP) for low-resource languages like Arabic remains underexplored. In this work, we address this gap by developing an Arabic LJP dataset, collected and preprocessed from Saudi commercial court judgments. We benchmark state-of-the-art open-source LLMs, including LLaMA-3.2-3B and LLaMA-3.1-8B, under varying configurations such as zero-shot, one-shot, and fine-tuning using LoRA. Additionally, we employ a comprehensive evaluation framework that integrates both quantitative metrics (such as BLEU, ROUGE, and BERTScore) and qualitative assessments (including Coherence, Legal Language, Clarity, etc.) using an LLM. Our results demonstrate that fine-tuned smaller models achieve comparable…
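To make the zero-shot versus one-shot configurations and the overlap-based metrics mentioned in the abstract more concrete, here is a minimal Python sketch. It is not the authors' code: the prompt wording, the function names `build_prompt` and `rouge1_f1`, and the whitespace tokenization are all assumptions made for illustration. In particular, whitespace splitting is a crude simplification for Arabic, and a real evaluation would likely rely on an established metrics library.

```python
from collections import Counter


def build_prompt(case_facts, example=None):
    """Build a zero-shot prompt, or a one-shot prompt if `example` is given.

    `example` is a hypothetical (facts, judgment) pair used as the single
    in-context demonstration. The prompt wording is an assumption.
    """
    parts = ["Predict the court's judgment for the following case."]
    if example is not None:
        ex_facts, ex_judgment = example
        parts.append(f"Example case: {ex_facts}\nExample judgment: {ex_judgment}")
    parts.append(f"Case: {case_facts}\nJudgment:")
    return "\n\n".join(parts)


def rouge1_f1(reference, candidate):
    """Unigram-overlap F1 in the spirit of ROUGE-1 (whitespace tokens only)."""
    ref_counts = Counter(reference.split())
    cand_counts = Counter(candidate.split())
    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

In this sketch, `build_prompt(facts)` yields the zero-shot variant and `build_prompt(facts, (other_facts, other_judgment))` the one-shot variant; a model's generated judgment could then be scored against the reference judgment with `rouge1_f1(reference, generated)`.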
Taxonomy
Topics: Artificial Intelligence in Law
