IndianBailJudgments-1200: A Multi-Attribute Dataset for Legal NLP on Indian Bail Orders

Sneha Deshmukh; Prathmesh Kamble

arXiv:2507.02506·cs.CL·July 4, 2025

TL;DR

This paper introduces IndianBailJudgments-1200, a dataset of 1200 Indian court judgments on bail decisions annotated across 20+ attributes, intended to support legal NLP research on Indian law.

Contribution

The paper presents the first publicly available, multi-attribute dataset focused on Indian bail judgments, with annotations generated by a prompt-engineered GPT-4o pipeline and verified for consistency.

Findings

01

The dataset supports legal NLP tasks such as outcome prediction, summarization, and fairness analysis.

02

Annotations are generated via a prompt-engineered GPT-4o pipeline.

03

It is the first publicly available dataset focused specifically on Indian bail jurisprudence.

Abstract

Legal NLP remains underdeveloped in regions like India due to the scarcity of structured datasets. We introduce IndianBailJudgments-1200, a new benchmark dataset comprising 1200 Indian court judgments on bail decisions, annotated across 20+ attributes including bail outcome, IPC sections, crime type, and legal reasoning. Annotations were generated using a prompt-engineered GPT-4o pipeline and verified for consistency. This resource supports a wide range of legal NLP tasks such as outcome prediction, summarization, and fairness analysis, and is the first publicly available dataset focused specifically on Indian bail jurisprudence.
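To make the multi-attribute structure concrete, the following is a minimal sketch of how one annotated judgment might be represented and queried. It covers only the four attributes the abstract names. The field names (`case_id`, `bail_outcome`, `ipc_sections`, `crime_type`, `legal_reasoning`), the label values "granted" and "rejected", and the helper `outcome_rate_by_crime_type` are all assumptions made for illustration, not the dataset's published schema, and the sample records are invented placeholders rather than real entries.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BailRecord:
    # Hypothetical field names; the abstract lists these attributes only in prose.
    case_id: str
    bail_outcome: str  # assumed label values: "granted" or "rejected"
    ipc_sections: List[str] = field(default_factory=list)
    crime_type: str = ""
    legal_reasoning: str = ""


def outcome_rate_by_crime_type(records: List[BailRecord]) -> Dict[str, float]:
    """Return the fraction of records with bail granted, per crime type."""
    totals: Counter = Counter()
    granted: Counter = Counter()
    for record in records:
        totals[record.crime_type] += 1
        if record.bail_outcome == "granted":
            granted[record.crime_type] += 1
    return {crime: granted[crime] / totals[crime] for crime in totals}


# Invented placeholder records for illustration; not drawn from the dataset.
sample = [
    BailRecord("ex-1", "granted", ["420"], "fraud"),
    BailRecord("ex-2", "rejected", ["302"], "murder"),
    BailRecord("ex-3", "granted", ["420", "406"], "fraud"),
    BailRecord("ex-4", "rejected", ["420"], "fraud"),
]

rates = outcome_rate_by_crime_type(sample)
```

A per-category grant rate like this is one simple starting point for the fairness analysis the abstract mentions. The same record type could also supply the label for an outcome-prediction model.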
