LegalSeg: Unlocking the Structure of Indian Legal Judgments Through Rhetorical Role Classification

Shubham Kumar Nigam; Tanmay Dubey; Govind Sharma; Noel Shallum; Kripabandhu Ghosh; and Arnab Bhattacharya

arXiv:2502.05836·cs.CL·February 11, 2025

TL;DR

This paper introduces LegalSeg, a large annotated dataset for classifying rhetorical roles in Indian legal judgments, and benchmarks several models on it, ranging from a Hierarchical BiLSTM-CRF to RhetoricLLaMA, an instruction-tuned large language model.

Contribution

LegalSeg is the largest annotated dataset for rhetorical role classification in Indian legal judgments, with over 7,000 documents and 1.4 million sentences labeled with 7 rhetorical roles, enabling benchmarking of advanced NLP models in this domain.

Findings

01 Models with broader context outperform sentence-only models.

02 Structural and sequential information improve classification accuracy.

03 Challenges remain in distinguishing similar roles and handling class imbalance.

Abstract

In this paper, we address the task of semantic segmentation of legal documents through rhetorical role classification, with a focus on Indian legal judgments. We introduce LegalSeg, the largest annotated dataset for this task, comprising over 7,000 documents and 1.4 million sentences, labeled with 7 rhetorical roles. To benchmark performance, we evaluate multiple state-of-the-art models, including Hierarchical BiLSTM-CRF, TransformerOverInLegalBERT (ToInLegalBERT), Graph Neural Networks (GNNs), and Role-Aware Transformers, alongside an exploratory RhetoricLLaMA, an instruction-tuned large language model. Our results demonstrate that models incorporating broader context, structural relationships, and sequential sentence information outperform those relying solely on sentence-level features. Additionally, we conducted experiments using surrounding context and predicted or actual labels of…
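To make the idea of "surrounding context" concrete, here is a minimal sketch of one way to pair each sentence with its neighbours before passing it to a sentence classifier. It is an illustration only, not the authors' pipeline: the function name `build_context_inputs`, the `window` parameter, and the `" [SEP] "` separator string are all assumptions introduced for this example.

```python
from typing import List


def build_context_inputs(
    sentences: List[str],
    window: int = 1,
    sep: str = " [SEP] ",  # assumed separator, not taken from the paper
) -> List[str]:
    """Return one input string per sentence: left context, target, right context.

    Hypothetical helper: it only shows how neighbouring sentences can be
    attached to a target sentence; the paper's actual input format may differ.
    """
    inputs = []
    for i, target in enumerate(sentences):
        # Up to `window` sentences before and after the target sentence.
        left = sentences[max(0, i - window): i]
        right = sentences[i + 1: i + 1 + window]
        inputs.append(sep.join([" ".join(left), target, " ".join(right)]))
    return inputs


if __name__ == "__main__":
    # Toy sentences made up for the example, not drawn from LegalSeg.
    doc = [
        "The appellant filed a petition.",
        "The court examined the evidence.",
        "The appeal is dismissed.",
    ]
    for line in build_context_inputs(doc, window=1):
        print(line)
```

With `window=0` the function reduces to the sentence-only setting (the target surrounded by empty context slots), which makes it easy to compare the two input styles on the same classifier.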

Taxonomy

Topics: Artificial Intelligence in Law · Legal Education and Practice Innovations · Law in Society and Culture