LegalSeg: Unlocking the Structure of Indian Legal Judgments Through Rhetorical Role Classification
Shubham Kumar Nigam, Tanmay Dubey, Govind Sharma, Noel Shallum, Kripabandhu Ghosh, and Arnab Bhattacharya

TL;DR
This paper introduces LegalSeg, a large annotated dataset for classifying rhetorical roles in Indian legal judgments, and evaluates various models to improve legal document understanding.
Contribution
LegalSeg is the largest dataset for rhetorical role classification in Indian legal judgments, enabling benchmarking of advanced NLP models in this domain.
Findings
Models with broader context outperform sentence-only models.
Structural and sequential information improve classification accuracy.
Challenges remain in distinguishing similar roles and handling class imbalance.
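As a rough illustration of the first finding, the sketch below shows one simple way a sentence-level classifier's input could be widened with neighbouring sentences. It is not the authors' pipeline: the function name `add_context`, the `window` parameter, and the `[SEP]` separator are all assumptions made for this example.

```python
from typing import List


def add_context(sentences: List[str], window: int = 1, sep: str = " [SEP] ") -> List[str]:
    """Join each sentence with up to `window` neighbours on each side.

    Hypothetical helper: the name, the window size, and the separator
    are illustrative choices, not taken from the paper.
    """
    inputs = []
    for i, target in enumerate(sentences):
        # Sentences before the target, clipped at the document start.
        left = sentences[max(0, i - window):i]
        # Sentences after the target, clipped at the document end.
        right = sentences[i + 1:i + 1 + window]
        inputs.append(sep.join(left + [target] + right))
    return inputs


# Toy document; real inputs would be sentences from a judgment.
doc = ["The appellant filed a suit.", "The trial court dismissed it.", "We allow the appeal."]
for text in add_context(doc, window=1):
    print(text)
```

With `window=0` the function returns the sentences unchanged, which corresponds to the sentence-only setting. A real model would also need to know which segment is the target sentence, for example through a marker token or its position.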
Abstract
In this paper, we address the task of semantic segmentation of legal documents through rhetorical role classification, with a focus on Indian legal judgments. We introduce LegalSeg, the largest annotated dataset for this task, comprising over 7,000 documents and 1.4 million sentences, labeled with 7 rhetorical roles. To benchmark performance, we evaluate multiple state-of-the-art models, including Hierarchical BiLSTM-CRF, TransformerOverInLegalBERT (ToInLegalBERT), Graph Neural Networks (GNNs), and Role-Aware Transformers, alongside an exploratory RhetoricLLaMA, an instruction-tuned large language model. Our results demonstrate that models incorporating broader context, structural relationships, and sequential sentence information outperform those relying solely on sentence-level features. Additionally, we conducted experiments using surrounding context and predicted or actual labels of…
Taxonomy
Topics: Artificial Intelligence in Law · Legal Education and Practice Innovations · Law in Society and Culture
