LawSum: A weakly supervised approach for Indian Legal Document Summarization
Vedant Parikh, Vidit Mathur, Parth Mehta, Namita Mittal, Prasenjit Majumder

TL;DR
This paper introduces a new dataset of Indian legal judgments with summaries and attributes, and proposes a weakly supervised method for summarizing unstructured legal documents, enabling improved legal analytics and retrieval.
Contribution
The work creates a large, annotated dataset of Indian legal judgments and develops an auto-labeling technique for training a weakly supervised summarization model.
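This page does not say how the auto-labels are produced. One widely used way to turn reference summaries into sentence-level extraction labels is greedy oracle selection. It repeatedly adds the judgment sentence that most increases ROUGE-1-style overlap with the summary and stops when no remaining sentence improves the score. The sketch below assumes that approach and uses a simplified unigram F1 with no stemming or stop-word removal. The names `tokenize`, `unigram_f1` and `auto_label` are hypothetical, and this is not necessarily the method the LawSum authors used.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word/number tokens (hypothetical helper, no stemming)."""
    return re.findall(r"\w+", text.lower())


def unigram_f1(candidate: list[str], reference: list[str]) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between two token lists."""
    if not candidate or not reference:
        return 0.0
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def auto_label(sentences: list[str], summary: str, max_sentences: int = 5) -> list[int]:
    """Greedy oracle: return a 0/1 label per judgment sentence (1 = summary-worthy).

    Assumed technique for illustration, not the paper's confirmed procedure.
    """
    tokenized = [tokenize(s) for s in sentences]
    reference = tokenize(summary)
    selected: list[int] = []
    best_score = 0.0
    while len(selected) < max_sentences:
        best_idx = None
        for i in range(len(sentences)):
            if i in selected:
                continue
            # Score the current selection plus candidate sentence i.
            candidate = [tok for j in selected + [i] for tok in tokenized[j]]
            score = unigram_f1(candidate, reference)
            if score > best_score:
                best_score, best_idx = score, i
        if best_idx is None:  # no remaining sentence improves overlap
            break
        selected.append(best_idx)
    return [1 if i in selected else 0 for i in range(len(sentences))]


if __name__ == "__main__":
    judgment = [
        "The appeal is dismissed.",
        "The court adjourned for lunch.",
        "Section 302 IPC applies to the accused.",
    ]
    print(auto_label(judgment, "Appeal dismissed as Section 302 IPC applies"))
```

Sentences labelled 1 can then serve as positive examples for an extractive sentence classifier. Thresholding each sentence's individual overlap is a simpler alternative, though it tends to select redundant sentences.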
Findings
High accuracy in sentence extraction using auto-labeled data
Effective summarization of Indian legal judgments demonstrated
Potential applications in legal retrieval and decision prediction
Abstract
Unlike the courts in western countries, public records of the Indian judiciary are completely unstructured and noisy. To date, no large-scale, publicly available annotated datasets of Indian legal documents exist. This limits the scope for legal analytics research. In this work, we propose a new dataset consisting of over 10,000 judgements delivered by the Supreme Court of India and their corresponding hand-written summaries. The proposed dataset is pre-processed by normalising common legal abbreviations, handling spelling variations in named entities, handling bad punctuation and performing accurate sentence tokenization. Each sentence is tagged with its rhetorical role. We also annotate each judgement with several attributes, such as the date, the names of the plaintiffs, defendants and the people representing them, the judges who delivered the judgement, the acts/statutes that are cited and the most common…
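The abstract lists abbreviation normalisation, punctuation clean-up and sentence tokenization as pre-processing steps. Their order matters because periods in abbreviations such as "Mr." or "u/s." otherwise look like sentence boundaries. The sketch below makes several assumptions: a small hypothetical abbreviation table, simple whitespace and punctuation clean-up, and a naive regex splitter. It does not reproduce the dataset's actual pipeline, and the names `ABBREVIATIONS`, `normalise` and `split_sentences` are illustrative.

```python
import re

# Hypothetical abbreviation table for illustration only; the dataset's actual
# normalisation list is not given on this page.
ABBREVIATIONS = [
    (r"\bu/s\.?", "under Section"),
    (r"\bSec\.", "Section"),
    (r"\bArt\.", "Article"),
    (r"\bvs?\.", "versus"),
    (r"\bHon'ble\b", "Honourable"),
    (r"\bMr\.", "Mr"),
]


def normalise(text: str) -> str:
    """Expand abbreviations and tidy spacing so periods mark sentence ends."""
    for pattern, expansion in ABBREVIATIONS:
        text = re.sub(pattern, expansion, text)
    text = re.sub(r"\s+", " ", text)               # collapse runs of whitespace
    text = re.sub(r"\s+([.,;:?!])", r"\1", text)   # drop stray space before punctuation
    return text.strip()


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after . ? ! when the next token starts uppercase."""
    return re.split(r"(?<=[.?!])\s+(?=[A-Z])", normalise(text))


if __name__ == "__main__":
    raw = ("The appeal was heard by Hon'ble Mr. Justice Rao. "
           "The appellant was convicted u/s. 302 of the IPC . Art. 21 was not violated.")
    for sentence in split_sentences(raw):
        print(sentence)
```

If `normalise` were skipped, the same splitter would also break after "Mr." and produce a spurious sentence, "Justice Rao.". A production pipeline would more likely use a trained or rule-based tokenizer with a curated legal abbreviation list.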
Taxonomy
Topics: Artificial Intelligence in Law · Topic Modeling · Natural Language Processing Techniques
