HLDC: Hindi Legal Documents Corpus
Arnav Kapoor, Mudit Dhawan, Anmol Goel, T.H. Arjun, Akshala Bhatnagar, Vibhu Agrawal, Amul Agrawal, Arnab Bhattacharya, Ponnurangam Kumaraguru, and Ashutosh Modi

TL;DR
This paper introduces HLDC, a large Hindi legal documents corpus, and demonstrates its utility by developing a multi-task learning model for bail prediction, highlighting the need for further research in low-resource legal NLP applications.
Contribution
The paper provides the first large-scale Hindi legal corpus and a novel multi-task learning approach for bail prediction using this data.
Findings
A multi-task learning model that uses summarization as an auxiliary task improves bail prediction accuracy.
The corpus enables new legal NLP research in Hindi.
Further research is needed on low-resource legal NLP applications such as bail prediction.
Abstract
Many populous countries, including India, are burdened with a considerable backlog of legal cases. Development of automated systems that could process legal documents and augment legal practitioners can mitigate this. However, there is a dearth of the high-quality corpora needed to develop such data-driven systems. The problem is even more pronounced for low-resource languages such as Hindi. In this resource paper, we introduce the Hindi Legal Documents Corpus (HLDC), a corpus of more than 900K legal documents in Hindi. Documents are cleaned and structured to enable the development of downstream applications. Further, as a use-case for the corpus, we introduce the task of bail prediction. We experiment with a battery of models and propose a Multi-Task Learning (MTL) based model for the same. MTL models use summarization as an auxiliary task along with bail prediction as…
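To make the multi-task idea concrete, here is a minimal sketch of how a main bail-prediction loss and an auxiliary summarization loss might be combined into one training objective. It is not the authors' implementation. The function names, the `aux_weight` parameter, and the framing of summarization as per-sentence salience labels are all assumptions made for illustration.

```python
import math


def binary_cross_entropy(prob, label):
    """Cross-entropy for one binary prediction; prob is P(label == 1)."""
    eps = 1e-12  # keep prob away from 0 and 1 so log() stays finite
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def multitask_loss(bail_prob, bail_label, sentence_probs, sentence_labels,
                   aux_weight=0.5):
    """Main-task loss plus a weighted auxiliary-task loss.

    bail_prob / bail_label: predicted probability and gold label (1 = granted)
        for the main task, bail prediction.
    sentence_probs / sentence_labels: per-sentence salience predictions and
        gold labels for the auxiliary task. Treating summarization as
        sentence-level classification is an assumption of this sketch.
    aux_weight: hypothetical hyperparameter balancing the two tasks.
    """
    main = binary_cross_entropy(bail_prob, bail_label)
    if sentence_probs:
        # Average over sentences so long documents do not dominate the loss.
        aux = sum(
            binary_cross_entropy(p, y)
            for p, y in zip(sentence_probs, sentence_labels)
        ) / len(sentence_probs)
    else:
        aux = 0.0
    return main + aux_weight * aux
```

Setting `aux_weight` to 0 recovers a single-task bail classifier, which gives a natural baseline for checking whether the auxiliary task helps.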
Taxonomy
Topics: Artificial Intelligence in Law · Legal Education and Practice Innovations · Natural Language Processing Techniques
