MILPaC: A Novel Benchmark for Evaluating Translation of Legal Text to Indian Languages
Sayan Mahapatra, Debtanu Datta, Shubham Soni, Adrijit Goswami, Saptarshi Ghosh

TL;DR
This paper introduces MILPaC, a high-quality legal parallel corpus covering English and nine Indian languages, and uses it to benchmark a range of machine translation systems on legal text, including an evaluation by legal practitioners.
Contribution
It creates the first legal parallel corpus for English and Indian languages and benchmarks multiple MT systems, including human evaluation by legal practitioners.
Findings
Commercial MT systems perform variably on legal texts.
Large Language Models show promising translation quality.
Legal practitioners' satisfaction varies with MT system used.
Abstract
Most legal text in the Indian judiciary is written in complex English due to historical reasons. However, only a small fraction of the Indian population is comfortable reading English. Hence, legal text needs to be made available in various Indian languages, possibly by translating the available legal text from English. Though there has been a lot of research on translation to and between Indian languages, to our knowledge, there has not been much prior work on such translation in the legal domain. In this work, we construct the first high-quality legal parallel corpus containing aligned text units in English and nine Indian languages, including several low-resource languages. We also benchmark the performance of a wide variety of Machine Translation (MT) systems over this corpus, including commercial MT systems, open-source MT systems and Large Language Models. Through a…
Taxonomy
Topics: Natural Language Processing Techniques · Artificial Intelligence in Law · Topic Modeling
