Are LLMs Court-Ready? Evaluating Frontier Models on Indian Legal Reasoning
Kush Juvekar, Arghya Bhattacharya, Sai Khadloya, Utkarsh Saxena

TL;DR
This study uses India's public legal examinations to evaluate how well large language models reason about Indian law, revealing strength on objective questions but limitations in long-form reasoning and procedural compliance.
Contribution
To the authors' knowledge, the first India-specific, exam-grounded benchmark for assessing LLMs' legal reasoning and court-readiness, released with datasets and evaluation protocols.
Findings
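Frontier LLMs consistently clear historical cutoffs and often match or exceed recent top-scorer bands on objective exams.
No model surpasses the human topper on long-form answers from the Supreme Court's Advocate-on-Record exam.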
Key failure modes in long-form answers: procedural compliance, citation discipline, and appropriate voice.
Abstract
Large language models (LLMs) are entering legal workflows, yet we lack a jurisdiction-specific framework to assess their baseline competence therein. We use India's public legal examinations as a transparent proxy. Our multi-year benchmark assembles objective screens from top national and state exams and evaluates open and frontier LLMs under real-world exam conditions. To probe beyond multiple-choice questions, we also include a lawyer-graded, paired-blinded study of long-form answers from the Supreme Court's Advocate-on-Record exam. This is, to our knowledge, the first exam-grounded, India-specific yardstick for LLM court-readiness released with datasets and protocols. Our work shows that while frontier systems consistently clear historical cutoffs and often match or exceed recent top-scorer bands on objective exams, none surpasses the human topper on long-form reasoning. Grader notes…
Taxonomy
Topics: Artificial Intelligence in Law · Legal Language and Interpretation · Law, AI, and Intellectual Property
