Beyond Guilt: Legal Judgment Prediction with Trichotomous Reasoning
Kepu Zhang, Haoyue Yang, Xu Tang, Weijie Yu, Jun Xu

TL;DR
This paper introduces LJPIV, a benchmark dataset for legal judgment prediction that includes innocent verdicts, and demonstrates that current legal LLMs need significant improvement in trichotomous reasoning, especially for innocent outcomes.
Contribution
The paper creates the first benchmark dataset, LJPIV, for legal judgment prediction with innocent verdicts, and proposes strategies to improve LLMs' trichotomous reasoning capabilities.
Findings
Current legal LLMs achieve low F1 scores (below 0.3) on LJPIV.
The proposed strategies improve judgment prediction accuracy, especially for innocent verdicts.
Legal LLMs' reasoning abilities leave significant room for improvement.
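A model that assigns a charge to every input can never earn credit on cases whose gold verdict is innocent. The sketch below illustrates this with per-class F1. It is an assumption-laden illustration only: the exact F1 variant reported in the paper is not stated here, and the `class_f1` helper and the toy `gold` / `always_charge` labels are hypothetical.

```python
def class_f1(gold, pred, label):
    """Per-class F1 for `label` over parallel lists of gold and predicted verdicts."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        # No correct predictions of this label: precision or recall is 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: two of four cases should end in an innocent verdict.
gold = ["theft", "innocent", "fraud", "innocent"]
# A model that always outputs some charge, never "innocent".
always_charge = ["theft", "theft", "fraud", "fraud"]

innocent_f1 = class_f1(gold, always_charge, "innocent")
theft_f1 = class_f1(gold, always_charge, "theft")
```

Because `always_charge` never outputs `"innocent"`, its F1 on that label is 0 no matter how accurate its charge predictions are, which is the limitation the abstract attributes to models that assign a charge to every input.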
Abstract
In legal practice, judges apply the trichotomous dogmatics of criminal law, sequentially assessing the elements of the offense, unlawfulness, and culpability to determine whether an individual's conduct constitutes a crime. Although current legal large language models (LLMs) show promising accuracy in judgment prediction, they lack trichotomous reasoning capabilities due to the absence of an appropriate benchmark dataset, preventing them from predicting innocent outcomes. As a result, every input is automatically assigned a charge, limiting their practical utility in legal contexts. To bridge this gap, we introduce LJPIV, the first benchmark dataset for Legal Judgment Prediction with Innocent Verdicts. Adhering to the trichotomous dogmatics, we extend three widely-used legal datasets through LLM-based augmentation and manual verification. Our experiments with state-of-the-art legal LLMs…
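The sequential structure described in the abstract can be sketched as a short-circuiting check in which failing any stage yields an innocent verdict. This is a minimal illustration, not the paper's method or the LJPIV data format: the `CaseAssessment` fields and the `trichotomous_verdict` function are hypothetical, and the sketch assumes each stage's finding is already available as a boolean, whereas determining those findings from case facts is the hard part for an LLM.

```python
from dataclasses import dataclass
from typing import Optional

INNOCENT = "innocent"


@dataclass
class CaseAssessment:
    """Hypothetical per-stage findings for one case (not the LJPIV schema)."""
    charge: Optional[str]      # candidate charge whose elements are examined
    elements_satisfied: bool   # stage 1: conduct matches the elements of the offense
    justified: bool            # stage 2: a justification (e.g. self-defense) removes unlawfulness
    excused: bool              # stage 3: an excuse (e.g. lack of capacity) removes culpability


def trichotomous_verdict(case: CaseAssessment) -> str:
    """Assess the three stages in order; return the charge only if all three hold."""
    # Stage 1: elements of the offense.
    if case.charge is None or not case.elements_satisfied:
        return INNOCENT
    # Stage 2: unlawfulness (no justification applies).
    if case.justified:
        return INNOCENT
    # Stage 3: culpability (no excuse applies).
    if case.excused:
        return INNOCENT
    return case.charge
```

For example, `trichotomous_verdict(CaseAssessment("theft", True, False, False))` returns `"theft"`, while the same facts with `justified=True` return `"innocent"`. Ordering the checks this way mirrors the sequential assessment the abstract describes: later stages are only reached once earlier ones are satisfied.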
Taxonomy
Topics: Artificial Intelligence in Law · Law, Economics, and Judicial Systems · Legal Education and Practice Innovations
