Modeling Legal Reasoning: LM Annotation at the Edge of Human Agreement
Rosamond Thalken, Edward H. Stiglitz, David Mimno, and Matthew Wilkens

TL;DR
This paper evaluates the performance of language models on a complex legal reasoning classification task, highlighting the importance of fine-tuning and human annotation for accurate results in specialized domains.
Contribution
It introduces a novel dataset of Supreme Court opinions annotated by experts and systematically tests various language models, emphasizing the effectiveness of fine-tuning over prompt-based approaches.
Findings
Fine-tuned models outperform prompt-based models on this legal reasoning classification task.
LEGAL-BERT achieves the best performance among tested models.
Generative models perform poorly when prompted with the same codebook instructions given to human annotators.
Abstract
Generative language models (LMs) are increasingly used for document class-prediction tasks and promise enormous improvements in cost and efficiency. Existing research often examines simple classification tasks, but the capability of LMs to classify on complex or specialized tasks is less well understood. We consider a highly complex task that is challenging even for humans: the classification of legal reasoning according to jurisprudential philosophy. Using a novel dataset of historical United States Supreme Court opinions annotated by a team of domain experts, we systematically test the performance of a variety of LMs. We find that generative models perform poorly when given instructions (i.e. prompts) equal to the instructions presented to human annotators through our codebook. Our strongest results derive from fine-tuning models on the annotated dataset; the best performing model is…
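As a rough illustration of what comparing model output with expert annotation can involve, the sketch below computes Cohen's kappa, a chance-corrected agreement score, between two label sequences. The choice of metric, the function name cohen_kappa, the labels "A", "B", "C", and the toy data are all assumptions made for illustration; the abstract excerpt above does not say which agreement or performance metrics the paper reports.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement if each rater labeled independently at their own base rates.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data: placeholder labels, not the paper's categories or results.
human = ["A", "A", "B", "B", "C", "C"]
model = ["A", "A", "B", "C", "C", "C"]
kappa = cohen_kappa(human, model)
print(f"kappa = {kappa:.2f}")
```

The same function can be applied to a pair of human annotators, which gives a reference point for how much agreement a model could reasonably be expected to reach on a task that people themselves find hard.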
Taxonomy
Topics: Artificial Intelligence in Law · Computational and Text Analysis Methods · Legal Education and Practice Innovations
