Large-Language Memorization During the Classification of United States Supreme Court Cases
John E. Ortega, Dhruv D. Joshi, Matt P. Borkowski

TL;DR
This paper investigates how large-language models memorize and classify complex US Supreme Court decisions, comparing different fine-tuning and retrieval methods to improve accuracy and robustness in legal domain tasks.
Contribution
It introduces a detailed analysis of LLM memorization in legal classification tasks and demonstrates that prompt-based models with memory outperform previous BERT-based approaches.
Findings
Prompt-based models with memories outperform BERT-based models by about 2 points.
DeepSeek shows increased robustness in legal classification tasks.
Complex legal language challenges LLM memorization and classification accuracy.
Abstract
Large-language models (LLMs) have been shown to respond in a variety of ways for classification tasks outside of question-answering. LLM responses are sometimes called "hallucinations" since the output is not what is expected. Memorization strategies in LLMs are being studied in detail, with the goal of understanding how LLMs respond. We perform a deep dive into a classification task based on United States Supreme Court (SCOTUS) decisions. The SCOTUS corpus is well suited to studying LLM memory accuracy in classification because it presents significant challenges: extensive sentence length, complex legal terminology, non-standard structure, and domain-specific vocabulary. Experimentation is performed with the latest LLM fine-tuning and retrieval-based approaches, such as parameter-efficient fine-tuning, auto-modeling, and others, on two traditional category-based SCOTUS…
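The abstract mentions retrieval-based approaches without describing them. The sketch below is a hypothetical illustration of the general idea, not the authors' pipeline: it ranks labeled opinions by bag-of-words cosine similarity to a new opinion, keeps the top `k` as few-shot examples, and truncates every opinion to a word budget as a crude response to the long texts the abstract describes. The function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`), the similarity measure, the truncation rule, and the example category labels are all assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical sketch of retrieval-based few-shot prompting; NOT the paper's method.


def tokenize(text: str) -> list[str]:
    """Lowercase alphabetic tokens; a stand-in for a real subword tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, labeled: list[tuple[str, str]], k: int = 2) -> list[tuple[str, str]]:
    """Return the k labeled (text, label) pairs most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(labeled, key=lambda ex: cosine(q, Counter(tokenize(ex[0]))), reverse=True)
    return ranked[:k]


def build_prompt(
    query: str,
    labeled: list[tuple[str, str]],
    categories: list[str],
    k: int = 2,
    max_words: int = 200,
) -> str:
    """Assemble a few-shot classification prompt from retrieved examples.

    Each opinion is clipped to its first max_words words (an assumed,
    simplistic way to fit long opinions into a context window).
    """

    def clip(text: str) -> str:
        return " ".join(text.split()[:max_words])

    parts = ["Classify the Supreme Court opinion into one of: " + ", ".join(categories) + "."]
    for text, label in retrieve(query, labeled, k):
        parts.append(f"Opinion: {clip(text)}\nCategory: {label}")
    parts.append(f"Opinion: {clip(query)}\nCategory:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Illustrative labeled examples; texts and labels are invented for this sketch.
    examples = [
        ("The search of the vehicle without a warrant violated the Fourth Amendment.", "Criminal Procedure"),
        ("The statute restricting speech in public forums is unconstitutional under the First Amendment.", "First Amendment"),
        ("The taxpayer may deduct the loss under the Internal Revenue Code.", "Federal Taxation"),
    ]
    cats = [label for _, label in examples]
    query = "Police conducted a warrantless search of the defendant's home under the Fourth Amendment."
    print(build_prompt(query, examples, cats, k=1))
```

With `k=1`, the vehicle-search example is retrieved for the warrantless-search query because it shares the most weighted vocabulary. A real system would more likely use dense embeddings and a model-specific token budget instead of word counts.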
Taxonomy
Topics: Artificial Intelligence in Law · Legal Language and Interpretation · Topic Modeling
