LAMUS: A Large-Scale Corpus for Legal Argument Mining from U.S. Caselaw using LLMs
Serene Wang, Lavanya Pobbathi, Haihua Chen

TL;DR
LAMUS is a large-scale, high-quality dataset for legal argument mining from U.S. caselaw, built with LLM-based annotation and human verification to support NLP research on judicial reasoning.
Contribution
The paper introduces LAMUS, a novel large-scale legal argument mining corpus with a data-centric pipeline combining LLMs and human refinement, addressing data scarcity in legal NLP.
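Here is a minimal sketch of one way the LLM-plus-human step could be wired. It assumes something the summary does not state: each LLM annotation carries a confidence score, and low-confidence sentences are the ones sent to human annotators. The `Annotation` class, the `route_for_review` function, and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    sentence: str
    label: str         # LLM-assigned argument-component label
    confidence: float  # hypothetical model confidence in [0, 1]


def route_for_review(annotations, threshold=0.8):
    """Split LLM annotations into auto-accepted and human-review queues.

    Hypothetical rule: any annotation whose confidence falls below
    `threshold` is queued for a human annotator to verify or correct.
    """
    accepted, needs_review = [], []
    for ann in annotations:
        if ann.confidence >= threshold:
            accepted.append(ann)
        else:
            needs_review.append(ann)
    return accepted, needs_review


# Example: one confident label is kept, one uncertain label goes to a human.
batch = [
    Annotation("The appellant was arrested on a warrant.", "Facts", 0.95),
    Annotation("We therefore affirm the judgment.", "Conclusions", 0.55),
]
accepted, needs_review = route_for_review(batch)
print(len(accepted), len(needs_review))  # 1 1
```

A confidence threshold is only one possible targeting rule. Disagreement between prompting strategies, or class-specific sampling, would slot into the same function.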
Findings
Chain-of-thought prompting improves LLM performance.
Domain-specific models show stable zero-shot behavior.
Human verification achieves high annotation consistency.
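The summary does not say which agreement statistic was used. Purely as an illustration, Cohen's kappa is a common way to quantify consistency between two label sequences, such as LLM labels and human-verified labels for the same sentences. The `cohens_kappa` helper below is a hypothetical sketch, not the paper's evaluation code.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two equal-length label sequences."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of sentences given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a.keys() | counts_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both sides used a single identical label throughout
    return (observed - expected) / (1 - expected)


llm_labels = ["Facts", "Facts", "Rules", "Analysis"]
human_labels = ["Facts", "Rules", "Rules", "Analysis"]
print(round(cohens_kappa(llm_labels, human_labels), 3))  # 0.636
```

Kappa corrects raw agreement for chance. In the example, 75% raw agreement drops to about 0.64 after accounting for the agreement expected from the two label distributions alone.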
Abstract
Legal argument mining aims to identify and classify the functional components of judicial reasoning, such as facts, issues, rules, analysis, and conclusions. Progress in this area is limited by the lack of large-scale, high-quality annotated datasets for U.S. caselaw, particularly at the state level. This paper introduces LAMUS, a sentence-level legal argument mining corpus constructed from U.S. Supreme Court decisions and Texas criminal appellate opinions. The dataset is created using a data-centric pipeline that combines large-scale case collection, LLM-based automatic annotation, and targeted human-in-the-loop quality refinement. We formulate legal argument mining as a six-class sentence classification task and evaluate multiple general-purpose and legal-domain language models under zero-shot, few-shot, and chain-of-thought prompting strategies, with LegalBERT as a supervised…
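The following hedged sketch shows how prompts for the three strategies might be built for the six-class task. The abstract names five components. The sixth label, `Other`, is a hypothetical placeholder. The prompt wording and the `build_prompt` and `parse_label` helpers are illustrative assumptions, not the authors' prompts.

```python
# Five components named in the abstract; "Other" is a hypothetical sixth label.
LABELS = ["Facts", "Issues", "Rules", "Analysis", "Conclusions", "Other"]


def build_prompt(sentence, strategy="zero-shot", examples=None):
    """Build a classification prompt for one sentence (illustrative wording)."""
    if strategy not in {"zero-shot", "few-shot", "cot"}:
        raise ValueError(f"unknown strategy: {strategy}")
    parts = [
        "Classify the following sentence from a U.S. court opinion into exactly one "
        "legal argument component: " + ", ".join(LABELS) + "."
    ]
    if strategy == "few-shot":
        # Prepend labeled demonstrations before the target sentence.
        for ex_sentence, ex_label in examples or []:
            parts.append(f"Sentence: {ex_sentence}\nLabel: {ex_label}")
    parts.append(f"Sentence: {sentence}")
    if strategy == "cot":
        # Chain-of-thought: ask for reasoning first, then a final label line.
        parts.append(
            "Think step by step about the sentence's role in the court's reasoning, "
            "then give the final answer on the last line as 'Label: <label>'."
        )
    else:
        parts.append("Label:")
    return "\n\n".join(parts)


def parse_label(response):
    """Return the last valid label given on a 'Label:' line, or None."""
    for line in reversed(response.strip().splitlines()):
        if line.lower().startswith("label:"):
            candidate = line.split(":", 1)[1].strip()
            for label in LABELS:
                if candidate.lower() == label.lower():
                    return label
    return None


print(build_prompt("We affirm the judgment.", "few-shot",
                   [("The officer stopped the car.", "Facts")]))
```

Zero-shot asks for the label directly. Few-shot prepends labeled examples. The chain-of-thought variant asks for reasoning before a final `Label:` line, which `parse_label` then extracts from the model's response.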
Taxonomy
Topics: Artificial Intelligence in Law · Legal Language and Interpretation · Multi-Agent Systems and Negotiation
