LAMUS: A Large-Scale Corpus for Legal Argument Mining from U.S. Caselaw using LLMs

Serene Wang; Lavanya Pobbathi; Haihua Chen

arXiv:2603.08286 · cs.CL · March 10, 2026

Open Access

TL;DR

LAMUS is a large-scale, sentence-level corpus for legal argument mining from U.S. caselaw, annotated automatically with LLMs and refined through human verification, built to ease the shortage of annotated data for legal NLP research.

Contribution

The paper introduces LAMUS, a novel large-scale legal argument mining corpus with a data-centric pipeline combining LLMs and human refinement, addressing data scarcity in legal NLP.

Findings

01. Chain-of-thought prompting improves LLM performance.

02. Domain-specific models show stable zero-shot behavior.

03. Human verification achieves high annotation consistency.

Abstract

Legal argument mining aims to identify and classify the functional components of judicial reasoning, such as facts, issues, rules, analysis, and conclusions. Progress in this area is limited by the lack of large-scale, high-quality annotated datasets for U.S. caselaw, particularly at the state level. This paper introduces LAMUS, a sentence-level legal argument mining corpus constructed from U.S. Supreme Court decisions and Texas criminal appellate opinions. The dataset is created using a data-centric pipeline that combines large-scale case collection, LLM-based automatic annotation, and targeted human-in-the-loop quality refinement. We formulate legal argument mining as a six-class sentence classification task and evaluate multiple general-purpose and legal-domain language models under zero-shot, few-shot, and chain-of-thought prompting strategies, with LegalBERT as a supervised…
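The task setup described in the abstract (one label per sentence, queried under zero-shot, few-shot, or chain-of-thought prompting) can be illustrated with a minimal sketch. This is not the authors' prompt or pipeline: the prompt wording, the function names `build_prompt` and `parse_label`, and the label strings are all assumptions made for illustration. The abstract names only five components, so the sixth class appears here as a hypothetical `other` placeholder.

```python
import re
from typing import Optional, Sequence, Tuple

# Five components named in the abstract. The sixth class is not named there,
# so "other" is a hypothetical placeholder, not the paper's label.
LABELS: Tuple[str, ...] = ("fact", "issue", "rule", "analysis", "conclusion", "other")


def build_prompt(
    sentence: str,
    strategy: str = "zero-shot",
    examples: Sequence[Tuple[str, str]] = (),
    labels: Sequence[str] = LABELS,
) -> str:
    """Build an illustrative classification prompt for one sentence."""
    if strategy not in ("zero-shot", "few-shot", "cot"):
        raise ValueError(f"unknown strategy: {strategy}")
    parts = [
        "Classify the sentence from a judicial opinion into exactly one of: "
        + ", ".join(labels) + "."
    ]
    if strategy == "few-shot":
        # Each demonstration is a (sentence, label) pair shown before the query.
        for text, label in examples:
            parts.append(f"Sentence: {text}\nLabel: {label}")
    if strategy == "cot":
        # Ask for reasoning first, with the final answer in a parseable form.
        parts.append(
            "Reason step by step about the sentence's role, "
            "then end with 'Label: <label>'."
        )
    parts.append(f"Sentence: {sentence}")
    if strategy != "cot":
        parts.append("Label:")
    return "\n\n".join(parts)


def parse_label(response: str, labels: Sequence[str] = LABELS) -> Optional[str]:
    """Extract the predicted label from a model reply, or None if absent."""
    # Prefer the last explicit "Label: x" marker (chain-of-thought replies).
    matches = re.findall(r"label:\s*([a-z]+)", response.lower())
    if matches and matches[-1] in labels:
        return matches[-1]
    # Otherwise accept a reply that is just the label itself.
    bare = response.strip().lower().rstrip(".")
    return bare if bare in labels else None
```

Passing the label set as a parameter keeps the sketch usable once the actual six classes are known; only `LABELS` would need to change.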

Taxonomy

Topics: Artificial Intelligence in Law · Legal Language and Interpretation · Multi-Agent Systems and Negotiation