Evaluating LLM-based Approaches to Legal Citation Prediction: Domain-specific Pre-training, Fine-tuning, or RAG? A Benchmark and an Australian Law Case Study

Jiuzhou Han; Paul Burgess; Ehsan Shareghi

arXiv:2412.06272·cs.CL·May 23, 2025



TL;DR

This paper introduces the AusLaw Citation Benchmark, a large dataset for legal citation prediction, and systematically evaluates various LLM-based approaches, revealing the importance of hybrid retrieval methods and highlighting significant performance gaps.

Contribution

The paper presents the first large-scale Australian legal citation dataset and provides a comprehensive benchmark for LLM-based citation prediction methods.

Findings

01. Hybrid retrieval approaches outperform standalone LLMs.
02. Instruction tuning improves performance significantly.
03. A 50% performance gap remains, indicating room for future research.

Abstract

Large Language Models (LLMs) have demonstrated strong potential across legal tasks, yet the problem of legal citation prediction remains under-explored. At its core, this task demands fine-grained contextual understanding and precise identification of relevant legislation or precedent. We introduce the AusLaw Citation Benchmark, a real-world dataset comprising 55k Australian legal instances and 18,677 unique citations, which, to the best of our knowledge, is the first of its scale and scope. We then conduct systematic benchmarking across a range of solutions: (i) standard prompting of both general and law-specialised LLMs, (ii) retrieval-only pipelines with both generic and domain-specific embeddings, (iii) supervised fine-tuning, and (iv) several hybrid strategies that combine LLMs with retrieval augmentation through query expansion, voting ensembles, or re-ranking. Results show that…
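As a rough illustration of the "voting ensemble" idea named in the abstract, the sketch below merges a retrieval ranking with a list of LLM-proposed citations using reciprocal-rank voting. It is not the authors' pipeline: the token-overlap retriever stands in for an embedding model, the reciprocal-rank rule is one assumed way to vote, and the corpus, case labels, and `llm_candidates` list are invented placeholders.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase whitespace tokenisation; a stand-in for a real embedding model."""
    return set(text.lower().split())


def retrieve(query, corpus, k=3):
    """Rank citations by Jaccard overlap between the query and each description."""
    q = tokenize(query)
    scored = []
    for citation, description in corpus.items():
        d = tokenize(description)
        union = q | d
        score = len(q & d) / len(union) if union else 0.0
        scored.append((score, citation))
    # Sort by score descending, then by citation string for deterministic ties.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [citation for _, citation in scored[:k]]


def vote(ranked_lists):
    """Combine ranked candidate lists with reciprocal-rank voting (assumed rule)."""
    totals = defaultdict(float)
    for ranked in ranked_lists:
        for rank, citation in enumerate(ranked, start=1):
            totals[citation] += 1.0 / rank
    return sorted(totals, key=lambda c: (-totals[c], c))


# Hypothetical toy corpus: citation label -> short description (all invented).
CORPUS = {
    "Case A [2001] XYZ 1": "negligence duty of care employer workplace injury",
    "Case B [2010] XYZ 2": "contract breach damages commercial lease",
    "Case C [2015] XYZ 3": "duty of care occupier premises injury visitor",
}

context = "the employer owed a duty of care for the workplace injury"
retrieved = retrieve(context, CORPUS, k=2)

# Hypothetical LLM output; in practice this would come from a prompted model.
llm_candidates = ["Case C [2015] XYZ 3", "Case B [2010] XYZ 2"]

final_ranking = vote([retrieved, llm_candidates])
print(final_ranking)
```

In this toy run the retriever alone prefers Case A, while the combined vote puts Case C first because both sources propose it. Query expansion and re-ranking, the other two hybrid strategies the abstract lists, would plug into the same structure by rewriting `context` before retrieval or by rescoring `final_ranking` afterwards.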

Code & Models

Datasets

auslawbench/AusLaw-Citation-Benchmark (dataset · 26 dl)

Taxonomy

Topics: Artificial Intelligence in Law · Legal Education and Practice Innovations · Comparative and International Law Studies
