Enhancing Legal Document Retrieval: A Multi-Phase Approach with Large Language Models
Hai-Long Nguyen, Duc-Minh Nguyen, Tan-Minh Nguyen, Ha-Thanh Nguyen, Thi-Hai-Yen Vuong, Ken Satoh

TL;DR
This paper proposes a multi-phase legal document retrieval system that combines BM25, BERT re-ranking, and large language model prompting, significantly improving accuracy on the COLIEE 2023 dataset.
Contribution
It introduces a novel multi-phase retrieval framework that effectively integrates prompting with traditional retrieval methods for legal documents.
Findings
Enhanced retrieval accuracy with LLM prompting
Effective combination of BM25, BERT, and LLMs
Identified challenges in current retrieval systems
Abstract
Large language models with billions of parameters, such as GPT-3.5, GPT-4, and LLaMA, are increasingly prevalent. Numerous studies have explored effective prompting techniques to harness the power of these LLMs for various research problems. Retrieval, particularly in the legal domain, is a challenging setting for the direct application of prompting techniques because legal articles are both numerous and long. This research focuses on maximizing the potential of prompting by placing it as the final phase of the retrieval system, preceded by two supporting phases: BM25 pre-ranking and BERT-based re-ranking. Experiments on the COLIEE 2023 dataset demonstrate that integrating prompting techniques on LLMs into the retrieval system significantly improves retrieval accuracy. However, error analysis reveals several existing issues in the retrieval system that still need…
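The three phases named in the abstract can be sketched as a simple funnel: BM25 narrows the corpus, a re-ranker narrows it further, and only the few surviving candidates are placed in an LLM prompt. The code below is a minimal illustration under stated assumptions, not the authors' implementation. The BM25 scoring is the standard Okapi formula; the parameter values (`k1`, `b`, `k_pre`, `k_rerank`), the prompt wording in `build_prompt`, and the `rerank_fn` and `llm_fn` callables are all hypothetical placeholders standing in for the BERT-based re-ranker and the LLM, whose details the abstract does not give.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would use a proper one.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document for the query (docs must be non-empty)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def top_k(scores, k):
    # Indices of the k highest scores, best first.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def build_prompt(query, candidates):
    # Hypothetical prompt layout; the paper's actual prompt is not shown here.
    lines = ["Query: " + query, "Candidate articles:"]
    for idx, text in candidates:
        lines.append(f"[{idx}] {text}")
    lines.append("List the ids of the articles relevant to the query.")
    return "\n".join(lines)


def retrieve(query, docs, rerank_fn, llm_fn, k_pre=3, k_rerank=2):
    # Phase 1: BM25 pre-ranking over the whole corpus.
    pre = top_k(bm25_scores(query, docs), k_pre)
    # Phase 2: re-rank only the pre-ranked candidates (stand-in for BERT).
    rerank_scores = [rerank_fn(query, docs[i]) for i in pre]
    reranked = [pre[j] for j in top_k(rerank_scores, k_rerank)]
    # Phase 3: hand the short candidate list to the LLM via a prompt.
    prompt = build_prompt(query, [(i, docs[i]) for i in reranked])
    return llm_fn(prompt, reranked)
```

Placing the prompt last keeps its length bounded by `k_rerank` articles rather than the full corpus, which is the motivation the abstract gives for the phase ordering. Any callable with the signature `rerank_fn(query, doc) -> float` and `llm_fn(prompt, candidate_ids) -> list` can be plugged in for experimentation.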
Taxonomy
Topics: Artificial Intelligence in Law
Methods: Attention Is All You Need · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Linear Layer · Label Smoothing · Transformer · Attention Dropout · Residual Connection
