Adaptive Two-Phase Finetuning LLMs for Japanese Legal Text Retrieval
Quang Hoang Trung, Nguyen Van Hoang Phuc, Le Trung Hoang, Quang Huu Hieu, Vo Nguyen Le Duy

TL;DR
This paper presents a novel two-phase fine-tuning approach for large language models to improve Japanese legal text retrieval, demonstrating superior performance and adaptability in both Japanese and English contexts.
Contribution
Introduces a new dataset and a two-phase fine-tuning pipeline specifically designed for Japanese legal text retrieval, outperforming existing baselines.
Findings
Outperforms existing baselines in Japanese legal text retrieval.
Also effective in English contexts, surpassing baselines on MS MARCO.
Code and models are publicly available for reproducibility.
Abstract
Text Retrieval (TR) involves finding and retrieving text-based content relevant to a user's query from a large repository, with applications in real-world scenarios such as legal document retrieval. While most existing studies focus on English, limited work addresses Japanese contexts. In this paper, we introduce a new dataset specifically designed for Japanese legal contexts and propose a novel two-phase pipeline tailored to this domain. In the first phase, the model learns a broad understanding of global contexts, enhancing its generalization and adaptability to diverse queries. In the second phase, the model is fine-tuned to address complex queries specific to legal scenarios. Extensive experiments are conducted to demonstrate the superior performance of our method, which outperforms existing baselines. Furthermore, our pipeline proves effective in English contexts, surpassing…
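The abstract does not state the training objective or how training examples are selected in each phase. As a rough illustration only, the sketch below assumes a contrastive (InfoNCE-style) loss, with phase 1 drawing negatives at random from the corpus and phase 2 switching to hard negatives, i.e. non-relevant passages that the current model scores highly. The function names, the toy vectors, and the negative-selection strategy are all hypothetical and are not taken from the paper.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(query, positive, negatives, temperature=1.0):
    """Contrastive loss: negative log-softmax of the positive's score
    among the positive and all negatives."""
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[0]


def random_negatives(corpus, positive_id, k, rng):
    """Phase 1 (assumption): sample k negatives uniformly from the corpus."""
    pool = [doc_id for doc_id in corpus if doc_id != positive_id]
    return rng.sample(pool, k)


def hard_negatives(query, corpus, positive_id, k):
    """Phase 2 (assumption): pick the k non-relevant passages with the
    highest similarity to the query under the current embeddings."""
    pool = [doc_id for doc_id in corpus if doc_id != positive_id]
    pool.sort(key=lambda doc_id: dot(query, corpus[doc_id]), reverse=True)
    return pool[:k]


# Toy 2-d embeddings; real systems would use LLM-derived vectors.
corpus = {
    "a": [1.0, 0.0],   # the relevant passage
    "b": [0.9, 0.1],   # near-duplicate, but not relevant
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}
query = [1.0, 0.0]

phase1_ids = random_negatives(corpus, "a", 2, random.Random(0))
phase2_ids = hard_negatives(query, corpus, "a", 1)

loss_easy = info_nce(query, corpus["a"], [corpus["d"]])
loss_hard = info_nce(query, corpus["a"], [corpus[i] for i in phase2_ids])
```

On this toy data the hard negative produces a larger loss than the easy one, which is the usual motivation for reserving hard examples for a second stage once the model already handles the broad cases.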
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
