Optimizing Multi-Stage Language Models for Effective Text Retrieval
Quang Hoang Trung, Le Trung Hoang, Nguyen Van Hoang Phuc

TL;DR
This paper presents a two-phase, ensemble-based text retrieval system optimized for Japanese legal datasets, achieving state-of-the-art accuracy and efficiency in domain-specific and multilingual contexts.
Contribution
Introduces a novel two-phase retrieval pipeline with ensemble strategies tailored for Japanese legal texts, improving performance over existing methods.
Findings
Achieves state-of-the-art retrieval accuracy on Japanese legal datasets.
Demonstrates significant efficiency improvements in retrieval tasks.
Validates robustness and adaptability across diverse benchmarks like MS-MARCO.
Abstract
Efficient text retrieval is critical for applications such as legal document analysis, particularly in specialized contexts like Japanese legal systems. Existing retrieval methods often underperform in such domain-specific scenarios, necessitating tailored approaches. In this paper, we introduce a novel two-phase text retrieval pipeline optimized for Japanese legal datasets. Our method leverages advanced language models to achieve state-of-the-art performance, significantly improving retrieval efficiency and accuracy. To further enhance robustness and adaptability, we incorporate an ensemble model that integrates multiple retrieval strategies, resulting in superior outcomes across diverse tasks. Extensive experiments validate the effectiveness of our approach, demonstrating strong performance on both Japanese legal datasets and widely recognized benchmarks like MS-MARCO. Our work…
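The abstract describes the pipeline only at a high level, so the following is a generic, hypothetical sketch of a two-phase "retrieve, then rerank" flow with a score-level ensemble. It is not the authors' implementation. Several choices here are assumptions and do not come from the paper: Okapi BM25 as the first phase; token-overlap Jaccard similarity as a stand-in for a language-model reranker; and min-max normalization followed by a weighted sum as the ensemble rule. Whitespace tokenization on a toy English corpus is also assumed, since Japanese text would need a morphological analyzer. All function names are illustrative.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization (toy only; not suitable for Japanese).
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of `query` against every document (phase 1 scorer)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        s = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            num = tf[term] * (k1 + 1)
            den = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            s += idf * num / den
        scores.append(s)
    return scores


def jaccard(query, doc):
    # Hypothetical stand-in for a learned (e.g. cross-encoder) reranker score.
    q, d = set(tokenize(query)), set(tokenize(doc))
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def min_max(values):
    # Rescale scores to [0, 1] so different scorers can be combined; constant input maps to 0.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def two_phase_retrieve(query, docs, k=3, weights=(0.5, 0.5)):
    """Return indices of the top-k documents, reranked by a weighted score ensemble."""
    # Phase 1: cheap lexical retrieval over the whole corpus, keep the top-k candidates.
    first = bm25_scores(query, docs)
    candidates = sorted(range(len(docs)), key=lambda i: first[i], reverse=True)[:k]
    # Phase 2: rescore only the candidates and fuse normalized scores (weighted-sum ensemble).
    s1 = min_max([first[i] for i in candidates])
    s2 = min_max([jaccard(query, docs[i]) for i in candidates])
    fused = {i: weights[0] * a + weights[1] * c for i, a, c in zip(candidates, s1, s2)}
    return sorted(fused, key=fused.get, reverse=True)


docs = [
    "the tenant must pay rent monthly",
    "a contract requires offer and acceptance",
    "the landlord may terminate the lease if rent is unpaid",
    "weather report for tokyo",
]
print(two_phase_retrieve("unpaid rent lease", docs, k=3))
```

The split keeps the expensive scorer off most of the corpus: phase 1 touches every document, while phase 2 only sees `k` candidates. Normalizing before fusion prevents the scorer with the larger numeric range from dominating the ensemble.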
Taxonomy
Topics: Topic Modeling
