Optimizing Multi-Stage Language Models for Effective Text Retrieval

Quang Hoang Trung; Le Trung Hoang; Nguyen Van Hoang Phuc

arXiv:2412.19265 · cs.IR · December 30, 2024

TL;DR

This paper presents a two-phase, ensemble-based text retrieval system optimized for Japanese legal datasets, achieving state-of-the-art accuracy and efficiency in domain-specific and multilingual contexts.

Contribution

Introduces a novel two-phase retrieval pipeline with ensemble strategies tailored for Japanese legal texts, improving performance over existing methods.

Findings

1. Achieves state-of-the-art retrieval accuracy on Japanese legal datasets.
2. Demonstrates significant efficiency improvements in retrieval tasks.
3. Validates robustness and adaptability across diverse benchmarks like MS-MARCO.

Abstract

Efficient text retrieval is critical for applications such as legal document analysis, particularly in specialized contexts like Japanese legal systems. Existing retrieval methods often underperform in such domain-specific scenarios, necessitating tailored approaches. In this paper, we introduce a novel two-phase text retrieval pipeline optimized for Japanese legal datasets. Our method leverages advanced language models to achieve state-of-the-art performance, significantly improving retrieval efficiency and accuracy. To further enhance robustness and adaptability, we incorporate an ensemble model that integrates multiple retrieval strategies, resulting in superior outcomes across diverse tasks. Extensive experiments validate the effectiveness of our approach, demonstrating strong performance on both Japanese legal datasets and widely recognized benchmarks like MS-MARCO. Our work…
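
The abstract does not spell out the pipeline's components, so the following is only a generic illustration of a two-phase retrieve-then-rerank design with an ensemble step, not the authors' method. Everything in it is an assumption made for the sketch: the scorers (`token_overlap`, `bigram_jaccard`), the use of reciprocal rank fusion to combine rankings, the constant `k=60`, and all function names. A real system would presumably use learned language-model scorers in both phases. Character bigrams are used as the second scorer because they do not depend on whitespace tokenization, which matters for Japanese text.

```python
from collections import defaultdict


def token_overlap(query: str, doc: str) -> float:
    """Fraction of whitespace-separated query tokens that appear in the document."""
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) / len(q_tokens) if q_tokens else 0.0


def char_bigrams(text: str) -> set:
    """Set of overlapping two-character substrings of the lowercased text."""
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def bigram_jaccard(query: str, doc: str) -> float:
    """Jaccard similarity between the character-bigram sets of query and document."""
    q, d = char_bigrams(query), char_bigrams(doc)
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def rank_by(scorer, query, docs, candidates):
    """Candidate ids sorted by descending score; ties are broken by ascending id."""
    return sorted(candidates, key=lambda i: (-scorer(query, docs[i]), i))


def reciprocal_rank_fusion(rankings, k=60):
    """Ensemble step (assumed): each ranking contributes 1 / (k + rank) per document."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda i: (-fused[i], i))


def two_phase_retrieve(query, docs, first_k=3, final_k=2):
    # Phase 1: score the whole corpus with a cheap scorer and keep the top first_k.
    candidates = rank_by(token_overlap, query, docs, range(len(docs)))[:first_k]
    # Phase 2: rerank only the shortlist with several scorers, then fuse the rankings.
    rankings = [
        rank_by(token_overlap, query, docs, candidates),
        rank_by(bigram_jaccard, query, docs, candidates),
    ]
    return reciprocal_rank_fusion(rankings)[:final_k]


# Hypothetical toy corpus, for illustration only.
docs = [
    "contract termination requires written notice",
    "the tenant must pay rent monthly",
    "written notice of contract termination must be given",
    "weather forecast for tomorrow",
]
print(two_phase_retrieve("contract termination notice", docs))
```

The point of the split is cost: the first phase touches every document and so must be cheap, while the second phase can afford more expensive scoring because it only sees the shortlist. Fusing ranks rather than raw scores avoids having to calibrate scorers that live on different scales.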

Taxonomy

Topics: Topic Modeling