Towards Robust Text Retrieval with Progressive Learning

Tong Wu, Yulei Qin, Enwei Zhang, Zihan Xu, Yuting Gao, Ke Li, Xing Sun

arXiv:2311.11691 · cs.IR · November 21, 2023 · 1 citation

TL;DR

This paper introduces PEG, a progressive learning-based embedding model for robust text retrieval that scales training data, incorporates hard negatives, and dynamically adjusts focus during training, outperforming existing models across multiple domains.
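To show what an embedding model for text retrieval does at query time, here is a minimal sketch of dense retrieval: queries and passages are represented as vectors, and passages are ranked by cosine similarity to the query. The vectors are hand-written toy values (hypothetical), not PEG outputs, and the helpers `cosine` and `retrieve` are illustrative names that are not part of the PEG release.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query_vec, passage_vecs, k=2):
    """Return the indices of the k passages most similar to the query, best first."""
    scored = [(cosine(query_vec, p), i) for i, p in enumerate(passage_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [i for _, i in scored[:k]]

# Toy 3-dimensional embeddings (hypothetical; real embedding models output far larger vectors).
passages = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.1],
]
query = [1.0, 0.2, 0.0]
print(retrieve(query, passages, k=2))  # → [0, 2]
```

In a retrieval-augmented setup, the top-ranked passages would then be handed to the language model as context; the quality of that ranking depends entirely on how well the embeddings capture meaning.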

Contribution

The paper proposes PEG, a novel embedding training method that uses more negative samples per batch, incorporates hard negatives, and applies a dynamic learning mechanism, improving retrieval robustness and generalization.
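Embedding models of this kind are commonly trained with a contrastive objective in which each query must score its paired positive above a pool of negatives. The sketch below uses the standard InfoNCE loss with in-batch negatives plus a shared pool of hard negatives, as an assumption for illustration; it is not PEG's exact loss, and `info_nce_loss`, `normalize`, and the `temperature=0.05` default are hypothetical choices.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm so dot products equal cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def info_nce_loss(queries, positives, hard_negatives, temperature=0.05):
    """Mean InfoNCE loss over a batch of (query, positive) pairs.

    Query i's target is positives[i]. All other positives in the batch act as
    in-batch negatives, and every entry of hard_negatives is added to the shared
    candidate pool, so a bigger batch or more mined negatives means more
    negatives per query.
    """
    pool = [normalize(c) for c in positives + hard_negatives]
    total = 0.0
    for i, q in enumerate(queries):
        q = normalize(q)
        logits = [sum(a * b for a, b in zip(q, c)) / temperature for c in pool]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log of the softmax probability assigned to the true positive.
        total += log_z - logits[i]
    return total / len(queries)

queries = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]
hard_negatives = [[0.8, 0.6], [0.6, 0.8]]  # close to the positives, hence "hard"

print(info_nce_loss(queries, positives, []))              # in-batch negatives only
print(info_nce_loss(queries, positives, hard_negatives))  # plus hard negatives
```

Adding hard negatives raises the loss on this toy batch because they sit close to the true positives, so the model receives a stronger training signal than it would from easy, unrelated negatives alone.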

Findings

01 PEG outperforms state-of-the-art embedding models on the C-MTEB and DuReader benchmarks.

02 Training on over 100 million diverse samples improves domain coverage.

03 Progressive learning improves embedding quality and retrieval accuracy.

Abstract

Retrieval augmentation has become an effective way to give large language models (LLMs) access to external, verified knowledge sources stored in databases, overcoming the limitations and hallucinations of LLMs when handling up-to-date and domain-specific information. However, existing embedding models for text retrieval usually have three non-negligible limitations. First, the number and diversity of samples in a batch are too restricted to supervise the modeling of textual nuances at scale. Second, a high proportion of noise is detrimental to the semantic correctness and consistency of embeddings. Third, treating easy and difficult samples equally can cause suboptimal convergence of embeddings with poorer generalization. In this paper, we propose PEG, progressively learned embeddings for robust text retrieval. Specifically, we increase the training in-batch…
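The abstract names equal treatment of easy and difficult samples as a weakness, but this excerpt ends before PEG's own mechanism is described. As a stand-in, the sketch below uses focal-loss-style weighting, a different and well-known technique, to show how a loss can shift emphasis toward hard samples. It is not PEG's method; `focal_weight`, `difficulty_weighted_loss`, `gamma`, and the example probabilities are hypothetical.

```python
import math

def focal_weight(p_pos, gamma=2.0):
    """Down-weighting factor (1 - p)^gamma: near 0 for easy samples (p close to 1)."""
    return (1.0 - p_pos) ** gamma

def difficulty_weighted_loss(pos_probs, gamma=2.0):
    """Mean of -(1 - p)^gamma * log(p) over the batch.

    pos_probs[i] is the probability the model assigns to sample i's true
    positive (e.g. a softmax over candidates). gamma = 0 recovers plain
    cross-entropy, where easy and hard samples are weighted equally.
    """
    terms = [-focal_weight(p, gamma) * math.log(p) for p in pos_probs]
    return sum(terms) / len(terms)

easy, hard = 0.95, 0.30  # hypothetical positive-class probabilities
for gamma in (0.0, 2.0):
    w_easy, w_hard = focal_weight(easy, gamma), focal_weight(hard, gamma)
    print(f"gamma={gamma}: easy weight={w_easy:.4f}, hard weight={w_hard:.4f}")
```

With gamma = 2 the easy sample's weight drops to (1 - 0.95)^2 = 0.0025 while the hard sample keeps (1 - 0.3)^2 = 0.49, so gradient updates are dominated by the samples the model still gets wrong.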

Code & Models

Repositories

https://huggingface.co/TownsWu/PEG
Official

Models

TownsWu/PEG
model · 24 downloads · 29 likes

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Advanced Graph Neural Networks