Efficient fine-tuning methodology of text embedding models for information retrieval: contrastive learning penalty (CLP)

Jeongsu Yu

arXiv:2412.17364·cs.IR·December 24, 2024

TL;DR

This paper introduces a novel fine-tuning approach for text embedding models using a Contrastive Learning Penalty, significantly improving document retrieval performance in information retrieval systems.

Contribution

The study proposes a new Contrastive Learning Penalty function and an optimized fine-tuning methodology for text embedding models, enhancing retrieval accuracy.

Findings

01. Significant performance improvements in document retrieval tasks.
02. Outperforms existing contrastive learning methods.
03. Open-source code and models available for replication.

Abstract

Text embedding models play a crucial role in natural language processing, particularly in information retrieval, and their importance is further highlighted with the recent utilization of RAG (Retrieval-Augmented Generation). This study presents an efficient fine-tuning methodology encompassing data selection, loss function, and model architecture to enhance the information retrieval performance of pre-trained text embedding models. In particular, this study proposes a novel Contrastive Learning Penalty function that overcomes the limitations of existing Contrastive Learning. The proposed methodology achieves significant performance improvements over existing methods in document retrieval tasks. This study is expected to contribute to improving the performance of information retrieval systems through fine-tuning of text embedding models. The code for this study can be found at…

Code & Models

Repositories

crealabs/enhanced-bge-m3-with-clp-and-moe
PyTorch · Official

Taxonomy

Topics: Text and Document Classification Technologies · Recommender Systems and Techniques

Methods: Linear Layer · Residual Connection · Adam · Weight Decay · Multi-Head Attention · Layer Normalization · WordPiece · Dropout · Softmax