Improving embedding with contrastive fine-tuning on small datasets with expert-augmented scores

Jun Lu; David Li; Bill Ding; Yu Kang

arXiv:2408.11868·cs.CL·August 23, 2024

Open Access

TL;DR

This paper introduces a contrastive fine-tuning method that uses expert-augmented scores to improve text embeddings on small datasets, improving performance on semantic textual similarity and retrieval tasks at low cost.

Contribution

The paper proposes a contrastive fine-tuning approach that uses expert scores to improve embedding models in low-data scenarios, preserving their versatility while improving retrieval performance.

Findings

01. Improved performance on semantic textual similarity tasks.

02. Enhanced retrieval accuracy across multiple benchmarks.

03. Cost-effective method suitable for real-world applications.

Abstract

This paper presents an approach to improve text embedding models through contrastive fine-tuning on small datasets augmented with expert scores. It focuses on improving performance on semantic textual similarity tasks and addressing text retrieval problems. The proposed method uses soft labels derived from expert-augmented scores to fine-tune embedding models, preserving their versatility while improving their retrieval capability. The method is evaluated on a Q&A dataset from an online shopping website, using eight expert models. Results show improved performance over a benchmark model across multiple metrics on various retrieval tasks from the Massive Text Embedding Benchmark (MTEB). The method is cost-effective and practical for real-world applications, especially when labeled data is scarce.
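The idea of fine-tuning against soft labels can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify how expert scores are aggregated or which loss is used, so the averaging in `soft_label` and the mean-squared-error objective in `soft_label_loss` are assumptions chosen for simplicity, and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def soft_label(expert_scores):
    """Aggregate similarity scores from several expert models.

    Assumption: a plain average. The paper may aggregate differently.
    """
    return sum(expert_scores) / len(expert_scores)


def soft_label_loss(pairs, labels):
    """Mean squared error between embedding similarity and soft labels.

    Assumption: MSE stands in for the paper's fine-tuning objective.
    `pairs` is a list of (embedding_a, embedding_b) tuples and
    `labels` holds one soft label per pair.
    """
    errors = [(cosine(u, v) - y) ** 2 for (u, v), y in zip(pairs, labels)]
    return sum(errors) / len(errors)


# Usage: one question/answer pair scored by three hypothetical experts.
label = soft_label([0.8, 0.6, 0.7])
loss = soft_label_loss([([1.0, 0.0], [0.0, 1.0])], [label])
```

In an actual fine-tuning run the embeddings would come from the model being trained, and the loss would be minimized with a gradient-based optimizer. The sketch only shows how a graded target replaces a hard 0/1 label.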

Taxonomy

Topics: Expert finding and Q&A systems