Improving embedding with contrastive fine-tuning on small datasets with expert-augmented scores
Jun Lu, David Li, Bill Ding, Yu Kang

TL;DR
This paper introduces a contrastive fine-tuning method using expert-augmented scores to improve text embeddings on small datasets, enhancing semantic similarity and retrieval tasks with practical, cost-effective gains.
Contribution
It proposes a novel contrastive fine-tuning approach leveraging expert scores to enhance embedding models in low-data scenarios, maintaining versatility and improving retrieval performance.
Findings
Improved performance on semantic textual similarity tasks.
Enhanced retrieval accuracy across multiple benchmarks.
Cost-effective method suitable for real-world applications.
Abstract
This paper presents an approach to improving text embedding models through contrastive fine-tuning on small datasets augmented with expert scores. It focuses on enhancing semantic textual similarity tasks and addressing text retrieval problems. The proposed method uses soft labels derived from expert-augmented scores to fine-tune embedding models, preserving their versatility while improving their retrieval capability. The paper evaluates the method using a Q&A dataset from an online shopping website and eight expert models. Results show improved performance over a benchmark model across multiple metrics on various retrieval tasks from the Massive Text Embedding Benchmark (MTEB). The method is cost-effective and practical for real-world applications, especially when labeled data is scarce.
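The abstract does not spell out the exact loss, so the following is only a minimal sketch of one common way to train with soft labels, not the authors' implementation. It assumes that (1) the scores from several expert models are combined by a simple mean per query-candidate pair, (2) each query's averaged scores are turned into a target distribution with a temperature softmax, and (3) the embedding model is trained to minimize cross-entropy between those targets and its own in-batch cosine-similarity distribution (an InfoNCE-style loss with soft targets). The function names and the temperature `tau` are hypothetical, and only the forward loss is shown, not the training loop.

```python
import numpy as np


def log_softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable log-softmax: subtract the row max before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Pairwise cosine similarity between rows of a (queries) and rows of b (candidates).
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def expert_soft_labels(expert_scores: np.ndarray, tau: float = 0.05) -> np.ndarray:
    # expert_scores: (num_experts, n_queries, n_candidates) similarity scores.
    # Assumption: experts are combined by a simple mean, then each query row is
    # turned into a probability distribution over candidates.
    mean_scores = expert_scores.mean(axis=0)
    return np.exp(log_softmax(mean_scores / tau, axis=1))


def soft_contrastive_loss(query_emb: np.ndarray, cand_emb: np.ndarray,
                          targets: np.ndarray, tau: float = 0.05) -> float:
    # Cross-entropy between the soft targets and the model's in-batch
    # similarity distribution (InfoNCE-style loss with soft labels).
    log_probs = log_softmax(cosine_matrix(query_emb, cand_emb) / tau, axis=1)
    return float(-(targets * log_probs).sum(axis=1).mean())


# Toy check: 3 queries whose matching candidate sits on the diagonal.
experts = np.stack([np.eye(3), np.eye(3)])  # two hypothetical experts that agree
targets = expert_soft_labels(experts)
queries = np.eye(3)
aligned = soft_contrastive_loss(queries, np.eye(3), targets)
shuffled = soft_contrastive_loss(queries, np.eye(3)[[1, 2, 0]], targets)
print(aligned < shuffled)  # True
```

Because the targets are soft, a candidate that several experts rate as partially relevant still receives some probability mass instead of being treated as a hard negative, which is one plausible reason soft labels help when labeled data is scarce.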
Taxonomy
Topics: Expert finding and Q&A systems
