Linq-Embed-Mistral Technical Report
Chanyeol Choi, Junseong Kim, Seolhwa Lee, Jihoon Kwon, Sangmo Gu, Yejin Kim, Minkyung Cho, Jy-yong Sohn

TL;DR
This paper introduces Linq-Embed-Mistral, a retrieval model built with advanced data refinement and task-specific tuning techniques. It achieves top performance on the MTEB benchmark, demonstrating improved search accuracy and reliability.
Contribution
The paper presents novel data crafting, filtering, and negative mining methods, along with task ordering and fine-tuning strategies, to significantly enhance retrieval model performance and efficiency.
Findings
Achieved an average score of 68.2 across 56 MTEB datasets.
Ranked 1st among all models for retrieval tasks on the MTEB leaderboard, with a retrieval score of 60.2 (as of May 29, 2024).
Utilized 4-bit precision evaluation for faster validation.
Abstract
This report explores the enhancement of text retrieval performance using advanced data refinement techniques. We develop Linq-Embed-Mistral (https://huggingface.co/Linq-AI-Research/Linq-Embed-Mistral) by building on the E5-mistral and Mistral-7B-v0.1 models, focusing on sophisticated data crafting, data filtering, and negative mining methods tailored to each task. These methods are applied both to existing benchmark datasets and to highly tailored synthetic datasets generated via large language models (LLMs). Linq-Embed-Mistral excels on the MTEB benchmark (as of May 29, 2024), achieving an average score of 68.2 across 56 datasets, and ranks 1st among all models for retrieval tasks on the MTEB leaderboard with a score of 60.2. This performance underscores its capability to improve search precision and reliability. Our contributions include advanced data…
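The abstract names negative mining, but this summary does not describe the report's actual procedure. As a rough, non-authoritative sketch, the snippet below shows one common form of hard-negative mining: rank the non-positive documents by cosine similarity to the query and keep the top k, while skipping candidates that score almost as high as the positive, since those are often unlabeled relevant documents (false negatives). The function names `cosine` and `mine_hard_negatives`, the `max_ratio` threshold, and the toy vectors are hypothetical choices for illustration, not the Linq-Embed-Mistral implementation.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mine_hard_negatives(query_vec, doc_vecs, positive_ids, k=2, max_ratio=0.95):
    """Return ids of the k most query-similar non-positive documents.

    Hypothetical filtering heuristic: candidates scoring above
    max_ratio * (best positive score) are skipped as likely false negatives.
    positive_ids must be non-empty.
    """
    pos_score = max(cosine(query_vec, doc_vecs[i]) for i in positive_ids)
    # Score every non-positive document and sort from most to least similar.
    scored = sorted(
        ((cosine(query_vec, vec), i) for i, vec in enumerate(doc_vecs) if i not in positive_ids),
        reverse=True,
    )
    negatives = []
    for score, doc_id in scored:
        if score > max_ratio * pos_score:
            continue  # too close to the positive: probably an unlabeled relevant doc
        negatives.append(doc_id)
        if len(negatives) == k:
            break
    return negatives


# Toy 2-D "embeddings": document 0 is the labeled positive for the query.
query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.99, 0.1], [0.8, 0.6], [0.0, 1.0], [0.6, 0.8]]
print(mine_hard_negatives(query, docs, positive_ids={0}, k=2))  # [2, 4]
```

Document 1 (cosine about 0.995) is skipped as a probable false negative; with `max_ratio=1.0` it would be kept and the result would be `[1, 2]`. The threshold trades off how hard the negatives are against the risk of training on mislabeled pairs.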
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Natural Language Processing Techniques
