TurkEmbed4Retrieval: Turkish Embedding Model for Retrieval Task
Özay Ezerceli, Gizem Gümüşçekiçci, Tuğba Erkoç, Berke Özenç

TL;DR
TurkEmbed4Retrieval is a specialized Turkish language embedding model optimized for retrieval tasks, achieving state-of-the-art performance by fine-tuning on the MS MARCO TR dataset with advanced training methods.
Contribution
The paper introduces TurkEmbed4Retrieval, a novel retrieval-focused Turkish embedding model that surpasses previous models like Turkish colBERT in retrieval benchmarks.
Findings
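Outperforms Turkish colBERT by 19-26% on key retrieval metrics for the Scifact TR dataset
Establishes a new benchmark for Turkish retrieval tasks
Demonstrates the effectiveness of Matryoshka representation learning combined with a tailored multiple negatives ranking loss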
Abstract
In this work, we introduce TurkEmbed4Retrieval, a retrieval-specialized variant of the TurkEmbed model, which was originally designed for Natural Language Inference (NLI) and Semantic Textual Similarity (STS) tasks. By fine-tuning the base model on the MS MARCO TR dataset using advanced training techniques, including Matryoshka representation learning and a tailored multiple negatives ranking loss, we achieve state-of-the-art (SOTA) performance for Turkish retrieval tasks. Extensive experiments demonstrate that our model outperforms Turkish colBERT by 19-26% on key retrieval metrics for the Scifact TR dataset, thereby establishing a new benchmark for Turkish information retrieval.
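The two training techniques named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract does not describe how their loss is "tailored", so this shows only the generic form of a multiple negatives ranking loss (in-batch negatives with cosine similarity) evaluated on nested prefixes of the embedding, as in Matryoshka representation learning. The function names, the `scale=20.0` temperature, the equal weighting of prefix sizes, and the `dims` values are all assumptions made for illustration.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def mnr_loss(queries, docs, scale=20.0):
    """Generic multiple negatives ranking loss with in-batch negatives.

    docs[i] is the positive passage for queries[i]; every other docs[j]
    in the batch acts as a negative. `scale` is an assumed temperature.
    """
    q = [normalize(v) for v in queries]
    d = [normalize(v) for v in docs]
    total = 0.0
    for i, qi in enumerate(q):
        # Scaled cosine similarity between query i and every document.
        logits = [scale * sum(a * b for a, b in zip(qi, dj)) for dj in d]
        # Cross-entropy with target index i, using a stable log-sum-exp.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / len(q)


def matryoshka_mnr_loss(queries, docs, dims, scale=20.0):
    """Sum the ranking loss over truncated prefixes of each embedding.

    `dims` lists hypothetical prefix lengths; each prefix is re-normalized
    inside mnr_loss, and all prefixes are weighted equally (an assumption).
    """
    total = 0.0
    for k in dims:
        total += mnr_loss([v[:k] for v in queries],
                          [v[:k] for v in docs], scale)
    return total
```

Calling `matryoshka_mnr_loss(query_vecs, passage_vecs, dims=[2, 4])` on a batch of paired embeddings returns a single scalar. Because every prefix contributes to that scalar, the leading dimensions are pushed to remain useful for ranking on their own, which is what lets a shortened embedding be used at retrieval time.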
Taxonomy
Topics: Topic Modeling · Information Retrieval and Search Behavior · Advanced Graph Neural Networks
