TurkEmbed4Retrieval: Turkish Embedding Model for Retrieval Task

Özay Ezerceli; Gizem Gümüşçekiçci; Tuğba Erkoç; Berke Özenç

arXiv:2511.07595 · cs.IR · November 12, 2025

Open Access

TL;DR

TurkEmbed4Retrieval is a specialized Turkish language embedding model optimized for retrieval tasks, achieving state-of-the-art performance by fine-tuning on the MS MARCO TR dataset with advanced training methods.

Contribution

The paper introduces TurkEmbed4Retrieval, a novel retrieval-focused Turkish embedding model that surpasses previous models like Turkish colBERT in retrieval benchmarks.

Findings

01 Outperforms Turkish colBERT by 19-26% on key metrics

02 Establishes new benchmark for Turkish retrieval tasks

03 Demonstrates effectiveness of advanced training techniques

Abstract

In this work, we introduce TurkEmbed4Retrieval, a retrieval-specialized variant of the TurkEmbed model, which was originally designed for Natural Language Inference (NLI) and Semantic Textual Similarity (STS) tasks. By fine-tuning the base model on the MS MARCO TR dataset using advanced training techniques, including Matryoshka representation learning and a tailored multiple negatives ranking loss, we achieve state-of-the-art performance for Turkish retrieval tasks. Extensive experiments demonstrate that our model outperforms Turkish colBERT by 19,26% on key retrieval metrics for the Scifact TR dataset, thereby establishing a new benchmark for Turkish information retrieval.
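
The two training techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the loss was tailored, so the code below shows only the generic form, an in-batch multiple negatives ranking loss averaged over several truncated embedding prefixes in the Matryoshka style. The function names, the similarity scale of 20.0, the prefix sizes, and the toy vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnr_loss(queries, passages, scale=20.0):
    """In-batch multiple negatives ranking loss.

    passages[i] is the positive for queries[i]; every other passage in
    the batch acts as a negative. The loss is the mean cross-entropy of
    picking the positive among all passages. `scale` is an assumed value.
    """
    total = 0.0
    for i, query in enumerate(queries):
        logits = [scale * cosine(query, p) for p in passages]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_z - logits[i]
    return total / len(queries)


def matryoshka_mnr_loss(queries, passages, dims=(2, 4), scale=20.0):
    """Average the ranking loss over truncated prefixes of each embedding.

    Training on several prefix lengths encourages the leading dimensions
    to be useful on their own. `dims` here is illustrative only.
    """
    losses = []
    for d in dims:
        truncated_q = [q[:d] for q in queries]
        truncated_p = [p[:d] for p in passages]
        losses.append(mnr_loss(truncated_q, truncated_p, scale))
    return sum(losses) / len(losses)


# Toy 4-dimensional embeddings (made up for this example).
queries = [[1.0, 0.0, 0.2, 0.0], [0.0, 1.0, 0.0, 0.3]]
aligned = [[0.9, 0.1, 0.2, 0.0], [0.1, 0.9, 0.0, 0.3]]
swapped = [aligned[1], aligned[0]]  # positives paired with the wrong query

loss_aligned = matryoshka_mnr_loss(queries, aligned)
loss_swapped = matryoshka_mnr_loss(queries, swapped)
print(loss_aligned, loss_swapped)
```

With correctly paired passages the loss is close to zero, and it grows sharply when the positives are swapped, which is the signal that pulls each query toward its own passage at every prefix length.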

Taxonomy

Topics: Topic Modeling · Information Retrieval and Search Behavior · Advanced Graph Neural Networks