TurkColBERT: A Benchmark of Dense and Late-Interaction Models for Turkish Information Retrieval
Özay Ezerceli, Mahmoud El Hussieni, Selva Taş, Reyhan Bayraktar, Fatma Betül Terzioğlu, Yusuf Çelebi, Yağız Asker

TL;DR
This paper introduces TurkColBERT, a comprehensive benchmark comparing dense and late-interaction models for Turkish information retrieval, demonstrating that smaller, late-interaction models can outperform larger dense encoders in efficiency and accuracy.
Contribution
It systematically evaluates and compares dense and late-interaction models for Turkish IR, proposing a two-stage adaptation pipeline and demonstrating significant parameter efficiency and performance improvements.
Findings
Late-interaction models outperform dense encoders in Turkish IR.
A small ColBERT-style model retains over 71% of a larger model's performance.
MUVERA+Rerank indexing is faster and slightly more accurate.
Abstract
Neural information retrieval systems excel in high-resource languages but remain underexplored for morphologically rich, lower-resource languages such as Turkish. Dense bi-encoders currently dominate Turkish IR, yet late-interaction models -- which retain token-level representations for fine-grained matching -- have not been systematically evaluated. We introduce TurkColBERT, the first comprehensive benchmark comparing dense encoders and late-interaction models for Turkish retrieval. Our two-stage adaptation pipeline fine-tunes English and multilingual encoders on Turkish NLI/STS tasks, then converts them into ColBERT-style retrievers using PyLate trained on MS MARCO-TR. We evaluate 10 models across five Turkish BEIR datasets covering scientific, financial, and argumentative domains. Results show strong parameter efficiency: the 1.0M-parameter colbert-hash-nano-tr is 600× smaller…
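The distinction the abstract draws between dense bi-encoders and late-interaction models can be illustrated with a small sketch. A ColBERT-style scorer keeps one vector per token and sums, over query tokens, the best cosine similarity against any document token (MaxSim), whereas a dense encoder compares one pooled vector per text. The code below is a toy illustration and not the paper's implementation: the 2-dimensional embeddings are made up, and mean pooling is assumed for the dense baseline, which real models may not use.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens, doc_tokens):
    """Late interaction: each query token picks its best-matching document
    token by cosine similarity, and the per-token maxima are summed."""
    q = [l2_normalize(v) for v in query_tokens]
    d = [l2_normalize(v) for v in doc_tokens]
    return sum(max(dot(qv, dv) for dv in d) for qv in q)


def dense_score(query_tokens, doc_tokens):
    """Single-vector baseline (assumption: mean pooling, then cosine)."""
    def mean_pool(tokens):
        n = len(tokens)
        return [sum(col) / n for col in zip(*tokens)]

    return dot(l2_normalize(mean_pool(query_tokens)),
               l2_normalize(mean_pool(doc_tokens)))


# Hypothetical toy embeddings: two query tokens, two candidate documents.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # exact token matches plus noise
doc_b = [[1.0, 1.0]]                           # one "averaged" token

if __name__ == "__main__":
    for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
        print(name,
              "maxsim=%.4f" % maxsim_score(query, doc),
              "dense=%.4f" % dense_score(query, doc))
```

In this toy case the token-level scorer ranks doc_a first because both query tokens find an exact match, while the pooled comparison ranks doc_b first because the extra token in doc_a shifts its average. This shows only the mechanism; it says nothing about which approach scores better on the benchmark's datasets.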
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Natural Language Processing Techniques
