Turkish Text Retrieval Experiments Using Lemur Toolkit

Kutlu Emre Yılmaz; Ahmet Arslan; Ozgur Yilmazel

arXiv:1405.1740 · cs.IR · May 9, 2014

Open Access

TL;DR

This paper evaluates Turkish text retrieval using the Lemur Toolkit, demonstrating that language-specific preprocessing improves retrieval quality and that the Language Modeling approach performs best with such preprocessing.

Contribution

It compares three retrieval models supported by the Lemur Toolkit on a TREC-like Turkish test collection and highlights the effectiveness of language-specific preprocessing techniques.

Findings

01. Language-specific preprocessing improves retrieval quality.

02. The Language Modeling approach outperforms the other models when language-specific preprocessing is applied.

03. All retrieval models benefit from the preprocessing techniques.

Abstract

We used the Lemur Toolkit, an open-source toolkit designed for Information Retrieval (IR) research, for our automated indexing and retrieval experiments on a TREC-like test collection for Turkish. We study and compare three retrieval models that Lemur supports, with particular attention to the Language Modeling approach to IR, combined with language-specific preprocessing techniques. Our experiments show that all retrieval models benefit from language-specific preprocessing in terms of retrieval quality. The Language Modeling approach is also the best-performing retrieval model when language-specific preprocessing is applied.
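
To make the combination of language-specific preprocessing and Language Modeling retrieval concrete, here is a minimal Python sketch. It is not the Lemur Toolkit and not the authors' pipeline. The abstract does not say which preprocessing steps or smoothing method were used, so the sketch assumes Turkish-aware lowercasing plus fixed-length prefix truncation as a crude stand-in for stemming, and query-likelihood scoring with Dirichlet smoothing. The function names, the `prefix_len` parameter, and the `mu` value are all hypothetical choices for illustration.

```python
import math
from collections import Counter

def turkish_lower(text):
    # str.lower() maps "I" to "i", but Turkish pairs "I" with dotless "ı"
    # and dotted "İ" with "i", so handle both before lowercasing the rest.
    return text.replace("I", "ı").replace("İ", "i").lower()

def preprocess(text, prefix_len=5):
    # Assumed preprocessing: whitespace tokenization, Turkish-aware
    # lowercasing, and truncation to the first prefix_len characters.
    return [tok[:prefix_len] for tok in turkish_lower(text).split()]

def dirichlet_score(query_toks, doc_toks, coll_counts, coll_len, mu=2000.0):
    # Query likelihood: sum of log P(w | d) with Dirichlet smoothing,
    # P(w | d) = (tf(w, d) + mu * P(w | C)) / (|d| + mu).
    tf = Counter(doc_toks)
    score = 0.0
    for w in query_toks:
        p_coll = coll_counts[w] / coll_len
        if p_coll == 0.0:
            continue  # term never occurs in the collection; skip it
        score += math.log((tf[w] + mu * p_coll) / (len(doc_toks) + mu))
    return score

def rank(query, docs, prefix_len=5, mu=2000.0):
    # Returns (document index, score) pairs, best score first.
    doc_toks = [preprocess(d, prefix_len) for d in docs]
    coll_counts = Counter(t for toks in doc_toks for t in toks)
    coll_len = sum(coll_counts.values())
    q = preprocess(query, prefix_len)
    scored = [(i, dirichlet_score(q, toks, coll_counts, coll_len, mu))
              for i, toks in enumerate(doc_toks)]
    return sorted(scored, key=lambda pair: -pair[1])
```

With truncation, an inflected query such as "kitaplarımız" shares the prefix "kitap" with documents containing "Kitaplar" or "Kitap", so those documents score above unrelated ones. With a very large `prefix_len` (effectively no truncation) the query term matches nothing in the collection and every document gets the same score.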

Taxonomy

Topics: Topic Modeling · Information Retrieval and Search Behavior · Natural Language Processing Techniques