Turkish Text Retrieval Experiments Using Lemur Toolkit
Kutlu Emre Yılmaz, Ahmet Arslan, Ozgur Yilmazel

TL;DR
This paper evaluates Turkish text retrieval using the Lemur Toolkit, demonstrating that language-specific preprocessing improves retrieval quality and that the Language Modeling approach performs best with such preprocessing.
Contribution
It is the first to compare multiple retrieval models on Turkish using the Lemur Toolkit, and it highlights the effectiveness of language-specific preprocessing techniques.
Findings
Language-specific preprocessing improves retrieval quality
Language Modeling approach outperforms other models with preprocessing
All models benefit from preprocessing techniques
Abstract
We used the Lemur Toolkit, an open-source toolkit designed for Information Retrieval (IR) research, for our automated indexing and retrieval experiments on a TREC-like test collection for Turkish. We study and compare three retrieval models that Lemur supports, with particular attention to the Language Modeling approach to IR, combined with language-specific preprocessing techniques. Our experiments show that all retrieval models benefit from language-specific preprocessing in terms of retrieval quality. The Language Modeling approach is also the best-performing retrieval model when language-specific preprocessing is applied.
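As a rough illustration of the two ideas the abstract combines, the sketch below pairs a toy Turkish-aware preprocessing step with query-likelihood scoring under Dirichlet smoothing, one common form of the Language Modeling approach. It is not the authors' pipeline and does not use Lemur's API. The abstract does not say which preprocessing steps or smoothing method were used, so the function names, the first-five-character truncation (a stand-in for stemming), and the value of mu are all assumptions made for illustration.

```python
import math
from collections import Counter


def turkish_lower(text):
    # str.lower() maps "I" to "i", but Turkish pairs "I" with dotless "ı"
    # and dotted "İ" with "i", so handle those two letters first.
    return text.replace("I", "ı").replace("İ", "i").lower()


def preprocess(text, prefix_len=5):
    # Hypothetical stand-in for language-specific preprocessing:
    # Turkish-aware lowercasing plus truncation to a fixed-length prefix.
    return [tok[:prefix_len] for tok in turkish_lower(text).split()]


def dirichlet_score(query_terms, doc_terms, coll_counts, coll_len, mu=1000.0):
    # Query log-likelihood with Dirichlet smoothing:
    #   p(t | d) = (tf(t, d) + mu * p(t | C)) / (|d| + mu)
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        p_coll = coll_counts.get(term, 0) / coll_len
        if p_coll == 0.0:
            continue  # term never occurs in the collection; skip it
        p_term = (doc_counts.get(term, 0) + mu * p_coll) / (doc_len + mu)
        score += math.log(p_term)
    return score


def rank(query, docs, mu=1000.0):
    # docs maps a document id to its raw text; returns ids, best match first.
    processed = {doc_id: preprocess(text) for doc_id, text in docs.items()}
    coll_counts = Counter(t for terms in processed.values() for t in terms)
    coll_len = sum(coll_counts.values())
    query_terms = preprocess(query)
    scored = [
        (dirichlet_score(query_terms, terms, coll_counts, coll_len, mu), doc_id)
        for doc_id, terms in processed.items()
    ]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]
```

In this sketch the query "kitap" matches a document containing "Kitaplar" only because both reduce to the same prefix, which shows in miniature how preprocessing for an agglutinative language can change what a retrieval model is able to match.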
Taxonomy
Topics: Topic Modeling · Information Retrieval and Search Behavior · Natural Language Processing Techniques
