Apache Lucene as Content-Based-Filtering Recommender System: 3 Lessons Learned
Stefan Langer, Joeran Beel

TL;DR
This paper shares three practical lessons learned from using Apache Lucene as a content-based filtering recommender in a scholarly literature system, showing how relevance scores, the way recommendations are selected, and the number of search results relate to click-through rate.
Contribution
The paper provides empirical insights into optimizing Lucene-based recommenders by analyzing relevance scores, recommendation sampling, and search result volume to improve click-through rates.
Findings
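Recommendations with relevance scores above 0.025 have significantly higher click-through rates than those with scores below 0.025.
Picking ten recommendations at random from Lucene's top 50 results lowered click-through rate by 15% compared to recommending the top 10.
When Lucene returns fewer than 1,000 search results, click-through rates tend to be around half as high as when 1,000 or more are returned.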
Abstract
For the past few years, we have used Apache Lucene as the recommendation framework in the scholarly-literature recommender system of the reference-management software Docear. In this paper, we share three lessons learned from our work with Lucene. First, recommendations with relevance scores below 0.025 tend to have significantly lower click-through rates than recommendations with relevance scores above 0.025. Second, picking ten recommendations randomly from Lucene's top 50 search results decreased click-through rate by 15% compared to recommending the top 10 results. Third, the number of returned search results tends to predict how high click-through rates will be: when Lucene returns fewer than 1,000 search results, click-through rates tend to be around half as high as when 1,000 or more results are returned.
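The three lessons can be read as a simple selection policy. The following is a minimal sketch of that reading, not the Docear implementation: the `Hit` class, the `select_recommendations` function, and the way the thresholds are combined are all hypothetical, and no Lucene API is called. It assumes the caller already has a list of scored hits and the total hit count from the search engine.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hit:
    """Hypothetical container for one search result."""
    doc_id: str
    score: float  # relevance score as reported by the search engine


MIN_SCORE = 0.025       # score threshold from the first lesson
TOP_K = 10              # recommend the top 10, per the second lesson
MIN_TOTAL_HITS = 1000   # result-count threshold from the third lesson


def select_recommendations(
    hits: List[Hit],
    total_hits: int,
    k: int = TOP_K,
    min_score: float = MIN_SCORE,
) -> Tuple[List[Hit], bool]:
    """Return (recommendations, low_volume_flag).

    Assumption: a hit scoring exactly min_score is kept; the abstract
    only distinguishes scores below and above the threshold.
    """
    # Lesson 2: take the best-ranked hits instead of sampling at random.
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    top = ranked[:k]

    # Lesson 1: drop hits whose score falls below the threshold.
    recommendations = [h for h in top if h.score >= min_score]

    # Lesson 3: flag queries with few results, where lower
    # click-through rates were observed.
    low_volume = total_hits < MIN_TOTAL_HITS
    return recommendations, low_volume
```

A caller might use the `low_volume` flag to fall back to another recommendation approach or to show fewer recommendations; the abstract reports the correlation only and does not prescribe a response.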
Taxonomy
Topics: Advanced Text Analysis Techniques · Recommender Systems and Techniques · Information Retrieval and Search Behavior
