Document Embeddings vs. Keyphrases vs. Terms: An Online Evaluation in Digital Library Recommender Systems
Andrew Collins, Joeran Beel

TL;DR
This study compares the online effectiveness of document embeddings, keyphrases, and terms in digital library recommender systems, revealing significant performance differences across platforms and highlighting the importance of context-specific algorithm selection.
Contribution
It provides the first large-scale online evaluation comparing multiple recommendation algorithms in digital libraries, demonstrating their varying effectiveness across different platforms.
Findings
Algorithm effectiveness differs significantly between Sowiport and Jabref (Wilcoxon rank-sum test; p < 0.05).
The best-performing algorithm depends on the digital library platform.
Within each platform, effectiveness differs by roughly 400% between the best and worst algorithm.
Abstract
Many recommendation algorithms are available to digital library recommender system operators. The effectiveness of algorithms is largely unreported by way of online evaluation. We compare a standard term-based recommendation approach to two promising approaches for related-article recommendation in digital libraries: document embeddings, and keyphrases. We evaluate the consistency of their performance across multiple scenarios. Through our recommender-as-a-service Mr. DLib, we delivered 33.5M recommendations to users of Sowiport and Jabref over the course of 19 months, from March 2017 to October 2018. The effectiveness of the algorithms differs significantly between Sowiport and Jabref (Wilcoxon rank-sum test; p < 0.05). There is a ~400% difference in effectiveness between the best and worst algorithm in both scenarios separately. The best performing algorithm in Sowiport (terms) is the…
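The two quantitative statements in the abstract, the ~400% gap and the Wilcoxon rank-sum test, can be illustrated with a minimal sketch. It assumes effectiveness is measured as click-through rate (clicks per delivered recommendation), reads "400% difference" as (best - worst) / worst, and runs the test on per-period click-through rates; none of these choices is confirmed by the text above. All numbers are hypothetical and the function names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def click_through_rate(clicks, delivered):
    """Clicks per delivered recommendation (assumed effectiveness measure)."""
    return clicks / delivered


def relative_difference_pct(best, worst):
    """How much larger `best` is than `worst`, as a percentage of `worst`."""
    return (best - worst) / worst * 100.0


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test.

    Uses the normal approximation without tie or continuity correction.
    Returns (z, p); z > 0 means sample x tends to rank higher than y.
    """
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the mean of the 1-based ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum_x - mean) / sd
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical per-period click-through rates for two algorithms on one platform.
ctr_algorithm_a = [0.10, 0.12, 0.11, 0.13, 0.14]
ctr_algorithm_b = [0.02, 0.03, 0.025, 0.035, 0.03]

z, p = rank_sum_test(ctr_algorithm_a, ctr_algorithm_b)
gap = relative_difference_pct(click_through_rate(100, 1000),
                              click_through_rate(20, 1000))
print(f"z = {z:.2f}, p = {p:.4f}, relative gap = {gap:.0f}%")
```

Under this reading, a click-through rate of 0.10 against 0.02 is a 400% gap, meaning the better algorithm is clicked five times as often. The normal approximation is rough for samples this small; an exact test would be the safer choice in a real analysis.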
Taxonomy
Topics: Advanced Text Analysis Techniques · Information Retrieval and Search Behavior · Music and Audio Processing
