SciNUP: Natural Language User Interest Profiles for Scientific Literature Recommendation
Mariam Arustashvili, Krisztian Balog

TL;DR
SciNUP is a synthetic dataset for scientific literature recommendation based on natural language user profiles. It supports the evaluation of sparse, dense, and LLM-based retrieval methods, and baseline results show considerable headroom for improvement.
Contribution
The paper presents SciNUP, a large-scale synthetic dataset for NL profile-based scholarly recommendation, filling a key gap for benchmarking and research.
Findings
Baseline methods perform similarly but retrieve different items.
Considerable headroom remains for improving NL-based recommendation methods.
The dataset facilitates future research in this area.
Abstract
The use of natural language (NL) user profiles in recommender systems offers greater transparency and user control compared to traditional representations. However, there is a scarcity of large-scale, publicly available test collections for evaluating NL profile-based recommendation. To address this gap, we introduce SciNUP, a novel synthetic dataset for scholarly recommendation that leverages authors' publication histories to generate NL profiles and corresponding ground truth items. We use this dataset to conduct a comparison of baseline methods, ranging from sparse and dense retrieval approaches to state-of-the-art LLM-based rerankers. Our results show that while baseline methods achieve comparable performance, they often retrieve different items, indicating complementary behaviors. At the same time, considerable headroom for improvement remains, highlighting the need for effective…
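The observation that baselines score similarly yet retrieve different items can be made concrete with a small overlap check. The following is a minimal sketch, not the paper's evaluation code: the function names, the toy rankings, and the choice of Jaccard overlap and recall at k are all illustrative assumptions.

```python
def jaccard_at_k(ranking_a, ranking_b, k=10):
    """Jaccard overlap between the top-k items of two ranked lists."""
    top_a, top_b = set(ranking_a[:k]), set(ranking_b[:k])
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 0.0


def recall_at_k(ranking, relevant, k=10):
    """Fraction of relevant items that appear in the top-k of a ranking."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(ranking[:k]) & relevant) / len(relevant)


# Hypothetical toy data: paper IDs returned by two retrievers for one profile.
sparse_run = ["p1", "p2", "p3", "p4"]
dense_run = ["p3", "p5", "p6", "p8"]
relevant = {"p1", "p3", "p5", "p7"}  # assumed ground truth items

sparse_recall = recall_at_k(sparse_run, relevant, k=4)
dense_recall = recall_at_k(dense_run, relevant, k=4)
overlap = jaccard_at_k(sparse_run, dense_run, k=4)

# Recall of the pooled top-k sets, ignoring rank order.
pooled = list(dict.fromkeys(sparse_run + dense_run))
pooled_recall = recall_at_k(pooled, relevant, k=len(pooled))

print(sparse_recall, dense_recall, overlap, pooled_recall)
```

In this toy case both runs reach the same recall while sharing only one item, and the pooled set covers more relevant items than either run alone, which is the kind of complementarity the abstract describes.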
Taxonomy
Topics: Recommender Systems and Techniques · Topic Modeling · Advanced Graph Neural Networks
