Scientific Dataset Discovery via Topic-level Recommendation
Basmah Altaf, Shichao Pei, Xiangliang Zhang

TL;DR
This paper introduces a topic-based recommendation system for discovering relevant datasets in research by modeling papers and datasets in a shared latent topic space, improving dataset discovery efficiency.
Contribution
It proposes a novel topic-level recommendation approach on an attributed heterogeneous graph, moving beyond traditional graph embedding methods for dataset discovery.
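As a rough illustration of what an attributed heterogeneous graph of this kind might look like in code, the sketch below stores two node types (papers, datasets), two edge types (paper-paper and paper-dataset citations), and paper text as a node attribute. The class name `HeteroGraph`, its fields, and its methods are hypothetical and chosen only for this example; the paper's actual data structures are not described on this page.

```python
from dataclasses import dataclass, field


@dataclass
class HeteroGraph:
    """Toy attributed heterogeneous graph (hypothetical, for illustration only)."""
    # Node attribute: paper id -> paper content (e.g. abstract text).
    paper_text: dict = field(default_factory=dict)
    # Dataset nodes carry no attributes in this sketch.
    datasets: set = field(default_factory=set)
    # Edge type 1: (citing paper, cited paper).
    paper_cites_paper: set = field(default_factory=set)
    # Edge type 2: (paper, dataset it cites or uses).
    paper_cites_dataset: set = field(default_factory=set)

    def add_paper(self, pid, text):
        self.paper_text[pid] = text

    def cite_paper(self, src, dst):
        self.paper_cites_paper.add((src, dst))

    def cite_dataset(self, pid, did):
        self.datasets.add(did)
        self.paper_cites_dataset.add((pid, did))

    def papers_citing(self, did):
        """Return the sorted ids of papers that cite the given dataset."""
        return sorted(p for p, d in self.paper_cites_dataset if d == did)


# Made-up example data.
g = HeteroGraph()
g.add_paper("p1", "topic models for citation networks")
g.add_paper("p2", "graph-based recommendation")
g.cite_paper("p2", "p1")
g.cite_dataset("p1", "d1")
g.cite_dataset("p2", "d1")
citing = g.papers_citing("d1")
```

In this toy layout, a dataset has no text of its own, so anything known about it must come from the papers that cite it.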
Findings
The model generates reasonable topic-based profiles for datasets.
It recommends suitable datasets for a given query.
Both findings are supported by the paper's experimental results.
Abstract
Data-intensive research requires the support of appropriate datasets. However, it is often time-consuming to discover usable datasets matching a specific research topic. We formulate the dataset discovery problem on an attributed heterogeneous graph, which is composed of paper-paper citations, paper-dataset citations, and paper content. We propose to characterize both paper and dataset nodes by their commonly shared latent topics, rather than learning user and item representations via canonical graph embedding models, because the usage of datasets and the themes of research projects can be understood on the common base of research topics. The datasets relevant to a given research project can then be inferred in the shared topic space. The experimental results show that our model can generate reasonable profiles for datasets, and recommend proper datasets for a query, which represents…
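To make the idea of inferring relevant datasets in a shared topic space more concrete, here is a minimal sketch. It assumes each dataset and each query has already been mapped to a distribution over the same set of topics, and it ranks datasets by cosine similarity to the query. The topic vectors, the dataset names, the `recommend` helper, and the choice of cosine similarity are all assumptions made for illustration; the abstract does not state how the paper's model learns topics or scores candidates.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(query_topics, dataset_topics, k=2):
    """Return the k dataset names whose topic vectors are closest to the query."""
    scored = [(cosine(query_topics, t), name) for name, t in dataset_topics.items()]
    # Highest similarity first; ties broken alphabetically by name.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [name for _, name in scored[:k]]


# Hypothetical topic distributions over three shared topics (made-up numbers).
dataset_topics = {
    "dataset_A": [0.8, 0.1, 0.1],
    "dataset_B": [0.1, 0.8, 0.1],
    "dataset_C": [0.3, 0.3, 0.4],
}
query = [0.7, 0.2, 0.1]  # topic profile of a research project (assumed given)

top = recommend(query, dataset_topics, k=2)
```

Because papers and datasets live in the same topic space, the query and the candidates are directly comparable without a separate user or item embedding.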
Taxonomy
Topics: Advanced Graph Neural Networks · Recommender Systems and Techniques · Topic Modeling
