Predicting the future relevance of research institutions - The winning solution of the KDD Cup 2016
Vlad Sandulescu, Mihai Chiru

TL;DR
This paper presents a machine learning approach to predict the future relevance of research institutions by analyzing publication data, aiming to provide a transparent ranking method based on future accepted papers.
Contribution
It introduces a framework that combines a simple probability-based baseline, feature engineering, and gradient boosted decision trees to forecast institutional impact from academic publication data.
Findings
Gradient boosted decision trees improved prediction accuracy.
Feature engineering significantly enhanced model performance.
The approach offers a transparent ranking of research institutions.
Abstract
The world's collective knowledge is evolving through research and new scientific discoveries. It is becoming increasingly difficult to objectively rank the impact research institutes have on global advancements. However, since funding, governmental support, and the quality of staff and students all mirror the projected quality of an institution, it becomes essential to measure an affiliation's rating in a transparent and widely accepted way. We propose and investigate several methods to rank affiliations based on the number of their accepted papers at future academic conferences. We carry out our investigation using publicly available datasets such as the Microsoft Academic Graph, a heterogeneous graph which contains various information about academic papers. We analyze several models, starting with a simple probabilities-based method and then gradually expand our training dataset, engineer…
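To make the "simple probabilities-based method" concrete, here is a minimal sketch of one way such a baseline could work. It is not taken from the paper. It assumes fractional counting (each paper is worth 1, split equally among its authors and then among each author's affiliations), and the function names and input layout are hypothetical.

```python
from collections import defaultdict


def affiliation_scores(papers):
    """Sum fractional paper credit per affiliation.

    `papers` is a list of papers; each paper is a list of authors;
    each author is a list of affiliation names (hypothetical layout).
    Assumption: a paper is worth 1, split equally among its authors,
    and each author's share is split equally among their affiliations.
    """
    scores = defaultdict(float)
    for authors in papers:
        if not authors:
            continue
        author_share = 1.0 / len(authors)
        for affiliations in authors:
            if not affiliations:
                continue  # an author without affiliations contributes nothing
            for name in affiliations:
                scores[name] += author_share / len(affiliations)
    return dict(scores)


def rank_by_probability(papers):
    """Rank affiliations by their share of past credit.

    The normalized share is used as a naive estimate of the probability
    that a future accepted paper belongs to each affiliation.
    """
    scores = affiliation_scores(papers)
    total = sum(scores.values())
    if total == 0:
        return []
    probabilities = {name: s / total for name, s in scores.items()}
    # Highest probability first; ties broken alphabetically for stability.
    return sorted(probabilities.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy history: three past papers with made-up affiliations.
history = [
    [["A"], ["B"]],   # two authors, one affiliation each
    [["A"]],          # single author
    [["A", "C"]],     # single author with two affiliations
]
ranking = rank_by_probability(history)
```

A baseline like this uses only historical counts. The feature-engineering and gradient boosting steps mentioned in the abstract would replace the normalized share with a learned prediction.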
Taxonomy
Topics: Delphi Technique in Research
