From Node2Vec to GPT-based GraphRAG: scientific impact prediction across graph and language models

Adilson Vital Jr.; Filipi N. Silva; Diego R. Amancio

arXiv:2605.18410 · cs.DL · May 19, 2026

TL;DR

This paper compares graph-based and LLM-based methods for predicting which newly published scientific papers will become highly cited. Combining citation-graph and textual signals reaches approximately 0.84-0.85 AUC, while retrieval augmentation with GPT models offers limited additional benefit over simpler prompts.

Contribution

It introduces a unified framework for impact prediction using both graph-based and GPT-based approaches, highlighting the effectiveness of combined signals and the challenges of retrieval augmentation.

Findings

01. Combining Node2Vec representations of directed citation graphs with text embeddings reaches approximately 0.84-0.85 AUC.

02. GPT-based GraphRAG achieves ~0.87 AUC with target-only prompts.

03. Retrieval augmentation does not consistently outperform simpler prompts.
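The findings above are reported as AUC, the area under the ROC curve. As a reminder of what that number measures, the sketch below computes AUC as the probability that a randomly chosen positive paper is scored above a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not values or code from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 0/1 ground-truth classes (1 = high-impact paper).
    scores: iterable of model scores, higher meaning "more likely high-impact".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0   # positive ranked above negative
            elif sp == sn:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): two high-impact papers among five.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so values around 0.85 mean the model orders a positive above a negative in roughly 85% of such pairs. The quadratic loop is fine for illustration; a rank-based implementation would be preferable for large test sets.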

Abstract

Identifying which newly published scientific papers are likely to become highly cited is important for prioritizing research attention, supporting editorial decisions, and guiding the allocation of scientific resources, particularly under cold-start conditions where little direct evidence is available at publication time. In this work, we formulate impact prediction as a cohort-normalized top-P% classification task and compare graph-based and LLM-based approaches under a unified framework. We construct citation and textual-similarity graphs under temporal constraints and generate Node2Vec representations, either alone or combined with OpenAI text embeddings. The best supervised configuration combines directed citation graphs with textual embeddings, reaching approximately 0.84-0.85 AUC. We also evaluate a GPT-based GraphRAG setup, using GPT 5.5 and 5.4 Nano, in which graph neighborhoods…
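The cohort-normalized top-P% formulation mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a cohort is simply the publication year and that a paper is positive when its citation count places it in the top P% of its own cohort. The abstract does not spell out the cohort definition, the tie handling, or the value of P, so the function `top_p_labels`, its parameters, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def top_p_labels(papers, p=10.0):
    """Label each paper 1 if it is in the top p% of its cohort by citations.

    papers: list of (paper_id, cohort, citations) tuples. The cohort is assumed
            here to be the publication year.
    p:      percentage of each cohort labeled positive (assumed value).
    """
    by_cohort = defaultdict(list)
    for paper_id, cohort, citations in papers:
        by_cohort[cohort].append((paper_id, citations))

    labels = {}
    for members in by_cohort.values():
        # Number of positives in this cohort, with at least one paper.
        k = max(1, math.ceil(len(members) * p / 100.0))
        ranked = sorted(members, key=lambda m: m[1], reverse=True)
        threshold = ranked[k - 1][1]
        for paper_id, citations in members:
            # Papers tied at the threshold are all labeled positive (an assumption),
            # so a cohort can end up with more than k positives.
            labels[paper_id] = 1 if citations >= threshold else 0
    return labels


# Toy data (hypothetical): an older cohort has accumulated more citations.
toy_papers = [
    ("a", 2019, 50), ("b", 2019, 5), ("c", 2019, 3), ("d", 2019, 1), ("e", 2019, 0),
    ("f", 2020, 4), ("g", 2020, 2), ("h", 2020, 1), ("i", 2020, 0), ("j", 2020, 0),
]
toy_result = top_p_labels(toy_papers, p=20.0)
```

In the toy data, paper `f` (4 citations, 2020) is labeled positive while paper `b` (5 citations, 2019) is not, because each paper is compared only with others in its own cohort. This is the point of normalizing by cohort: raw citation counts favor older papers that have had more time to accumulate citations.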
