Towards a general diffusion-based information quality assessment model
Anthony Lopes Temporao, Mickael Temporão, Corentin Vande Kerckhove, Flavio Abreu Araujo

TL;DR
This paper introduces a diffusion-based, interpretable framework for assessing information quality in academic publications, using diffusion features to predict impact with high accuracy and transparency.
Contribution
It presents a novel, domain-agnostic model leveraging diffusion dynamics and a generalized additive model for scalable, interpretable information quality assessment.
Findings
Pearson correlation of 0.834 with next-year citation gain
Up to 95.62% accuracy in predicting high-impact papers
Timeliness and salience are the most robust predictors
Abstract
The rapid and unregulated dissemination of information in the digital era has amplified the global "infodemic," complicating the identification of high-quality information. We present a lightweight, interpretable, and non-invasive framework for assessing information quality based solely on diffusion dynamics, demonstrated here in the context of academic publications. Using a heterogeneous dataset of 29,264 science, technology, engineering, and mathematics (STEM) and social science papers from ArnetMiner and OpenAlex, we model the diffusion network of each paper as a set of three theoretically motivated features: diversity, timeliness, and salience. A Generalized Additive Model (GAM) trained on these features achieved Pearson correlations of 0.834 for next-year citation gain and up to 95.62% accuracy in predicting high-impact papers. Feature relevance studies reveal timeliness and salience…
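The abstract reports two kinds of evaluation figures: a Pearson correlation between predicted and observed citation gain, and an accuracy for classifying high-impact papers. The sketch below shows how such metrics are typically computed. It is an illustration only, not the authors' evaluation code: the function names, the toy numbers, and the idea of defining "high-impact" by a fixed threshold on citation gain are all assumptions made for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def accuracy(predicted, observed, threshold):
    """Share of papers whose predicted and observed gains fall on the
    same side of `threshold` (hypothetical definition of high-impact)."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum((p >= threshold) == (o >= threshold)
               for p, o in zip(predicted, observed))
    return hits / len(observed)


# Toy values, NOT data from the paper.
predicted_gain = [1.0, 6.0, 3.0, 8.0]
observed_gain = [2.0, 7.0, 6.0, 9.0]

print("Pearson r:", pearson(predicted_gain, observed_gain))
print("Accuracy :", accuracy(predicted_gain, observed_gain, threshold=5.0))
```

In this toy input, three of the four papers land on the same side of the threshold in both lists, so the accuracy is 0.75. How the paper itself defines the high-impact class is not stated in the text above.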
Taxonomy
Topics: Data Quality and Management · Web visibility and informetrics
