The influence of time and discipline on the magnitude of correlations between citation counts and quality scores
Mike Thelwall, Ruth Fairclough

TL;DR
This study uses simulations to show how mixing data from different disciplines or years can substantially weaken the observed correlation between citation counts and quality scores, which can lead evaluators to understate the validity of citation indicators in research evaluation.
Contribution
It systematically investigates how heterogeneity in data sets impacts correlation strength, highlighting the importance of data homogeneity in citation-based assessments.
Findings
Mixing different disciplines or years reduces correlation magnitude.
Even mixing two sets with the same correlation strength can reduce the combined correlation if their mean citation counts differ.
The size of the reduction also depends on pre-selection for high quality and on the nature of the relationship between citations and quality.
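The second finding can be illustrated with a small simulation. This is a minimal sketch, not the authors' simulation design. It assumes, purely for illustration, normally distributed quality scores and citation scores (for example, log-transformed citation counts) with a within-group correlation of 0.6 in two hypothetical fields. The fields have identical quality distributions but average citation levels 3 standard deviations apart. The helper names `pearson` and `simulate_group` are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_group(n, r, citation_mean, rng):
    """Hypothetical field: standard-normal quality scores, and citation
    scores (assumed log-scale) with unit variance, correlation r with
    quality, and a field-specific mean."""
    quality = [rng.gauss(0, 1) for _ in range(n)]
    citations = [
        citation_mean + r * q + math.sqrt(1 - r * r) * rng.gauss(0, 1)
        for q in quality
    ]
    return quality, citations


rng = random.Random(42)
# Same correlation (0.6) in both fields; only the citation mean differs.
q_a, c_a = simulate_group(5000, 0.6, 0.0, rng)
q_b, c_b = simulate_group(5000, 0.6, 3.0, rng)

r_a = pearson(q_a, c_a)
r_b = pearson(q_b, c_b)
r_pooled = pearson(q_a + q_b, c_a + c_b)  # both fields mixed together
print(f"field A: {r_a:.2f}, field B: {r_b:.2f}, pooled: {r_pooled:.2f}")
```

Because the offset shifts citation scores without changing quality, it adds between-field variance (1.5² = 2.25) to citations only. In expectation the pooled correlation is therefore 0.6/√(1 + 2.25) ≈ 0.33, even though each field on its own correlates at about 0.6.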
Abstract
Although various citation-based indicators are commonly used to support research evaluations, there are ongoing controversies about their value. In response, they are often correlated with quality ratings or with other quantitative indicators in order to partly assess their validity. When correlations are calculated for sets of publications from multiple disciplines or years, however, the magnitude of the correlation coefficient may be reduced, masking the strength of the underlying correlation. This article therefore uses simulations to systematically investigate the extent to which mixing years or disciplines reduces correlations. The results show that mixing two sets of articles with different correlation strengths can reduce the correlation for the combined set to substantially below the average of the two. Moreover, even mixing two sets of articles with the same correlation…
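The abstract's first result, a combined correlation below the average of the two subsets, can also be reproduced in a toy setting. This minimal sketch assumes, as a hypothetical choice not taken from the paper, that the weakly correlated set also has a wider citation spread (standard deviation 2 rather than 1), so it contributes more of the pooled citation variance. As before, `pearson` and `simulate_group` are illustrative names, and the normal citation scores stand in for (assumed log-transformed) citation counts.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_group(n, r, citation_sd, rng):
    """Hypothetical article set: standard-normal quality scores, and
    zero-mean citation scores with standard deviation citation_sd and
    correlation r with quality."""
    quality = [rng.gauss(0, 1) for _ in range(n)]
    citations = [
        citation_sd * (r * q + math.sqrt(1 - r * r) * rng.gauss(0, 1))
        for q in quality
    ]
    return quality, citations


rng = random.Random(7)
q_a, c_a = simulate_group(5000, 0.8, 1.0, rng)  # strong correlation, narrow spread
q_b, c_b = simulate_group(5000, 0.2, 2.0, rng)  # weak correlation, wide spread

average = (pearson(q_a, c_a) + pearson(q_b, c_b)) / 2
r_pooled = pearson(q_a + q_b, c_a + c_b)  # both sets mixed together
print(f"average of sets: {average:.2f}, pooled: {r_pooled:.2f}")
```

With both sets sharing the same means, the pooled covariance is (0.8 + 0.4)/2 = 0.6 and the pooled citation variance is (1 + 4)/2 = 2.5, giving about 0.6/√2.5 ≈ 0.38 against an average of 0.5. If the two sets instead had equal means and equal spreads, the pooled correlation would match the average in expectation, so some additional heterogeneity between the sets is needed for the combined value to fall below it.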
