Do they agree? Bibliometric evaluation vs informed peer review in the Italian research assessment exercise
Alberto Baccini, Giuseppe De Nicolao

TL;DR
This study critically re-evaluates the agreement between peer review and bibliometric methods in Italy's research assessment, revealing generally poor agreement and questioning the combined use of both methods.
Contribution
It provides a rigorous statistical re-analysis of the Italian research assessment experiment, highlighting flaws and the lack of agreement between evaluation methods.
Findings
Agreement in most fields is unacceptable, poor or at most fair under standard guidelines for interpreting kappa values
Moderate agreement only in economics and statistics
Experiment protocol flaws in Area 13
Abstract
During the Italian research assessment exercise, the national agency ANVUR performed an experiment to assess agreement between grades attributed to journal articles by informed peer review (IR) and by bibliometrics. A sample of articles was evaluated using both methods, and agreement was analyzed with weighted Cohen's kappas. ANVUR presented the results as indicating an overall 'good' or 'more than adequate' agreement. This paper re-examines the experiment results according to the available statistical guidelines for interpreting kappa values, showing that the degree of agreement, always in the range 0.09-0.42, has to be interpreted, for all research fields, as unacceptable, poor or, in a few cases, at most fair. The only notable exception, confirmed also by a statistical meta-analysis, was a moderate agreement for economics and statistics (Area 13) and its sub-fields. We show that…
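For readers unfamiliar with the statistic, the following is a minimal sketch of how a weighted Cohen's kappa can be computed for two sets of ordinal grades. It is not the code or the weighting scheme used by ANVUR or by the authors: the function name `weighted_kappa`, the grade labels and the sample ratings are all hypothetical, and linear or quadratic disagreement weights are assumed as the two common textbook choices.

```python
def weighted_kappa(ratings_a, ratings_b, categories, weights="linear"):
    """Weighted Cohen's kappa for two raters on an ordinal scale.

    `categories` lists the grades in order. Disagreement weights are
    |i - j| / (k - 1) (linear) or its square (quadratic); this choice
    is an assumption for illustration, not taken from the paper.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Observed joint proportions of (grade by rater A, grade by rater B).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    # Weighted disagreement actually observed vs. expected by chance.
    obs = sum(disagreement(i, j) * observed[i][j]
              for i in range(k) for j in range(k))
    exp = sum(disagreement(i, j) * row[i] * col[j]
              for i in range(k) for j in range(k))
    if exp == 0:
        raise ValueError("kappa is undefined when expected disagreement is zero")
    return 1.0 - obs / exp


# Hypothetical grades for eight articles; not data from the experiment.
grades = ["A", "B", "C", "D"]
bibliometric = ["A", "A", "B", "B", "C", "C", "D", "D"]
peer_review = ["A", "B", "B", "C", "C", "D", "D", "A"]
kappa = weighted_kappa(bibliometric, peer_review, grades)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, so where a given kappa falls on a verbal scale such as "poor" or "moderate" depends on the interpretation guideline applied.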
