Auditing automated research assessment: an interpretable machine learning approach to validate funding criteria

Rafael P. Gouveia; Thiago C. Silva; Diego R. Amancio

arXiv:2604.09827·cs.DL·April 14, 2026

TL;DR

This study uses interpretable machine learning to test the validity of the official assessment criteria behind Brazil's CNPq Research Productivity (PQ) grants, and finds that only a subset of those criteria detectably influences how researchers are classified into grant levels.

Contribution

The paper introduces an empirical, data-driven approach for testing how much each regulatory criterion actually affects research funding outcomes.

Findings

01

Models distinguishing grant levels achieve high predictive performance, with mean AUC scores reaching 0.96.

02

Features such as bibliographic output and supervision are the most influential.

03

Several regulated criteria show no detectable impact on classification results.

Abstract

This paper empirically examines the practical validity of the official evaluation criteria underpinning the Research Productivity (PQ) Grant framework, as governed by the Brazilian National Council for Scientific and Technological Development (CNPq). By operationalizing regulatory dimensions (including bibliographic output, human resource training, and scientific recognition) as measurable variables extracted from CVs and OpenAlex bibliometric data, we treat policy-defined indicators as testable hypotheses rather than a priori assumptions. Using a block-based adaptation of the Boruta feature selection algorithm across several machine learning classifiers, we evaluate the statistical contribution of each dimension in distinguishing grant levels, with a focus on identifying top-tier (Level 1A) researchers. Our models achieve high predictive performance, with mean AUC scores reaching 0.96,…
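As a rough illustration of what a block-based, Boruta-style test could look like, the sketch below permutes each block of features jointly to build "shadow" blocks and counts how often a real block's importance beats the best shadow. It is a minimal sketch under stated assumptions, not the authors' implementation: the importance proxy (largest single-feature |AUC - 0.5| within a block) stands in for the classifier-based importances the paper uses, and every name here (`auc`, `block_score`, `block_boruta`, the block labels, and the synthetic data) is hypothetical.

```python
import random

def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U); tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def block_score(columns, labels):
    """Assumed importance proxy: largest |AUC - 0.5| among the block's columns."""
    return max(abs(auc(col, labels) - 0.5) for col in columns)

def block_boruta(blocks, labels, n_iter=50, seed=0):
    """Return, per block, the fraction of rounds in which it beat every shadow block."""
    rng = random.Random(seed)
    n = len(labels)
    # The proxy is deterministic, so real scores are computed once; a
    # classifier-based variant would refit the model in every round.
    real = {name: block_score(cols, labels) for name, cols in blocks.items()}
    hits = {name: 0 for name in blocks}
    for _ in range(n_iter):
        shadow_best = 0.0
        for cols in blocks.values():
            # One permutation per block: rows move together, so structure inside
            # the block is kept while the link to the labels is destroyed.
            perm = list(range(n))
            rng.shuffle(perm)
            shadow = [[col[p] for p in perm] for col in cols]
            shadow_best = max(shadow_best, block_score(shadow, labels))
        for name in blocks:
            if real[name] > shadow_best:
                hits[name] += 1
    return {name: hits[name] / n_iter for name in blocks}

# Synthetic data only: 200 hypothetical researchers with alternating labels.
n = 200
labels = [i % 2 for i in range(n)]
noise = random.Random(1)
blocks = {
    # Hypothetical block whose single column tracks the label plus noise.
    "informative": [[y + noise.gauss(0, 0.5) for y in labels]],
    # Hypothetical block built to be perfectly balanced across the two labels.
    "neutral": [[i // 2 for i in range(n)]],
}
rates = block_boruta(blocks, labels)
print(rates)
```

A block with a hit rate near 1 would be a candidate for confirmation and one near 0 a candidate for rejection; the original Boruta procedure formalises that decision with a statistical test over the hit counts, which this sketch omits.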
