Assessing quality of selection procedures: Lower bound of false positive rate as a function of inter-rater reliability
František Bartoš, Patrícia Martinková

TL;DR
This paper links inter-rater reliability to binary classification metrics to estimate the lower bounds of false positive rates in applicant selection, providing a new approach for evaluating selection procedures.
Contribution
It introduces a method to approximate the probability of correct selection based on IRR, connecting IRR with classification metrics and enabling error bound computations.
Findings
Lower bounds of false positive rates depend solely on IRR and selection proportion.
The approximation method performs well in simulation studies.
The approach is demonstrated with grant review procedures and implemented in an R package.
Abstract
Inter-rater reliability (IRR) is one of the commonly used tools for assessing the quality of ratings from multiple raters. However, applicant selection procedures based on ratings from multiple raters usually result in a binary outcome; the applicant is either selected or not. This final outcome is not considered in IRR, which instead focuses on the ratings of the individual subjects or objects. We outline the connection between the ratings' measurement model (used for IRR) and a binary classification framework. We develop a simple way of approximating the probability of correctly selecting the best applicants which allows us to compute error probabilities of the selection procedure (i.e., false positive and false negative rate) or their lower bounds. We draw connections between the inter-rater reliability and the binary classification metrics, showing that binary classification metrics…
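The link between IRR and selection errors described above can be illustrated with a small brute-force simulation. This is a minimal sketch, not the authors' approximation and not their R package: the function name `simulate_false_positive_rate`, the normal measurement model, the default sample sizes, and the reading of `irr` as the reliability of the final rating used for ranking are all assumptions made for illustration. The false positive rate follows the usual classification definition: applicants who are selected but are not among the truly best, divided by all applicants who are not among the truly best.

```python
import random


def simulate_false_positive_rate(irr, proportion, n_applicants=200, n_reps=200, seed=1):
    """Monte Carlo estimate of the false positive rate of a top-k selection.

    Hypothetical helper for illustration only. Assumes true scores and rating
    noise are independent standard normal, and that `irr` is the share of the
    observed rating variance due to the true score.
    """
    rng = random.Random(seed)
    k = max(1, round(proportion * n_applicants))  # number of applicants selected
    false_positives = 0
    true_negatives_plus_false_positives = 0
    for _ in range(n_reps):
        true = [rng.gauss(0.0, 1.0) for _ in range(n_applicants)]
        # Observed rating = scaled true score + scaled noise, so that
        # var(true part) / var(observed) equals irr.
        observed = [
            irr ** 0.5 * t + (1.0 - irr) ** 0.5 * rng.gauss(0.0, 1.0) for t in true
        ]
        by_true = sorted(range(n_applicants), key=lambda i: true[i], reverse=True)
        by_observed = sorted(range(n_applicants), key=lambda i: observed[i], reverse=True)
        truly_best = set(by_true[:k])      # applicants who should be selected
        selected = set(by_observed[:k])    # applicants actually selected
        false_positives += len(selected - truly_best)
        true_negatives_plus_false_positives += n_applicants - k
    return false_positives / true_negatives_plus_false_positives


if __name__ == "__main__":
    for irr in (0.3, 0.6, 0.9):
        rate = simulate_false_positive_rate(irr, proportion=0.2)
        print(f"IRR={irr:.1f}  simulated false positive rate={rate:.3f}")
```

Under these assumptions the simulated rate falls as `irr` rises and reaches zero when `irr` is 1, which matches the qualitative claim that the error rates are driven by IRR and the selection proportion.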
Taxonomy
Topics: Advanced Statistical Methods and Models · Statistical Methods and Inference · Reliability and Agreement in Measurement
