The Quest for Reliable Metrics of Responsible AI

Theresia Veronika Rampisela; Maria Maistro; Tuukka Ruotsalo; Christina Lioma

arXiv:2510.26007·cs.CY·October 31, 2025

TL;DR

This paper discusses the importance of developing and assessing reliable evaluation metrics for responsible AI, emphasizing the need for robustness and proposing guidelines applicable across various AI domains.

Contribution

It provides a set of non-exhaustive guidelines for creating robust and reliable metrics of responsible AI, based on analysis of fairness metrics in recommender systems.

Findings

01. Identified key challenges in metric robustness

02. Proposed guidelines for reliable metric development

03. Emphasized broad applicability across AI domains

Abstract

The development of Artificial Intelligence (AI), including AI in Science (AIS), should be done following the principles of responsible AI. Progress in responsible AI is often quantified through evaluation metrics, yet there has been less work on assessing the robustness and reliability of the metrics themselves. We reflect on prior work that examines the robustness of fairness metrics for recommender systems as a type of AI application and summarise their key takeaways into a set of non-exhaustive guidelines for developing reliable metrics of responsible AI. Our guidelines apply to a broad spectrum of AI applications, including AIS.
