The Quest for Reliable Metrics of Responsible AI
Theresia Veronika Rampisela, Maria Maistro, Tuukka Ruotsalo, Christina Lioma

TL;DR
This paper argues that the evaluation metrics used to quantify progress in responsible AI should themselves be assessed for robustness and reliability, and it proposes guidelines for developing such metrics that apply across AI domains.
Contribution
It distills a set of non-exhaustive guidelines for creating robust and reliable metrics of responsible AI from prior work on the robustness of fairness metrics in recommender systems.
Findings
Identified key challenges in metric robustness
Proposed guidelines for reliable metric development
Emphasized broad applicability across AI domains
Abstract
The development of Artificial Intelligence (AI), including AI in Science (AIS), should be done following the principles of responsible AI. Progress in responsible AI is often quantified through evaluation metrics, yet there has been less work on assessing the robustness and reliability of the metrics themselves. We reflect on prior work that examines the robustness of fairness metrics for recommender systems as a type of AI application and summarise their key takeaways into a set of non-exhaustive guidelines for developing reliable metrics of responsible AI. Our guidelines apply to a broad spectrum of AI applications, including AIS.
