Holistic Safety and Responsibility Evaluations of Advanced AI Models
Laura Weidinger, Joslyn Barnhart, Jenny Brennan, Christina Butterfield, Susie Young, Will Hawkins, Lisa Anne Hendricks, Ramona Comanescu, Oscar Chang, Mikel Rodriguez, Jennifer Beroshi, Dawn Bloxwich, Lev Proleev, Jilin Chen, Sebastian Farquhar, Lewis Ho, Iason Gabriel, Allan

TL;DR
This paper discusses the development and application of comprehensive safety and responsibility evaluation methods for advanced AI models, emphasizing theoretical frameworks, collaboration, and the need for a unified evaluation ecosystem.
Contribution
It introduces a broad set of safety evaluation approaches, highlights lessons learned, and advocates for collaborative efforts and scientific advancement in AI safety evaluation.
Findings
Theoretical frameworks are essential for organizing safety risk domains.
Collaboration across stakeholders enhances safety evaluation development.
Unified safety evaluation practices are crucial across different AI risk concerns.
Abstract
Safety and responsibility evaluations of advanced AI models are a critical but developing field of research and practice. In the development of Google DeepMind's advanced AI models, we innovated on and applied a broad set of approaches to safety evaluation. In this report, we summarise and share elements of our evolving approach as well as lessons learned for a broad audience. Key lessons learned include: First, theoretical underpinnings and frameworks are invaluable to organise the breadth of risk domains, modalities, forms, metrics, and goals. Second, theory and practice of safety evaluation development each benefit from collaboration to clarify goals, methods and challenges, and facilitate the transfer of insights between different stakeholders and disciplines. Third, similar key methods, lessons, and institutions apply across the range of concerns in responsibility and safety -…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Risk and Safety Analysis
Methods: Sparse Evolutionary Training
