An overview of 11 proposals for building safe advanced AI

Evan Hubinger

arXiv:2012.07532 · cs.LG · December 15, 2020 · 6 cites

Open Access

TL;DR

This paper provides a comparative analysis of 11 proposals for building safe advanced AI, evaluating their strengths and weaknesses across key alignment and performance components to guide future research.

Contribution

It introduces a comprehensive framework for comparing AI safety proposals across four key components, including a novel distinction between training and performance competitiveness.

Findings

01. Evaluates 11 AI safety proposals across four components
02. Introduces the distinction between training and performance competitiveness
03. Provides insights into the relative strengths and weaknesses of each proposal

Abstract

This paper analyzes and compares 11 different proposals for building safe advanced AI under the current machine learning paradigm, including major contenders such as iterated amplification, AI safety via debate, and recursive reward modeling. Each proposal is evaluated on four components: outer alignment, inner alignment, training competitiveness, and performance competitiveness. The distinction between the latter two is introduced in this paper. While prior literature has primarily analyzed individual proposals, or focused on outer alignment at the expense of inner alignment, this analysis takes a comparative look at a wide range of proposals across all four components.

Taxonomy

Topics: Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI) · Software Engineering Research