Robustness to fundamental uncertainty in AGI alignment

G Gordon Worley III

arXiv:1807.09836 · cs.AI · February 18, 2020

TL;DR

This paper argues that AGI alignment research becomes more robust when it responds to fundamental uncertainty by making as few assumptions as possible and choosing them cautiously, because false positives could lead to catastrophic failure.

Contribution

It introduces a framework for handling key philosophical and scientific uncertainties in AGI alignment to reduce false positives and enhance safety.

Findings

01. Identifies meta-ethical uncertainty and uncertainty about mental phenomena as two points of uncertainty that AGI alignment research hinges on.

02. Proposes limiting and carefully choosing necessary assumptions to reduce the risk of false positives.

03. Highlights the importance of cautious research policies in high-stakes AI development.

Abstract

The AGI alignment problem has a bimodal distribution of outcomes, with most outcomes clustering around the poles of total success and existential, catastrophic failure. Consequently, attempts to solve AGI alignment should, all else equal, prefer false negatives (ignoring research programs that would have been successful) to false positives (pursuing research programs that will unexpectedly fail). Thus, we propose adopting a policy of responding to points of philosophical and practical uncertainty associated with the alignment problem by limiting and choosing necessary assumptions to reduce the risk of false positives. Herein we explore in detail two relevant points of uncertainty that AGI alignment research hinges on (meta-ethical uncertainty and uncertainty about mental phenomena) and show how to reduce false positives in response to them.
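
The preference for false negatives over false positives can be illustrated with a small expected-cost calculation. The sketch below is not taken from the paper: the function names, the cost values, and the simple two-outcome model are all assumptions chosen for illustration. It shows only that when a false positive is far more costly than a false negative, even a small chance of unexpected failure is enough to make ignoring a research program the lower-cost choice.

```python
def expected_cost(p_fail, cost_false_positive, cost_false_negative, pursue):
    """Expected cost of pursuing or ignoring a research program.

    p_fail is the (assumed) probability that the program would
    unexpectedly fail if pursued. All costs are in arbitrary units.
    """
    if pursue:
        # Pursuing risks a false positive: the program fails unexpectedly.
        return p_fail * cost_false_positive
    # Ignoring risks a false negative: the program would have succeeded.
    return (1.0 - p_fail) * cost_false_negative


def should_pursue(p_fail, cost_false_positive, cost_false_negative):
    """Pursue only if doing so has the lower expected cost."""
    pursue_cost = expected_cost(p_fail, cost_false_positive, cost_false_negative, True)
    ignore_cost = expected_cost(p_fail, cost_false_positive, cost_false_negative, False)
    return pursue_cost < ignore_cost


def max_tolerable_failure_probability(cost_false_positive, cost_false_negative):
    """Failure probability at which pursuing and ignoring cost the same.

    Derived from p * C_fp = (1 - p) * C_fn.
    """
    return cost_false_negative / (cost_false_positive + cost_false_negative)


# Hypothetical numbers: a false positive is 1000 times worse than a false negative.
asymmetric = should_pursue(p_fail=0.01, cost_false_positive=1000.0, cost_false_negative=1.0)

# Same failure probability, but with symmetric costs.
symmetric = should_pursue(p_fail=0.01, cost_false_positive=1.0, cost_false_negative=1.0)

print(asymmetric, symmetric)
```

Under these assumed numbers, a 1% chance of unexpected failure rules the program out in the asymmetric case but not in the symmetric one, and the tolerable failure probability shrinks as the cost ratio grows.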

Taxonomy

Topics: Scientific Computing and Data Management · Computability, Logic, AI Algorithms