Robustness to fundamental uncertainty in AGI alignment
G Gordon Worley III

TL;DR
This paper discusses how to improve AGI alignment robustness by managing fundamental uncertainties, advocating for cautious assumptions to avoid false positives that could lead to catastrophic failure.
Contribution
It introduces a framework for handling key philosophical and scientific uncertainties in AGI alignment to reduce false positives and enhance safety.
Findings
Identifies meta-ethical uncertainty and uncertainty about mental phenomena as critical to AGI alignment.
Proposes limiting and carefully choosing necessary assumptions to reduce the risk of false positives.
Highlights the importance of cautious research policies in high-stakes AI development.
Abstract
The AGI alignment problem has a bimodal distribution of outcomes with most outcomes clustering around the poles of total success and existential, catastrophic failure. Consequently, attempts to solve AGI alignment should, all else equal, prefer false negatives (ignoring research programs that would have been successful) to false positives (pursuing research programs that will unexpectedly fail). Thus, we propose adopting a policy of responding to points of philosophical and practical uncertainty associated with the alignment problem by limiting and choosing necessary assumptions to reduce the risk of false positives. Herein we explore in detail two relevant points of uncertainty that AGI alignment research hinges on (meta-ethical uncertainty and uncertainty about mental phenomena) and show how to reduce false positives in response to them.
Taxonomy
Topics: Scientific Computing and Data Management · Computability, Logic, AI Algorithms
