Safeguarding the safeguards: How best to promote AI alignment in the public interest
Oliver Guest, Michael Aird, Seán Ó hÉigeartaigh

TL;DR
This paper discusses how public institutions can effectively support AI alignment efforts to reduce risks of accidents and misuse, emphasizing systematic approaches and mitigation of potential counterproductive issues.
Contribution
It identifies four problems that can hinder AI alignment efforts and proposes mitigations, offering a systematic framework for institutions to enhance effectiveness.
Findings
Support for alignment should focus on reducing accident and misuse risks.
Four key problems can undermine alignment efforts and require mitigation.
Systematic planning increases the likelihood of beneficial AI safety outcomes.
Abstract
AI alignment work is important from both a commercial and a safety lens. With this paper, we aim to help actors who support alignment efforts to make these efforts as effective as possible, and to avoid potential adverse effects. We begin by suggesting that institutions that are trying to act in the public interest (such as governments) should aim to support specifically the alignment work that reduces accident or misuse risks. We then describe four problems which might cause alignment efforts to be counterproductive, increasing large-scale AI risks. We suggest mitigations for each problem. Finally, we make a broader recommendation that institutions trying to act in the public interest should think systematically about how to make their alignment efforts as effective, and as likely to be beneficial, as possible.
Taxonomy
Topics: Law, AI, and Intellectual Property · Ethics and Social Impacts of AI
