Mediating Artificial Intelligence Developments through Negative and Positive Incentives
The Anh Han, Luis Moniz Pereira, Tom Lenaerts, Francisco C. Santos

TL;DR
This paper examines how positive incentives (rewards) and negative incentives (punishments) can shape AI development races, with the aim of promoting safe development and discouraging participants from cutting corners on safety to win.
Contribution
It provides a theoretical analysis of how rewards and punishments can shape AI race outcomes, highlighting conditions where incentives improve safety without hindering innovation.
Findings
Punishments can slow unsafe AI development but may cause over-regulation.
Rewards can accelerate safe AI development without over-regulation.
In certain scenarios, rewards effectively promote safety and innovation.
Abstract
The field of Artificial Intelligence (AI) is going through a period of great expectations, introducing a certain level of anxiety in research, business and also policy. This anxiety is further energised by an AI race narrative that makes people believe they might be missing out. Whether real or not, a belief in this narrative may be detrimental, as some stakeholders will feel obliged to cut corners on safety precautions or ignore societal consequences just to "win". Starting from a baseline model that describes a broad class of technology races where winners draw a significant benefit compared to others (such as AI advances, patent races, pharmaceutical technologies), we investigate here how positive (rewards) and negative (punishments) incentives may beneficially influence the outcomes. We uncover conditions in which punishment is either capable of reducing the development speed of…
