The AI off-switch problem as a signalling game: bounded rationality and incomparability
Alessio Benavoli, Alessandro Facchini, Marco Zaffalon

TL;DR
This paper models the AI off-switch problem as a signalling game with a bounded rational human. It shows that uncertainty about the human's utility is a necessary condition for the AI to leave its off-switch enabled, and it analyses the effects of message costs and incomparability.
Contribution
It frames the off-switch problem as a signalling game in which a bounded rational human communicates preferences to an AI agent. It explores several bounded rationality mechanisms and extends the analysis to scenarios involving incomparability.
Findings
Uncertainty about the human's utility is a necessary condition for the AI to refrain from disabling its off-switch.
Message costs influence the players' optimal strategies.
The analysis extends to scenarios involving incomparability.
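The role of uncertainty can be illustrated with the standard off-switch game from earlier work (Hadfield-Menell et al., 2017), not with the signalling-game model of this paper. The following is a minimal sketch under simplifying assumptions. The AI can act directly (payoff U), switch itself off (payoff 0), or defer to a perfectly rational human who lets the action proceed only when U > 0. The belief representation and every function name are hypothetical. The sketch shows that deferring is strictly better only when the AI's belief puts mass on both signs of U.

```python
from typing import Sequence, Tuple

# Hypothetical representation: the AI's belief about the human's utility U
# for its proposed action, as a finite list of (utility, probability) pairs.
Belief = Sequence[Tuple[float, float]]


def value_act(belief: Belief) -> float:
    """Act directly, bypassing the human: expected utility E[U]."""
    return sum(p * u for u, p in belief)


def value_switch_off(belief: Belief) -> float:
    """Switching itself off yields utility 0 whatever U is."""
    return 0.0


def value_defer(belief: Belief) -> float:
    """Defer to a perfectly rational human (an assumption) who allows the
    action iff U > 0 and otherwise presses the off-switch: E[max(U, 0)]."""
    return sum(p * max(u, 0.0) for u, p in belief)


def deference_gain(belief: Belief) -> float:
    """Advantage of deferring over the best unilateral option (never negative)."""
    return value_defer(belief) - max(value_act(belief), value_switch_off(belief))
```

For example, a belief of `[(-1.0, 0.5), (2.0, 0.5)]` gives a positive gain. A certain belief such as `[(2.0, 1.0)]`, or an uncertain belief with a single sign such as `[(1.0, 0.5), (3.0, 0.5)]`, gives zero gain. In those cases the AI loses nothing by bypassing the off-switch.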
Abstract
The off-switch problem is a critical challenge in AI control: if an AI system resists being switched off, it poses a significant risk. In this paper, we model the off-switch problem as a signalling game, where a human decision-maker communicates its preferences about some underlying decision problem to an AI agent, which then selects actions to maximise the human's utility. We assume that the human is a bounded rational agent and explore various bounded rationality mechanisms. Using real machine learning models, we reprove prior results and demonstrate that a necessary condition for an AI system to refrain from disabling its off-switch is its uncertainty about the human's utility. We also analyse how message costs influence optimal strategies and extend the analysis to scenarios involving incomparability.
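To see why bounded rationality matters, the rational human of the classic off-switch game can be replaced by a noisy one. The sketch below assumes a logistic (Boltzmann-style) noise model with a hypothetical temperature `beta`. This is one common bounded rationality mechanism, not necessarily one of those studied in the paper, and the signalling and message components are omitted. A `beta` of 0 recovers the perfectly rational human.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical belief format: (utility U, probability) pairs.
Belief = Sequence[Tuple[float, float]]


def p_allow(u: float, beta: float) -> float:
    """Assumed noise model: probability that the human lets the action proceed
    instead of pressing the off-switch, logistic in U / beta."""
    if beta == 0.0:
        return 1.0 if u > 0 else 0.0  # perfectly rational limit
    z = u / beta
    if z >= 0:  # numerically stable logistic function
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def value_defer_noisy(belief: Belief, beta: float) -> float:
    """Expected utility of deferring: U is realised only if the human allows it."""
    return sum(p * p_allow(u, beta) * u for u, p in belief)


def best_option(belief: Belief, beta: float) -> Tuple[str, Dict[str, float]]:
    """Compare acting directly, switching off, and deferring to the noisy human."""
    values = {
        "act": sum(p * u for u, p in belief),
        "switch_off": 0.0,
        "defer": value_defer_noisy(belief, beta),
    }
    return max(values, key=values.get), values
```

Under this assumed model, an AI that is uncertain about the sign of U still prefers to defer when the human is only mildly noisy. An AI that is certain U = 2, facing a noisy human (`beta = 1.0`), does strictly better by acting directly, because the human may wrongly press the off-switch.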
