$\aleph$-IPOMDP: Mitigating Deception in a Cognitive Hierarchy with Off-Policy Counterfactual Anomaly Detection
Nitay Alon, Joseph M. Barnby, Stefan Sarkadi, Lion Schulz, Jeffrey S. Rosenschein, Peter Dayan

TL;DR
This paper introduces the $\u2206$-IPOMDP framework, enhancing Bayesian RL agents with anomaly detection and out-of-belief policies to detect and deter deception in multi-agent interactions.
Contribution
It presents a novel computational framework that enables agents to recognize deception and respond credibly, addressing vulnerabilities in recursive opponent modeling.
Findings
Effective detection of deception in mixed-motive and zero-sum games
Leads to more equitable outcomes and reduces exploitation
Applicable to AI safety, cybersecurity, and cognitive science
Abstract
Social agents with finitely nested opponent models are vulnerable to manipulation by agents with deeper recursive capabilities. This imbalance, rooted in logic and the theory of recursive modelling frameworks, cannot be solved directly. We propose a computational framework called -IPOMDP, which augments the Bayesian inference of model-based RL agents with an anomaly detection algorithm and an out-of-belief policy. Our mechanism allows agents to realize that they are being deceived, even if they cannot understand how, and to deter opponents via a credible threat. We test this framework in both a mixed-motive and a zero-sum game. Our results demonstrate the -mechanism's effectiveness, leading to more equitable outcomes and less exploitation by more sophisticated agents. We discuss implications for AI safety, cybersecurity, cognitive science, and psychiatry.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
