$\aleph$-IPOMDP: Mitigating Deception in a Cognitive Hierarchy with Off-Policy Counterfactual Anomaly Detection

Nitay Alon; Joseph M. Barnby; Stefan Sarkadi; Lion Schulz; Jeffrey S. Rosenschein; Peter Dayan

arXiv:2405.01870·cs.MA·March 5, 2026

TL;DR

This paper introduces the $\aleph$-IPOMDP framework, enhancing Bayesian RL agents with anomaly detection and out-of-belief policies to detect and deter deception in multi-agent interactions.

Contribution

It presents a novel computational framework that enables agents to recognize deception and respond credibly, addressing vulnerabilities in recursive opponent modeling.

Findings

01 Effective detection of deception in mixed-motive and zero-sum games

02 Leads to more equitable outcomes and reduces exploitation

03 Applicable to AI safety, cybersecurity, and cognitive science

Abstract

Social agents with finitely nested opponent models are vulnerable to manipulation by agents with deeper recursive capabilities. This imbalance, rooted in logic and the theory of recursive modelling frameworks, cannot be solved directly. We propose a computational framework called $\aleph$-IPOMDP, which augments the Bayesian inference of model-based RL agents with an anomaly detection algorithm and an out-of-belief policy. Our mechanism allows agents to realize that they are being deceived, even if they cannot understand how, and to deter opponents via a credible threat. We test this framework in both a mixed-motive and a zero-sum game. Our results demonstrate the $\aleph$-mechanism's effectiveness, leading to more equitable outcomes and less exploitation by more sophisticated agents. We discuss implications for AI safety, cybersecurity, cognitive science, and psychiatry.
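The general idea in the abstract (Bayesian inference over opponent models, plus an anomaly check that triggers an out-of-belief policy) can be illustrated with a toy sketch. This is not the paper's algorithm: the opponent types, action names, the windowed geometric-mean test, and the `threshold` and `window` values are all assumptions made for illustration. The agent keeps a belief over the opponent types it can model, and switches to an out-of-belief response when recent observations are improbable under every modeled type.

```python
import math

# Hypothetical opponent models: P(action | type). Names and numbers are illustrative only.
MODELS = {
    "cooperative": {"share": 0.85, "hoard": 0.10, "bluff": 0.05},
    "selfish": {"share": 0.15, "hoard": 0.80, "bluff": 0.05},
}
PRIOR = {"cooperative": 0.5, "selfish": 0.5}


def bayes_update(belief, likelihoods):
    """Return (posterior, evidence) for one observed action."""
    unnorm = {t: belief[t] * likelihoods[t] for t in belief}
    evidence = sum(unnorm.values())  # marginal probability of the observation
    if evidence == 0.0:
        # Observation impossible under every modeled type: keep the old belief.
        return dict(belief), 0.0
    return {t: p / evidence for t, p in unnorm.items()}, evidence


def run_agent(observed_actions, models=MODELS, prior=PRIOR, threshold=0.2, window=3):
    """Track beliefs; flag an anomaly when recent evidence is persistently low.

    Returns (mode, step, belief), where mode is "in_belief" or "out_of_belief"
    and step is the index at which the anomaly was flagged (or None).
    """
    belief = dict(prior)
    recent = []
    for step, action in enumerate(observed_actions):
        likelihoods = {t: models[t].get(action, 0.0) for t in belief}
        belief, evidence = bayes_update(belief, likelihoods)
        recent = (recent + [evidence])[-window:]
        if len(recent) == window:
            # Geometric mean of recent evidence, computed in log space.
            log_mean = sum(
                math.log(e) if e > 0 else float("-inf") for e in recent
            ) / window
            if log_mean < math.log(threshold):
                # Assumed stand-in for an out-of-belief policy: stop trusting
                # the modeled types and hand control to a fallback response.
                return "out_of_belief", step, belief
    return "in_belief", None, belief


if __name__ == "__main__":
    print(run_agent(["share"] * 4)[:2])
    print(run_agent(["bluff"] * 3)[:2])
```

In this sketch the agent never needs to identify what the opponent is actually doing. It only notices that its own models explain the observations poorly, which mirrors the abstract's point that an agent can realize it is being deceived without understanding how.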
