Rejecting Hallucinated State Targets during Planning
Mingde Zhao, Tristan Sylvain, Romain Laroche, Doina Precup, Yoshua Bengio

TL;DR
This paper introduces a method to identify and reject infeasible, hallucinated targets in planning agents, reducing delusional behaviors and improving performance without altering the original agent or its generator.
Contribution
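It proposes a target feasibility evaluator that filters infeasible targets during planning. The evaluator is trained with a combination of an off-policy compatible learning rule, a distributional architecture, and data augmentation based on hindsight relabeling.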
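As a rough illustration of where such an evaluator sits in the planning loop, the sketch below filters a generator's proposals by an estimated feasibility score. Everything in it is an assumption made for illustration: the names `reject_infeasible` and `toy_feasibility`, the 0.5 threshold, and the grid-world stand-in for a learned evaluator are hypothetical and are not taken from the paper.

```python
from typing import Callable, Iterable, List, Tuple

Cell = Tuple[int, int]


def reject_infeasible(
    candidates: Iterable[Cell],
    feasibility: Callable[[Cell], float],
    threshold: float = 0.5,  # hypothetical cutoff, not a value from the paper
) -> List[Cell]:
    """Keep only the candidates whose estimated feasibility reaches the threshold."""
    return [target for target in candidates if feasibility(target) >= threshold]


def toy_feasibility(cell: Cell) -> float:
    """Toy stand-in for a learned evaluator: only cells inside a 5x5 grid are reachable."""
    x, y = cell
    return 1.0 if 0 <= x < 5 and 0 <= y < 5 else 0.0


# Targets proposed by a (hypothetical) generator; two of them lie outside the grid.
proposed = [(1, 2), (7, 0), (4, 4), (-1, 3)]
kept = reject_infeasible(proposed, toy_feasibility)
print(kept)  # [(1, 2), (4, 4)]
```

Because the filter only consumes proposals and returns a subset of them, neither the generator nor the agent has to change, which matches the non-intrusive framing in the summary above.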
Findings
Significant reduction in delusional behaviors
Performance improvements across various agents
Effective identification of infeasible targets
Abstract
In planning processes of computational decision-making agents, generative or predictive models are often used as "generators" to propose "targets" representing sets of expected or desirable states. Unfortunately, learned models inevitably hallucinate infeasible targets that can cause delusional behaviors and safety concerns. We first investigate the kinds of infeasible targets that generators can hallucinate. Then, we devise a strategy to identify and reject infeasible targets by learning a target feasibility evaluator. To ensure that the evaluator is robust and non-delusional, we adopt a design that combines an off-policy compatible learning rule, a distributional architecture, and data augmentation based on hindsight relabeling. Attached to a planning agent, the designed evaluator learns by observing the agent's interactions with the environment and the targets produced by its…
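The hindsight-relabeling idea mentioned in the abstract can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the authors' procedure: the function `hindsight_relabel`, the `num_relabels` parameter, and the "sample a future state from the same trajectory" strategy are all hypothetical choices. The point is that states actually visited later in a trajectory are, by construction, feasible targets, so they provide positive training examples alongside the generator's own proposals.

```python
import random
from typing import Hashable, List, Optional, Sequence, Tuple

# (state, target, label), where label is 1 if the target was reached afterwards.
Example = Tuple[Hashable, Hashable, int]


def hindsight_relabel(
    trajectory: Sequence[Hashable],
    proposed_target: Hashable,
    num_relabels: int = 2,  # hypothetical number of relabeled targets per state
    rng: Optional[random.Random] = None,
) -> List[Example]:
    """Build feasibility training examples from a single trajectory."""
    rng = rng or random.Random(0)
    examples: List[Example] = []
    for t, state in enumerate(trajectory[:-1]):
        future = list(trajectory[t + 1:])
        # The generator's own target, labeled by whether it was actually reached.
        examples.append((state, proposed_target, int(proposed_target in future)))
        # Relabeled targets: states that were reached later, hence feasible.
        for _ in range(num_relabels):
            examples.append((state, rng.choice(future), 1))
    return examples


# A hallucinated target ("ghost") never appears in the trajectory.
trajectory = ["s0", "s1", "s2"]
examples = hindsight_relabel(trajectory, proposed_target="ghost")
for example in examples:
    print(example)
```

The relabeled targets differ from the ones the agent was pursuing when the data was collected, which is one plausible reason the abstract pairs this augmentation with an off-policy compatible learning rule.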
Taxonomy
Topics: Deception detection and forensic psychology · Criminal Justice and Corrections Analysis · Psychopathy, Forensic Psychiatry, Sexual Offending
