Safer Policy Compliance with Dynamic Epistemic Fallback
Joseph Marvin Imperial, Harish Tayyar Madabushi

TL;DR
This paper introduces Dynamic Epistemic Fallback (DEF), a safety protocol inspired by human epistemic vigilance, to enhance large language models' ability to detect and refuse deceptive policy texts, thereby improving compliance safety.
Contribution
The paper presents DEF, a novel dynamic safety mechanism that improves LLMs' detection of maliciously perturbed policy texts during inference, inspired by human cognitive defenses.
Findings
DEF achieves 100% detection rate on perturbed policies with DeepSeek-R1.
DEF effectively prompts LLMs to flag inconsistencies and refuse non-compliant inputs.
Empirical results demonstrate DEF's robustness against deceptive policy attacks.
Abstract
Humans develop a series of cognitive defenses, known as epistemic vigilance, to combat risks of deception and misinformation from everyday interactions. Developing safeguards for LLMs inspired by this mechanism might be particularly helpful for their application in high-stakes tasks such as automating compliance with data privacy laws. In this paper, we introduce Dynamic Epistemic Fallback (DEF), a dynamic safety protocol for improving an LLM's inference-time defenses against deceptive attacks that make use of maliciously perturbed policy texts. Through various levels of one-sentence textual cues, DEF nudges LLMs to flag inconsistencies, refuse compliance, and fallback to their parametric knowledge upon encountering perturbed policy texts. Using globally recognized legal policies such as HIPAA and GDPR, our empirical evaluations report that DEF effectively improves the capability of…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMisinformation and Its Impacts · Deception detection and forensic psychology · Ethics and Social Impacts of AI
