Safer Policy Compliance with Dynamic Epistemic Fallback

Joseph Marvin Imperial; Harish Tayyar Madabushi

arXiv:2601.23094·cs.CL·February 2, 2026

Safer Policy Compliance with Dynamic Epistemic Fallback

Joseph Marvin Imperial, Harish Tayyar Madabushi

PDF

Open Access

TL;DR

This paper introduces Dynamic Epistemic Fallback (DEF), a safety protocol inspired by human epistemic vigilance, to enhance large language models' ability to detect and refuse deceptive policy texts, thereby improving compliance safety.

Contribution

The paper presents DEF, a novel dynamic safety mechanism that improves LLMs' detection of maliciously perturbed policy texts during inference, inspired by human cognitive defenses.

Findings

01

DEF achieves 100% detection rate on perturbed policies with DeepSeek-R1.

02

DEF effectively prompts LLMs to flag inconsistencies and refuse non-compliant inputs.

03

Empirical results demonstrate DEF's robustness against deceptive policy attacks.

Abstract

Humans develop a series of cognitive defenses, known as epistemic vigilance, to combat risks of deception and misinformation from everyday interactions. Developing safeguards for LLMs inspired by this mechanism might be particularly helpful for their application in high-stakes tasks such as automating compliance with data privacy laws. In this paper, we introduce Dynamic Epistemic Fallback (DEF), a dynamic safety protocol for improving an LLM's inference-time defenses against deceptive attacks that make use of maliciously perturbed policy texts. Through various levels of one-sentence textual cues, DEF nudges LLMs to flag inconsistencies, refuse compliance, and fallback to their parametric knowledge upon encountering perturbed policy texts. Using globally recognized legal policies such as HIPAA and GDPR, our empirical evaluations report that DEF effectively improves the capability of…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsMisinformation and Its Impacts · Deception detection and forensic psychology · Ethics and Social Impacts of AI