Omission Constraints Decay While Commission Constraints Persist in Long-Context LLM Agents
Yeran Gamage

TL;DR
This study reveals that in long conversations, omission-based safety constraints in LLM agents weaken over time, while commission constraints remain effective, highlighting a divergence in constraint decay.
Contribution
It introduces the concept of Security-Recall Divergence (SRD) and demonstrates how omission constraints decay while commission constraints persist in long-context LLM interactions.
Findings
Omission compliance drops from 73% to 33% over 16 turns.
Commission compliance remains at 100% throughout.
Re-injecting constraints before Safe Turn Depth restores compliance.
Abstract
LLM agents deployed in production operate under operator-defined behavioral policies (system-prompt instructions such as prohibitions on credential disclosure, data exfiltration, and unauthorized output) that safety evaluations assume hold throughout a conversation. Prohibition-type constraints decay under context pressure while requirement-type constraints persist; we term this asymmetry Security-Recall Divergence (SRD). In a 4,416-trial three-arm causal study across 12 models and 8 providers at six conversation depths, omission compliance falls from 73% at turn 5 to 33% at turn 16 while commission compliance holds at 100% (Mistral Large 3, ). In the two models with token-matched padding controls, schema semantic content accounts for 62-100% of the dilution effect. Re-injecting constraints before the per-model Safe Turn Depth (STD) restores compliance without retraining.…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
