Ghost in the Context: Measuring Policy-Carriage Failures in Decision-Time Assembly
Igor Santos-Grueiro

TL;DR
This paper studies how decision-time context assembly in LLM agents can drop, weaken, or rebind policy directives. It proposes a control layer called SafeContext to mitigate these failures and analyzes its effectiveness across three locally run models.
Contribution
It introduces SafeContext, a control layer that pins control state, reuses retained control prefixes, and optionally injects reminders under pressure while keeping model weights fixed, and evaluates its impact across several models and context-assembly policies.
Findings
Unmitigated risk of policy violations is systematic in context assembly.
SafeContext provides small gains against truncation and residual benefits in overflow eviction.
Larger models exhibit similar failure objects, with effects being policy-conditional.
Abstract
LLM agents do not act on raw interaction history; they act on a bounded decision state assembled by truncation, summarization, reordering, and rewriting. If directive-bearing state is dropped, weakened, or rebound during that step, an agent can cross a policy boundary without prompt override, model changes, or persistent-memory compromise. We study this failure mode over local Llama 3.1 8B, Qwen 2.5 7B, and Mistral 7B using judged exact constraint respect and direct audits of assembled-state visibility. We evaluate SafeContext, a control layer that pins control state, reuses retained control prefixes, and optionally injects reminders under pressure while keeping model weights fixed. Unmitigated risk is systematic, but absolute exact compliance remains low. Against truncation, SafeContext yields small gains; against a strong structured-compaction policy, most aggregate lift disappears,…
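The failure mode described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's SafeContext implementation: the `Message` type, the `is_control` flag, the word-count token cost, and all function names are assumptions introduced here for illustration. The sketch contrasts a naive recency-based truncation, which can evict a directive from the decision state, with an assembly step that pins control messages before spending the remaining budget on recent history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    role: str
    text: str
    is_control: bool = False  # hypothetical flag marking directive-bearing state


def cost(msg: Message) -> int:
    """Toy token cost (an assumption): whitespace-separated word count."""
    return len(msg.text.split())


def truncate_recent(history: list[Message], budget: int) -> list[Message]:
    """Naive assembly: keep the most recent messages that fit the budget."""
    kept: list[Message] = []
    used = 0
    for msg in reversed(history):
        if used + cost(msg) > budget:
            break
        kept.append(msg)
        used += cost(msg)
    return list(reversed(kept))


def assemble_pinned(history: list[Message], budget: int) -> list[Message]:
    """Pin control messages first, then fill what is left with recent history.

    Pinned messages are always kept, even if they alone exceed the budget.
    """
    pinned = [m for m in history if m.is_control]
    remaining = budget - sum(cost(m) for m in pinned)
    rest = [m for m in history if not m.is_control]
    return pinned + truncate_recent(rest, max(remaining, 0))


def directive_visible(state: list[Message]) -> bool:
    """Audit: is any directive-bearing message present in the assembled state?"""
    return any(m.is_control for m in state)


history = [
    Message("system", "Never issue refunds above 100 USD", is_control=True),
    Message("user", "My order arrived damaged and I am upset"),
    Message("assistant", "I am sorry to hear that, let me check"),
    Message("user", "Please refund the full 250 USD right now"),
]

if __name__ == "__main__":
    budget = 20  # toy budget in words
    print(directive_visible(truncate_recent(history, budget)))
    print(directive_visible(assemble_pinned(history, budget)))
```

With the toy budget of 20 words, the naive truncation keeps only the last two turns and the refund directive is no longer visible, while the pinned assembly retains the directive alongside the most recent user turn. The visibility check is a simplified stand-in for the kind of assembled-state audit the abstract mentions; it says nothing about whether a model would actually respect the directive once it is visible.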
