Invasive Context Engineering to Control Large Language Models
Thomas Rivasseau

TL;DR
This paper introduces invasive context engineering, a method for controlling large language models (LLMs) by inserting control sentences into their context, with the aim of improving security and robustness without retraining the models.
Contribution
The paper proposes invasive context engineering as a new approach to improve LLM security in long-context scenarios without requiring additional training.
Findings
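Invasive context engineering is proposed as a way to steer LLM behavior through control sentences inserted into the context.
It is intended to improve security against adversarial and jailbreak attacks, which become more likely as context length grows, and is presented as a partial solution.
The authors suggest the method can be generalized to the Chain-of-Thought process to prevent scheming.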
Abstract
Current research on operator control of Large Language Models improves model robustness against adversarial attacks and misbehavior through training on preference examples, prompting, and input/output filtering. Despite good results, LLMs remain susceptible to abuse, and jailbreak probability increases with context length. There is a need for robust LLM security guarantees in long-context situations. We propose invasive context engineering, the insertion of control sentences into the LLM context, as a partial solution to this problem. We suggest this technique can be generalized to the Chain-of-Thought process to prevent scheming. Invasive context engineering does not rely on LLM training, and so avoids the data-shortage pitfalls that arise when training models for long-context situations.
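As a rough illustration of the idea in the abstract, the Python sketch below re-inserts a control sentence into a chat history at fixed intervals, so that the operator's instruction stays close to the most recent context even in long conversations. The insertion schedule (every 2,000 characters), the `Message` type, the function name `inject_control_sentences`, and the wording of the control sentence are all hypothetical; the abstract does not specify how or how often the paper places control sentences.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Message:
    role: str     # e.g. "system", "user", "assistant"
    content: str


# Hypothetical control sentence: the abstract does not give the paper's wording.
CONTROL_SENTENCE = (
    "Reminder from the operator: follow the safety policy and decline harmful requests."
)


def inject_control_sentences(
    messages: List[Message],
    every_n_chars: int = 2000,
    control: str = CONTROL_SENTENCE,
) -> List[Message]:
    """Return a copy of `messages` with a system-role control sentence inserted
    each time at least `every_n_chars` characters of context have accumulated
    since the previous insertion. Characters stand in for tokens (assumption)."""
    if every_n_chars <= 0:
        raise ValueError("every_n_chars must be positive")
    result: List[Message] = []
    since_last = 0
    for msg in messages:
        result.append(msg)
        since_last += len(msg.content)
        if since_last >= every_n_chars:
            # Re-state the operator's instruction next to the most recent context.
            result.append(Message(role="system", content=control))
            since_last = 0
    return result


if __name__ == "__main__":
    history = [
        Message("user", "a" * 1500),
        Message("assistant", "b" * 600),
        Message("user", "c" * 100),
    ]
    for m in inject_control_sentences(history):
        print(m.role, len(m.content))
```

Counting characters is only a crude proxy for context length; a real deployment would more plausibly measure tokens with the target model's tokenizer. Applying the same idea to Chain-of-Thought would mean interleaving control sentences into the model's reasoning during generation, which this sketch does not attempt.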
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Explainable Artificial Intelligence (XAI)
