Answer When Needed, Forget When Not: Language Models Pretend to Forget via In-Context Knowledge Unlearning
Shota Takashiro, Takeshi Kojima, Andrew Gambardella, Qi Cao, Yusuke Iwasawa, Yutaka Matsuo

TL;DR
This paper introduces a method called in-context knowledge unlearning that allows large language models to selectively forget specific information at test time based on query context, improving privacy and security.
Contribution
It proposes a novel fine-tuning approach enabling LLMs to unlearn targeted knowledge dynamically, with detailed analysis of internal model behavior and layer-wise decision-making.
Findings
Achieves up to 95% forget accuracy
Retains 80% of unrelated knowledge
Outperforms existing baselines in various scenarios
Abstract
As large language models (LLMs) are applied across diverse domains, the ability to selectively unlearn specific information is becoming increasingly essential. For instance, LLMs are expected to selectively provide confidential information to authorized internal users, such as employees or trusted partners, while withholding it from external users, including the general public and unauthorized entities. Therefore, we propose a novel method termed ``in-context knowledge unlearning'', which enables the model to selectively forget information in test-time based on the query context. Our method fine-tunes pre-trained LLMs to enable prompt unlearning of target knowledge within the context, while preserving unrelated information. Experiments on TOFU, AGE and RWKU datasets using Llama2-7B/13B and Mistral-7B models demonstrate that our method achieves up to 95% forget accuracy while retaining…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsTopic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
MethodsTofu
