ICLGuard: Controlling In-Context Learning Behavior for Applicability Authorization
Wai Man Si, Michael Backes, Yang Zhang

TL;DR
This paper introduces ICLGuard, a fine-tuning framework that enables LLM owners to control and restrict in-context learning behavior on specific data, enhancing content regulation without compromising overall model performance.
Contribution
The paper proposes ICLGuard, a novel fine-tuning method that selectively deactivates ICL capabilities on targeted data while preserving general functionality.
Findings
ICLGuard effectively deactivates ICL on specific data.
It fine-tunes only a minimal set of parameters, preserving the original model's performance.
The approach maintains ICL ability on non-target data.
Abstract
In-context learning (ICL) is a recent advancement in the capabilities of large language models (LLMs). This feature allows users to perform a new task without updating the model. Concretely, users can address tasks at inference time by conditioning on a few input-label pair demonstrations along with the test input. It differs from the conventional fine-tuning paradigm and offers more flexibility. However, this capability also introduces potential issues. For example, users may use the model on any data without restriction, such as performing tasks with improper or sensitive content, which might violate the model policy or conflict with the model owner's interests. For a model owner, it is crucial to establish a mechanism to control the model's behavior under ICL, depending on the owner's requirements for various content. To this end, we introduce the concept of…
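The conditioning step the abstract describes (a few input-label demonstrations followed by the test input) can be sketched as below. This is a minimal illustration of generic ICL prompt construction, not code from the paper: the function name `build_icl_prompt`, the "Input:"/"Label:" template, and the sentiment examples are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def build_icl_prompt(
    demos: List[Tuple[str, str]],
    test_input: str,
    input_prefix: str = "Input: ",   # hypothetical template, not from the paper
    label_prefix: str = "Label: ",   # hypothetical template, not from the paper
) -> str:
    """Concatenate input-label demonstrations, then the unlabeled test input."""
    # Each demonstration becomes one "Input: ... / Label: ..." block.
    blocks = [f"{input_prefix}{x}\n{label_prefix}{y}" for x, y in demos]
    # The test input ends with an empty label slot for the model to complete.
    blocks.append(f"{input_prefix}{test_input}\n{label_prefix.rstrip()}")
    return "\n\n".join(blocks)


# Illustrative usage with made-up sentiment demonstrations.
demos = [("great movie", "positive"), ("dull plot", "negative")]
prompt = build_icl_prompt(demos, "loved it")
print(prompt)
```

The model's weights are never updated here; the task is specified entirely by the prompt text, which is why an owner who wants to restrict ICL on certain data cannot rely on controlling fine-tuning access alone.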
Taxonomy
Topics: Access Control and Trust · Privacy, Security, and Data Protection · Privacy-Preserving Technologies in Data
Methods: Sparse Evolutionary Training
