ICLGuard: Controlling In-Context Learning Behavior for Applicability Authorization

Wai Man Si; Michael Backes; Yang Zhang

arXiv:2407.06955·cs.CR·July 10, 2024

Open Access

TL;DR

This paper introduces ICLGuard, a fine-tuning framework that enables LLM owners to control and restrict in-context learning behavior on specific data, enhancing content regulation without compromising overall model performance.

Contribution

The paper proposes ICLGuard, a novel fine-tuning method that selectively deactivates ICL capabilities on targeted data while preserving general functionality.

Findings

01. ICLGuard effectively deactivates ICL on specific target data.

02. It fine-tunes only a minimal set of parameters, preserving the original model's performance.

03. The approach maintains ICL ability on non-target data.

Abstract

In-context learning (ICL) is a recent advancement in the capabilities of large language models (LLMs). This feature allows users to perform a new task without updating the model. Concretely, users can address tasks at inference time by conditioning on a few input-label pair demonstrations along with the test input. This differs from the conventional fine-tuning paradigm and offers more flexibility. However, this capability also introduces potential issues. For example, users may apply the model to any data without restriction, such as performing tasks with improper or sensitive content, which might violate the model policy or conflict with the model owner's interests. For a model owner, it is crucial to establish a mechanism that controls the model's behavior under ICL according to the owner's requirements for various content. To this end, we introduce the concept of…
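
To make the ICL setup described in the abstract concrete, the following is a minimal sketch of how a few input-label demonstrations and a test input might be assembled into a single prompt. It is illustrative only and is not taken from the paper: the function name build_icl_prompt, the "Input:" / "Label:" template, and the sentiment examples are all assumptions chosen for this example.

```python
from typing import List, Tuple


def build_icl_prompt(
    demonstrations: List[Tuple[str, str]],
    test_input: str,
    input_prefix: str = "Input: ",   # hypothetical template, not from the paper
    label_prefix: str = "Label: ",   # hypothetical template, not from the paper
) -> str:
    """Concatenate input-label demonstrations, then the unlabeled test input."""
    # Each demonstration becomes one fully labeled block.
    blocks = [f"{input_prefix}{x}\n{label_prefix}{y}" for x, y in demonstrations]
    # The test input ends with an empty label slot for the model to complete.
    blocks.append(f"{input_prefix}{test_input}\n{label_prefix.rstrip()}")
    return "\n\n".join(blocks)


# Illustrative sentiment-classification demonstrations (made up for this sketch).
demos = [
    ("The movie was wonderful.", "positive"),
    ("I regret buying this.", "negative"),
]
prompt = build_icl_prompt(demos, "A surprisingly good sequel.")
print(prompt)
```

The resulting string would be passed to an LLM unchanged; no model weights are updated, which is what distinguishes ICL from fine-tuning. Any user who can write such a prompt can apply the model to arbitrary data, and this is the behavior the paper sets out to control.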

Taxonomy

Topics: Access Control and Trust · Privacy, Security, and Data Protection · Privacy-Preserving Technologies in Data

Methods: Sparse Evolutionary Training