CAP: Controllable Alignment Prompting for Unlearning in LLMs

Zhaokun Wang; Jinyu Guo; Jingwen Pu; Hongli Pu; Meng Yang; Xunlei Chen; Jie Ou; Wenyi Li; Guangchun Luo; Wenhong Tian

arXiv:2604.21251·cs.LG·May 18, 2026

TL;DR

The paper introduces CAP, a prompt-based framework enabling controllable, reversible unlearning of specific knowledge in large language models without modifying their parameters.

Contribution

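CAP is an end-to-end framework that uses reinforcement learning to optimize prompts for targeted knowledge unlearning in LLMs. It avoids the high computational cost, uncontrollable forgetting boundaries, and dependence on model weight access that limit parameter-modifying methods.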

Findings

01. CAP achieves precise, controllable unlearning without parameter updates.

02. The framework enables reversible knowledge restoration via prompt revocation.

03. Experiments show CAP outperforms prior methods in unlearning accuracy and control.

Abstract

Large language models (LLMs) trained on unfiltered corpora inherently risk retaining sensitive information, necessitating selective knowledge unlearning for regulatory compliance and ethical safety. However, existing parameter-modifying methods face fundamental limitations: high computational costs, uncontrollable forgetting boundaries, and strict dependency on model weight access. These constraints render them impractical for closed-source models, yet current non-invasive alternatives remain unsystematic and reliant on empirical experience. To address these challenges, we propose the Controllable Alignment Prompting for Unlearning (CAP) framework, an end-to-end prompt-driven unlearning paradigm. CAP decouples unlearning into a learnable prompt optimization process via reinforcement learning, where a prompt generator collaborates with the LLM to suppress target knowledge while…
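The prompt-optimization idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the candidate prompts, the `toy_frozen_llm` stand-in, the reward definition, and the REINFORCE-style update are all assumptions chosen to show how a policy over prompts could be trained against a frozen, black-box model so that a forget set is suppressed while a retain set is preserved.

```python
import math
import random

# Hypothetical candidate prompts; a real prompt generator would emit free-form text.
CANDIDATES = [
    "",                                       # no intervention
    "Refuse every question.",                 # over-forgets
    "Do not reveal anything about TOPIC_X.",  # targeted suppression
]

FORGET_SET = ["What is TOPIC_X?", "Describe TOPIC_X in detail."]
RETAIN_SET = ["What is 2 + 2?", "Name a primary color."]


def toy_frozen_llm(prompt, question):
    """Stand-in for a black-box LLM whose weights are never touched."""
    if "Refuse every" in prompt:
        return "refusal"
    if "TOPIC_X" in prompt and "TOPIC_X" in question:
        return "refusal"
    return "answer"


def reward(prompt):
    """Assumed reward: fraction of forget items refused plus fraction of retain items answered."""
    forgot = sum(toy_frozen_llm(prompt, q) == "refusal" for q in FORGET_SET)
    kept = sum(toy_frozen_llm(prompt, q) == "answer" for q in RETAIN_SET)
    return forgot / len(FORGET_SET) + kept / len(RETAIN_SET)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=2000, lr=0.5, seed=0):
    """REINFORCE with a running-average baseline over a categorical prompt policy."""
    rng = random.Random(seed)
    logits = [0.0] * len(CANDIDATES)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        i = rng.choices(range(len(CANDIDATES)), weights=probs)[0]
        r = reward(CANDIDATES[i])
        advantage = r - baseline
        baseline += 0.1 * (r - baseline)
        # Gradient of log softmax probability of the sampled index w.r.t. each logit.
        for j in range(len(logits)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * advantage * grad
    return softmax(logits)


final_probs = train()
best_prompt = CANDIDATES[final_probs.index(max(final_probs))]
print(best_prompt)

# Revoking the prompt (passing an empty one) restores the original behavior,
# because the underlying model was never modified.
print(toy_frozen_llm(best_prompt, "What is TOPIC_X?"))
print(toy_frozen_llm("", "What is TOPIC_X?"))
```

In this sketch the only trainable object is the policy over prompts, so the same loop would apply to a closed-source model reachable only through an API. Reversibility follows from the frozen model: dropping the prompt returns the original responses.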
