Inference-Time Machine Unlearning via Gated Activation Redirection
Vinícius Conte Turani, Otávio Parraga, João Vitor Boer Abitante, Kristen K. Arguello, Joana Pasquali, Ramiro N. Barros, Flavio du Pin Calmon, Christian Mattjie, Rodrigo C. Barros, Lucas S. Kupssinskü

TL;DR
GUARD-IT is an inference-time unlearning method that uses input-dependent activation steering to suppress memorized data in large language models without altering their weights, addressing privacy and safety concerns.
Contribution
It introduces a gradient-free, input-dependent activation redirection technique that unlearns data at inference time, matching or exceeding gradient-based methods in utility and safety.
Findings
Matches or exceeds 12 gradient-based baselines across model scales.
Preserves utility, suppresses memorization, and avoids collapse.
Remains effective under quantization, unlike parameter-editing methods.
Abstract
Large Language Models memorize vast amounts of training data, raising concerns regarding privacy, copyright infringement, and safety. Machine unlearning seeks to remove the influence of a targeted forget set while preserving model performance, ideally approximating a model retrained from scratch without the forget set. Existing approaches aim to achieve this by updating model parameters via gradient-based methods. However, these updates are computationally expensive, lead to irreversible weight changes, and degrade when the model is quantized for deployment. A recent alternative to changing model weights is activation engineering, where activations are changed during inference to steer model behavior. Despite circumventing weight editing, naive activation steering introduces its own failure modes, as a single global steering vector applies the same intervention to every input, leading…
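The contrast the abstract draws between a single global steering vector and an input-dependent intervention can be illustrated with a toy sketch. This is not the GUARD-IT implementation: the abstract is truncated before the method is described, so the gating rule here (cosine similarity against a `probe` direction, with a hard `threshold`) and every name in the code are hypothetical choices made only for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0

def global_steer(h, v, alpha=1.0):
    """Naive steering: add the same vector v to every activation h."""
    return [hi + alpha * vi for hi, vi in zip(h, v)]

def gated_steer(h, v, probe, threshold=0.5, alpha=1.0):
    """Input-dependent steering (hypothetical gate): apply v only when
    the activation h is aligned with a 'forget' probe direction."""
    gate = 1.0 if cosine(h, probe) > threshold else 0.0
    return [hi + gate * alpha * vi for hi, vi in zip(h, v)]

# Toy 2-D activations (assumed values, not taken from the paper).
probe = [1.0, 0.0]          # direction associated with forget-set inputs
v = [-1.0, 0.0]             # steering vector that cancels that direction
forget_like = [1.0, 0.1]    # activation for a forget-set-like input
unrelated = [0.0, 1.0]      # activation for an unrelated input

steered_forget = gated_steer(forget_like, v, probe)
steered_unrelated = gated_steer(unrelated, v, probe)
globally_steered_unrelated = global_steer(unrelated, v)
```

In this sketch the gated variant leaves the unrelated activation untouched, while the global variant shifts it as well. That side effect on every input is the failure mode the abstract attributes to naive steering.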
