Inference-Time Machine Unlearning via Gated Activation Redirection

Vinícius Conte Turani; Otávio Parraga; João Vitor Boer Abitante; Kristen K. Arguello; Joana Pasquali; Ramiro N. Barros; Flavio du Pin Calmon; Christian Mattjie; Rodrigo C. Barros; Lucas S. Kupssinskü

arXiv:2605.12765·cs.LG·May 19, 2026

TL;DR

GUARD-IT is an inference-time unlearning method that uses input-dependent activation steering to suppress memorized data in large language models without altering their weights, addressing privacy and safety concerns.

Contribution

The paper introduces a gradient-free, input-dependent activation redirection technique that unlearns data at inference time, matching or exceeding gradient-based methods in utility and safety.

Findings

01 Matches or exceeds 12 gradient-based baselines across model scales.

02 Preserves utility, suppresses memorization, and avoids collapse.

03 Remains effective under quantization, unlike parameter-editing methods.

Abstract

Large Language Models memorize vast amounts of training data, raising concerns regarding privacy, copyright infringement, and safety. Machine unlearning seeks to remove the influence of a targeted forget set while preserving model performance, ideally approximating a model retrained from scratch without the forget set. Existing approaches aim to achieve this by updating model parameters via gradient-based methods. However, these updates are computationally expensive, lead to irreversible weight changes, and degrade when the model is quantized for deployment. A recent alternative to changing model weights is activation engineering, where activations are changed during inference to steer model behavior. Despite circumventing weight editing, naive activation steering introduces its own failure modes, as a single global steering vector applies the same intervention to every input, leading…
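The contrast the abstract draws between a single global steering vector and an input-dependent intervention can be illustrated with a toy NumPy sketch. This is not the authors' GUARD-IT implementation: the cosine-similarity gate, the `probe` direction, the `threshold`, and the function names are all assumptions chosen only to show why gating leaves unrelated inputs untouched.

```python
import numpy as np

def global_steer(h, v, alpha=1.0):
    """Naive steering: add the same vector v to every activation row."""
    return h + alpha * v

def gated_steer(h, v, probe, threshold=0.5, alpha=1.0):
    """Hypothetical gate: steer a row only if it aligns with `probe`.

    h:     (batch, dim) hidden activations
    v:     (dim,) steering vector
    probe: (dim,) direction assumed to indicate forget-set content
    """
    h_unit = h / (np.linalg.norm(h, axis=-1, keepdims=True) + 1e-8)
    p_unit = probe / (np.linalg.norm(probe) + 1e-8)
    score = h_unit @ p_unit                        # cosine similarity per row
    gate = (score > threshold).astype(h.dtype)[:, None]
    return h + alpha * gate * v, gate.squeeze(-1)

# Toy activations: row 0 aligns with the probe, row 1 is orthogonal to it.
probe = np.array([1.0, 0.0, 0.0, 0.0])
v = np.array([-2.0, 0.0, 0.0, 0.0])
h = np.array([[1.0, 0.1, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])

steered_global = global_steer(h, v)
steered_gated, gate = gated_steer(h, v, probe)
```

In this toy setting, `global_steer` shifts both rows, while `gated_steer` shifts only the row aligned with the probe and returns the other unchanged. A real system would read activations from a model layer (for example via forward hooks) and would need a gate derived from data rather than a hand-picked direction.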
