GUARD: Generation-time LLM Unlearning via Adaptive Restriction and Detection

Zhijie Deng; Chris Yuhao Liu; Zirui Pang; Xinlei He; Lei Feng; Qi Xuan; Zhaowei Zhu; Jiaheng Wei

arXiv:2505.13312 · cs.CL · May 20, 2025

TL;DR

GUARD is an inference-time unlearning method for LLMs that dynamically restricts generation so the model does not produce content related to the forget target, aiming to preserve overall model performance while improving safety and compliance.

Contribution

This work presents GUARD, a framework that performs unlearning at generation time rather than through fine-tuning, using adaptive restriction and detection to selectively block the generation of content related to the forget target.
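
As a rough illustration of what restricting generation at inference time can look like, the toy sketch below pairs a detection step with a restriction step on next-token scores. It is not the paper's algorithm: this page does not describe the mechanism in detail, so the keyword-based detector, the fixed penalty, and all names (`detect_forget_query`, `restrict_logits`, `guarded_next_token`) are assumptions made purely for illustration.

```python
import re

def detect_forget_query(prompt, forget_keywords):
    """Hypothetical detector: flag a prompt that mentions any forget keyword."""
    words = set(re.findall(r"\w+", prompt.lower()))
    return any(keyword.lower() in words for keyword in forget_keywords)

def restrict_logits(logits, forbidden_tokens, penalty):
    """Return a copy of the scores with a penalty subtracted from forbidden tokens."""
    return {
        token: (score - penalty if token in forbidden_tokens else score)
        for token, score in logits.items()
    }

def guarded_next_token(prompt, logits, forget_keywords, forbidden_tokens, penalty=10.0):
    """Greedy next-token choice; restriction applies only when the detector fires."""
    if detect_forget_query(prompt, forget_keywords):
        logits = restrict_logits(logits, forbidden_tokens, penalty)
    return max(logits, key=logits.get)

# Toy next-token scores (made up for this example, not model output).
toy_logits = {"Paris": 5.0, "unknown": 2.0, "London": 1.0}

# Prompt touches the forget target, so the forbidden token is penalized.
print(guarded_next_token("What is the capital of France?", toy_logits, ["france"], {"Paris"}))  # unknown

# Unrelated prompt: scores are left untouched.
print(guarded_next_token("Name a large European city", toy_logits, ["france"], {"Paris"}))  # Paris
```

Because the model weights are never updated in this kind of scheme, prompts that do not trigger the detector are decoded exactly as before, which is the intuition behind preserving utility on retained knowledge.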

Findings

01 Effective unlearning on multiple datasets

02 Minimal impact on overall model performance

03 Favorable trade-off between forgetting and utility

Abstract

Large Language Models (LLMs) have demonstrated strong capabilities in memorizing vast amounts of knowledge across diverse domains. However, the ability to selectively forget specific knowledge is critical for ensuring the safety and compliance of deployed models. Existing unlearning efforts typically fine-tune the model with resources such as forget data, retain data, and a calibration model. These additional gradient steps blur the decision boundary between forget and retain knowledge, so unlearning often comes at the expense of overall performance. To avoid the negative impact of fine-tuning, it would be better to unlearn solely at inference time by safely guarding the model against generating responses related to the forget target, without destroying the fluency of text generation. In this work, we propose Generation-time Unlearning via Adaptive Restriction and Detection (GUARD), a…

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Artificial Intelligence in Healthcare and Education

Methods: Tofu