Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation

Stefan Vasilev; Christian Herold; Baohao Liao; Seyyed Hadi Hashemi; Shahram Khadivi; Christof Monz

arXiv:2505.06027·cs.CL·May 12, 2025

TL;DR

Unilogit is a new self-distillation technique for large language models that enables effective machine unlearning by dynamically adjusting target logits, improving privacy compliance without sacrificing model utility.

Contribution

Unilogit introduces a dynamic target-adjustment method for machine unlearning in LLMs that requires no additional hyperparameters and, in the authors' experiments, outperforms existing approaches in balancing forgetting and retention.

Findings

01 Unilogit outperforms state-of-the-art unlearning methods such as NPO and UnDIAL.

02 It remains robust across different datasets and scenarios.

03 It balances model utility against data privacy requirements.

Abstract

This paper introduces Unilogit, a novel self-distillation method for machine unlearning in Large Language Models. Unilogit addresses the challenge of selectively forgetting specific information while maintaining overall model utility, a critical task in compliance with data privacy regulations like GDPR. Unlike prior methods that rely on static hyperparameters or starting model outputs, Unilogit dynamically adjusts target logits to achieve a uniform probability for the target token, leveraging the current model's outputs for more accurate self-distillation targets. This approach not only eliminates the need for additional hyperparameters but also enhances the model's ability to approximate the golden targets. Extensive experiments on public benchmarks and an in-house e-commerce dataset demonstrate Unilogit's superior performance in balancing forget and retain objectives, outperforming…
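The core idea in the abstract, adjusting the target logit so that the target token receives a uniform probability, can be illustrated with a small sketch. It is one reading of the abstract, not the authors' implementation: it assumes "uniform" means a probability of 1/|V| for a vocabulary of size |V|, and that only the target token's logit is changed while the current model's other logits are kept. The function name `uniform_target_distribution` is hypothetical. Under these assumptions the adjusted logit has a closed form: the log-sum-exp of the other logits minus log(|V| - 1).

```python
import math


def uniform_target_distribution(logits, target_index):
    """Return a distillation target in which the target token has probability 1/|V|.

    Assumption (not confirmed by the source): only the target logit is modified,
    and all other logits are taken unchanged from the current model's output.
    """
    vocab_size = len(logits)
    if vocab_size < 2:
        raise ValueError("need at least two vocabulary entries")

    # Numerically stable log-sum-exp over all non-target logits.
    others = [z for i, z in enumerate(logits) if i != target_index]
    m = max(others)
    lse_others = m + math.log(sum(math.exp(z - m) for z in others))

    # Solve exp(z_t) / (exp(z_t) + sum_others) = 1 / |V| for z_t.
    adjusted = list(logits)
    adjusted[target_index] = lse_others - math.log(vocab_size - 1)

    # Softmax over the adjusted logits gives the target distribution.
    top = max(adjusted)
    exps = [math.exp(z - top) for z in adjusted]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example: a 4-token vocabulary where token 0 is the one to forget.
probs = uniform_target_distribution([2.0, 0.5, -1.0, 0.0], target_index=0)
```

In this sketch no extra hyperparameter is needed because the adjustment is fully determined by the current logits and the vocabulary size. The relative probabilities of the non-target tokens are preserved, so the target stays close to what the model already predicts apart from the token being forgotten.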

Taxonomy

Topics: Machine Learning and Data Classification · Privacy-Preserving Technologies in Data · Data Quality and Management