Harmonizing Multi-Objective LLM Unlearning via Unified Domain Representation and Bidirectional Logit Distillation

Yisheng Zhong; Sijia Liu; Zhuangdi Zhu

arXiv:2604.15482·cs.LG·April 20, 2026


TL;DR

This paper introduces a multi-objective unlearning framework for large language models (LLMs) that harmonizes competing goals, namely removing hazardous information, preserving utility, and ensuring robustness, through domain standardization and bidirectional distillation.

Contribution

The paper proposes an approach that combines unified domain representation with bidirectional logit distillation to improve multi-objective LLM unlearning.

Findings

01 Achieves state-of-the-art performance in multi-objective unlearning tasks.

02 Effectively aligns domain distributions to reduce task interference.

03 Enhances robustness against adversarial probing attacks.

Abstract

Large Language Model (LLM) unlearning is crucial for removing hazardous or privacy-leaking information from a model. Practical LLM unlearning demands satisfying multiple challenging objectives simultaneously: removing undesirable knowledge, preserving general utility, avoiding over-refusal of neighboring concepts, and, crucially, ensuring robustness against adversarial probing attacks. However, existing unlearning methods primarily focus on a limited subset of these goals, typically unlearning efficacy and utility preservation, while overlooking robustness and boundary behaviors. Naively extending these methods to multi-objective settings may lead to interference between unlearning tasks. We propose a novel multi-objective unlearning framework that harmonizes multiple unlearning objectives through a data and optimization co-design: we standardize training corpora into a unified data…
