Randomized Antipodal Search Done Right for Data Pareto Improvement of LLM Unlearning

Ziwen Liu; Huawei Lin; Yide Ran; Denghui Zhang; Jianwen Xie; Chuan Li; Weijie Zhao; and Zhaozhuo Xu

arXiv:2604.16591·cs.LG·April 21, 2026

TL;DR

This paper introduces RASLIK, a scalable randomized retrieval algorithm that improves data unlearning in large language models by balancing forgetting and retention more effectively.

Contribution

The paper proposes RASLIK, a randomized antipodal search method for retrieving the data used in unlearning. It outperforms deterministic baselines and oracle sampling.

Findings

01. RASLIK reduces selection variance and achieves sublinear complexity.

02. RASLIK consistently outperforms deterministic baselines and oracle sampling.

03. The method improves the trade-off between forgetting and retention in LLM unlearning.
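
To make the forgetting/retention trade-off concrete, here is a minimal sketch of Pareto dominance over (forget, retain) score pairs. It is an illustration only, not the paper's formal definition of data Pareto improvement: the names `dominates`, `pareto_front`, and `expands_front` are hypothetical, and the sketch assumes both scores are "higher is better".

```python
from typing import List, Tuple

# Assumption: each point is (forget_score, retain_score), higher is better for both.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if it is at least as good on both axes and strictly better on one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def expands_front(old: List[Point], new: List[Point]) -> bool:
    """True if some new point is neither dominated by nor equal to any old point."""
    return any(
        all(not dominates(o, n) and o != n for o in old)
        for n in new
    )


# Hypothetical trade-off points reached with a baseline data selection.
baseline = [(0.5, 0.9), (0.8, 0.6), (0.4, 0.5)]
front = pareto_front(baseline)

# A hypothetical point reached with better-retrieved data.
improved = expands_front(baseline, [(0.8, 0.8)])
not_improved = expands_front(baseline, [(0.45, 0.55)])
```

In this toy example, `(0.4, 0.5)` drops off the baseline front because `(0.5, 0.9)` dominates it, and the new point `(0.8, 0.8)` expands the front because no baseline point matches it on both axes.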

Abstract

Large language models (LLMs) sometimes memorize undesirable knowledge, which must be removed after deployment. Prior work on machine unlearning has focused largely on optimization methods that adjust parameters to enforce forgetting while preserving retention. However, these approaches assume that the forget and retain sets are readily available, which rarely holds in practice. Unlearning is typically triggered by an undesired generation at inference time, making the retrieval of relevant data the central challenge. We introduce the notion of data Pareto improvement for LLM unlearning, which formalizes how retrieval can expand the achievable trade-off frontier between forgetting and retention. To realize this principle, we propose Randomized Antipodal Search on Linearized Influence Kernel (RASLIK), a retrieval algorithm that combines permutation-projection hashing with randomized…
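
The abstract is cut off before it describes the algorithm, so the following is only a sketch of the general idea of antipodal search with randomized hashing, not RASLIK itself. It swaps in plain sign-random-projection hashing (SimHash) for the paper's permutation-projection hashing, and uses raw vectors where the paper would use features from its linearized influence kernel. The function names and the parameters `n_bits`, `n_tables`, and `seed` are assumptions made for illustration.

```python
import numpy as np


def simhash_codes(vectors: np.ndarray, planes: np.ndarray) -> np.ndarray:
    """Boolean hash code per row: the sign pattern of its random projections."""
    return (vectors @ planes.T) > 0


def antipodal_candidates(query, data, n_bits=16, n_tables=8, seed=0):
    """Return indices of rows in `data` that collide with the NEGATED query.

    Under sign-random-projection hashing, a vector pointing opposite to the
    query has the complementary code, so hashing -query finds candidates
    whose inner product with the query is strongly negative.
    """
    rng = np.random.default_rng(seed)
    hits = set()
    for _ in range(n_tables):
        # Fresh random hyperplanes per table (assumption: Gaussian directions).
        planes = rng.standard_normal((n_bits, data.shape[1]))
        codes = simhash_codes(data, planes)
        target = simhash_codes(-query[None, :], planes)[0]
        # A row is a candidate when every bit matches the antipodal code.
        matches = np.flatnonzero((codes == target).all(axis=1))
        hits.update(matches.tolist())
    return sorted(hits)


# Toy data: row 0 is the query itself, row 3 points exactly the opposite way.
rng = np.random.default_rng(1)
data = rng.standard_normal((200, 32))
query = data[0].copy()
data[3] = -query

candidates = antipodal_candidates(query, data)
```

This sketch scans every code for clarity. A real index would store codes in hash buckets so that lookup cost does not grow linearly with the corpus.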

Videos

Randomized Antipodal Search Done Right for Data Pareto Improvement of LLM Unlearning (SlidesLive)