WARP: Weight Teleportation for Attack-Resilient Unlearning Protocols

Mohammad M Maheri; Xavier Cadet; Peter Chin; Hamed Haddadi

arXiv:2512.00272·cs.LG·March 4, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces WARP, a teleportation-based defense that improves privacy in machine unlearning by reducing forget-set gradient energy and moving the unlearned parameters farther from the original model, thereby mitigating membership inference and reconstruction attacks.

Contribution

WARP is a novel, plug-and-play reparameterization technique that improves privacy in approximate unlearning without sacrificing model accuracy.

Findings

1. WARP reduces attack success by up to 92% in white-box settings.
2. It maintains model accuracy while enhancing privacy.
3. It is effective across six unlearning algorithms.

Abstract

Approximate machine unlearning aims to efficiently remove the influence of specific data points from a trained model, offering a practical alternative to full retraining. However, it introduces privacy risks: an adversary with access to pre- and post-unlearning models can exploit their differences for membership inference or data reconstruction. We show these vulnerabilities arise from two factors: large gradient norms of forget-set samples and the close proximity of unlearned parameters to the original model. To demonstrate their severity, we propose unlearning-specific membership inference and reconstruction attacks, showing that several state-of-the-art methods (e.g., NGP, SCRUB) remain vulnerable. To mitigate this leakage, we introduce WARP, a plug-and-play teleportation defense that leverages neural network symmetries to reduce forget-set gradient energy and increase parameter…
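
The two quantities the abstract names, gradient norm and distance to the original parameters, can both change under a network symmetry while the function the network computes stays fixed. The following is a minimal NumPy sketch of that idea using the positive rescaling symmetry of a two-layer ReLU network. It is not WARP's algorithm: the toy network, the random scales `s`, and all variable names are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer ReLU network f(x) = W2 @ relu(W1 @ x) (hypothetical sizes).
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(3, 8))
X = rng.normal(size=(4, 16))  # 16 inputs, one per column
Y = rng.normal(size=(3, 16))  # arbitrary regression targets


def forward(W1, W2, X):
    return W2 @ np.maximum(W1 @ X, 0.0)


def grads(W1, W2, X, Y):
    """Gradients of the mean squared error 0.5 * ||f(X) - Y||^2 / n."""
    n = X.shape[1]
    Z = W1 @ X
    H = np.maximum(Z, 0.0)
    E = (W2 @ H - Y) / n
    gW2 = E @ H.T
    gW1 = ((W2.T @ E) * (Z > 0)) @ X.T
    return gW1, gW2


def total_norm(gs):
    return np.sqrt(sum(np.sum(g ** 2) for g in gs))


# Symmetry: scale each hidden unit's incoming weights by s_i > 0 and its
# outgoing weights by 1 / s_i. Because relu(s * z) = s * relu(z) for s > 0,
# the network output is unchanged.
s = rng.uniform(0.5, 2.0, size=8)  # assumed scales, not chosen by any optimizer
W1_t = s[:, None] * W1
W2_t = W2 / s[None, :]

out_gap = np.max(np.abs(forward(W1, W2, X) - forward(W1_t, W2_t, X)))
param_shift = np.sqrt(np.sum((W1_t - W1) ** 2) + np.sum((W2_t - W2) ** 2))
g_before = grads(W1, W2, X, Y)
g_after = grads(W1_t, W2_t, X, Y)

print("max output difference:", out_gap)
print("parameter displacement:", param_shift)
print("gradient norm before / after:", total_norm(g_before), total_norm(g_after))
```

In this toy case the gradients transform predictably (rows of the `W1` gradient are divided by `s`, columns of the `W2` gradient are multiplied by `s`), so a defense could in principle pick the symmetry transform to shrink the gradient norm on chosen samples. How WARP actually selects its transform is described in the paper, not here.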

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 3

Strengths

1. Connecting neural network symmetry with unlearning privacy is a novel conceptual contribution. 2. The evaluation uses strong, adaptive attacks (U-LiRA, and a custom white-box attack), making the defense's evaluation robust.

Weaknesses

1. Computational overhead is the most significant drawback. The core of the method (see Algorithm 3) requires computing the SVD of the retain activation matrix $R_l$ for each layer at every teleportation step, and SVD is computationally intensive. The analysis in Appendix J (Figure 8) shows an average runtime overhead of +27%. 2. The method introduces a large number of new and seemingly highly sensitive hyperparameters: teleportation step size $\eta_{tel}$

Reviewer 02 · Rating 2 · Confidence 4

Strengths

1. WARP can be applied to both CNNs and ViTs, showing that it does not rely on any particular neural network architecture. 2. WARP is designed to be a plug-in method and can be adopted with many unlearning methods, as demonstrated in the experiments. 3. The improvements of WARP are consistent across all the tasks and unlearning methods.

Weaknesses

1. Each layer’s retain subspace is built from a randomly sampled retain minibatch, and such a small batch can misrepresent the full retain set when that set is highly diverse. This could result in an inaccurate retain subspace and hinder defense effectiveness. 2. The paper describes teleportation as “leaving the task loss unchanged up to numerical error”, but it is in fact an approximation of a loss-invariant transformation on the retain set. Yet there is no analytical worst-case bound on t

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. The proposed algorithm uses inherent network symmetries to randomize parameter space without retraining or noise injection. It is easy to integrate into any unlearning pipeline, introduces negligible computational costs, and preserves model accuracy. 2. The authors evaluated their algorithm across multiple datasets, unlearning algorithms, and attack types. The inclusion of detailed ablations on transformation modes and frequencies supports the robustness and generality of the proposed defen

Weaknesses

1. The described process involves per-layer SVDs and null-space projections, which can be expensive for larger models. 2. The work does not provide a theoretical explanation of how teleportation changes the information relationship between parameters and training data. A formal analysis would make the contribution more rigorous. 3. Symmetries are by design invertible. If an attacker can recreate or approximate the teleportation transform (which they probably can given the strong threat model
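
The per-layer SVDs and null-space projections that the reviewers discuss can be illustrated with a small NumPy sketch. It assumes a single linear layer, a hypothetical retain activation matrix `R` with one column per retain sample, and a random weight perturbation `delta`; none of these names or sizes come from the paper, and this is not a reproduction of its Algorithm 3.

```python
import numpy as np

rng = np.random.default_rng(1)

d_in, d_out, n_retain = 32, 10, 12          # hypothetical layer and batch sizes
R = rng.normal(size=(d_in, n_retain))       # assumed retain activations (columns)
W = rng.normal(size=(d_out, d_in))          # layer weights

# The left singular vectors with non-negligible singular values span the
# subspace occupied by the retain activations.
U, S, _ = np.linalg.svd(R, full_matrices=False)
rank = int(np.sum(S > 1e-10 * S.max()))
U_r = U[:, :rank]

# Projector onto the orthogonal complement of that subspace.
P_null = np.eye(d_in) - U_r @ U_r.T

# Any weight change restricted to the complement leaves W @ R unchanged.
delta = rng.normal(size=(d_out, d_in))      # arbitrary perturbation
W_new = W + delta @ P_null

retain_change = np.max(np.abs(W_new @ R - W @ R))

# An input outside the sampled retain subspace is generally affected.
x_other = rng.normal(size=d_in)
other_change = np.max(np.abs(W_new @ x_other - W @ x_other))

print("max change on retain activations:", retain_change)
print("max change on an unseen input:", other_change)
```

The sketch also shows where two of the reviewers' concerns come from: the SVD must be recomputed whenever `R` changes, and invariance holds only for activations inside the sampled subspace, so a minibatch that misrepresents the full retain set gives only approximate invariance on the rest.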

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data · Advanced Neural Network Applications