Perturbation-Induced Linearization: Constructing Unlearnable Data with Solely Linear Classifiers
Jinlin Liu, Wei Chen, Xiaojin Zhang

TL;DR
This paper introduces PIL, a fast method for creating unlearnable data that induces linearization in deep models using only linear surrogates, significantly reducing computational cost.
Contribution
The paper presents PIL, a novel linearization-based approach for generating unlearnable examples that is computationally cheaper than existing neural network-based methods while remaining effective.
Findings
PIL achieves comparable or better performance than neural network-based methods.
PIL dramatically reduces computational time for generating unlearnable data.
Inducing linearization is a key mechanism behind the effectiveness of unlearnable examples.
Abstract
Collecting web data to train deep models has become increasingly common, raising concerns about unauthorized data usage. To mitigate this issue, unlearnable examples introduce imperceptible perturbations into data, preventing models from learning effectively. However, existing methods typically rely on deep neural networks as surrogate models for perturbation generation, resulting in significant computational costs. In this work, we propose Perturbation-Induced Linearization (PIL), a computationally efficient yet effective method that generates perturbations using only linear surrogate models. PIL achieves comparable or better performance than existing surrogate-based methods while reducing computational time dramatically. We further reveal a key mechanism underlying unlearnable examples: inducing linearization in deep models, which explains why PIL can achieve competitive results in a…
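To make the idea of a linear surrogate concrete, here is a minimal sketch in plain Python. It is not the paper's PIL objective, which this summary does not specify. It assumes a fixed linear softmax classifier and applies generic error-minimizing noise under an L-infinity budget. The function names, toy weights, and the values of `eps` and `step` are all hypothetical choices for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear_logits(W, b, x):
    # Logits of a linear classifier: W x + b.
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def cross_entropy(W, b, x, y):
    return -math.log(softmax(linear_logits(W, b, x))[y])

def input_gradient(W, b, x, y):
    # For a linear softmax classifier, d(loss)/dx = W^T (p - onehot(y)).
    p = softmax(linear_logits(W, b, x))
    p[y] -= 1.0
    return [sum(p[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

def sign(v):
    return (v > 0) - (v < 0)

def error_minimizing_perturbation(W, b, x, y, eps=0.1, step=0.02, iters=20):
    # Signed gradient descent on the loss w.r.t. the input, projected onto
    # the L-infinity ball of radius eps after every step.
    # Assumption: generic error-minimizing noise, not the PIL loss.
    delta = [0.0] * len(x)
    for _ in range(iters):
        x_pert = [xi + di for xi, di in zip(x, delta)]
        g = input_gradient(W, b, x_pert, y)
        delta = [max(-eps, min(eps, d - step * sign(gj)))
                 for d, gj in zip(delta, g)]
    return delta

# Hypothetical two-class, two-feature toy problem.
W = [[1.0, -1.0], [-1.0, 1.0]]
b = [0.0, 0.0]
x, y = [0.2, 0.3], 0

delta = error_minimizing_perturbation(W, b, x, y)
loss_clean = cross_entropy(W, b, x, y)
loss_perturbed = cross_entropy(W, b, [xi + di for xi, di in zip(x, delta)], y)
print(delta, loss_clean, loss_perturbed)
```

Because the surrogate is linear, the input gradient has a closed form and no backpropagation through a deep network is needed. This is one plausible reason linear surrogates are cheap, though the source attributes PIL's speed only to its use of linear surrogate models.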
Peer Reviews
Decision: ICLR 2026 Poster
1. The paper provides a comprehensive theoretical analysis of why unlearnable examples work. 2. PIL’s use of a linear classifier as the generator is useful and computationally lightweight. 3. The method can be applied to a broad range of models, which shows strong generalization. 4. Both the experiments and the theoretical analysis are comprehensive and clear.
1. The authors' claim that PIL perturbations are ‘imperceptible’ to humans is not backed by human evaluation results or statistical measurements. 2. The authors empirically show that gradients from unlearnable samples are nearly orthogonal to those from clean samples, implying training interference. However, this is only discussed qualitatively. A more rigorous analysis could include “quantitative measures such as cosine similarity distributions between gradient vectors per class.”
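The quantitative measure the reviewer asks for could be computed roughly as follows. This sketch assumes per-sample gradient vectors are already available as flat lists and that each clean gradient is paired with the gradient of its perturbed counterpart. The pairing, the function names, and the toy vectors are assumptions for illustration, not something the paper or the review specifies.

```python
import math
from collections import defaultdict

def cosine_similarity(u, v):
    # cos(u, v) = <u, v> / (||u|| * ||v||); defined as 0.0 for zero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def per_class_cosines(clean_grads, perturbed_grads, labels):
    # Group the clean-vs-perturbed cosine similarity of each pair by label.
    by_class = defaultdict(list)
    for g_clean, g_pert, label in zip(clean_grads, perturbed_grads, labels):
        by_class[label].append(cosine_similarity(g_clean, g_pert))
    return dict(by_class)

def summarize(values):
    # Simple distribution summary; a histogram would be the next step.
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return {"mean": mean, "std": math.sqrt(var),
            "min": min(values), "max": max(values)}

# Hypothetical gradients for three samples across two classes.
clean_grads = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
perturbed_grads = [[0.0, 1.0], [0.0, 1.0], [-1.0, -1.0]]
labels = [0, 0, 1]

cosines = per_class_cosines(clean_grads, perturbed_grads, labels)
stats = {label: summarize(vals) for label, vals in cosines.items()}
print(stats)
```

If the near-orthogonality claim holds, the per-class distributions would concentrate around zero. Reporting the mean and spread per class would turn the qualitative observation into a checkable number.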
1. Simplicity. PIL uses only linear surrogate models. On CIFAR-10, the reported generation time is under one GPU-minute. 2. Efficiency. PIL remains effective under various data augmentation strategies and adversarial training.
1. Please include results for larger initial clean ratios (η), e.g., 0.8, to validate how perturbed samples contribute to accuracy improvements. 2. This paper primarily focuses on methods developed up to 2022. It would be helpful to include comparisons with more recent work—such as CUDA [1] and UGEs [2] in effectiveness and generation time. [1] Vinu Sankar Sadasivan, Mahdi Soltanolkotabi, and Soheil Feizi. Cuda: Convolution-based unlearnable datasets. In Proceedings of the IEEE/CVF Conference
In my batch so far, this is the most well-written paper. The writing/math is very clear, and some sections are better than prior conference work that introduces unlearnable example methods (especially wrt the defense-attacker definition and motivation of the problem). I can tell the authors were careful in how they designed notation to explain their method. It was a fun read. There are a number of novel contributions: 1. a simple (2 loss components optimized with SGD) loss function that optim
1. For common defenses (Section 4.2.1), ISS [1] is likely the most important, reasonable, and cheap defense against unlearnable examples. I am particularly interested in an evaluation of *just ISS* (different JPEG compression qualities should be tried: 0.9, 0.8, 0.7, etc) instead of all the other "defenses" (cutout, cutmix, mixup, etc.) because JPEG has been shown to be so effective but the augmentations provided in Table 2 are a good start (and can still be in appendix). The ISS [1] paper broke
Taxonomy
Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Topic Modeling
