When Priors Backfire: On the Vulnerability of Unlearnable Examples to Pretraining

Zhihao Li; Gezheng Xu; Jiale Cai; Ruiyi Fang; Di Wu; Qicheng Lao; Charles Ling; Boyu Wang

arXiv:2603.04731 · cs.LG · March 6, 2026

Open Access · 3 Reviews

TL;DR

This paper shows that unlearnable examples can be bypassed when training starts from a pretrained model, and introduces BAIT, a bi-level optimization method that binds perturbations to incorrect labels to preserve data unlearnability.

Contribution

The paper uncovers a vulnerability of unlearnable examples with pretrained models and proposes BAIT, a novel method to maintain unlearnability by disrupting semantic learning.

Findings

01. Pretraining priors can nullify unlearnability by capturing genuine features.

02. BAIT effectively prevents models from learning true semantics from protected data.

03. Experiments show BAIT maintains unlearnability across benchmarks and backbones.

Abstract

Unlearnable Examples (UEs) serve as a data protection strategy that generates imperceptible perturbations to mislead models into learning spurious correlations instead of underlying semantics. In this paper, we uncover a fundamental vulnerability of UEs that emerges when learning starts from a pretrained model. Crucially, our empirical analysis shows that even when data are protected by carefully crafted perturbations, pretraining priors still furnish rich semantic representations that allow the model to circumvent the shortcuts introduced by UEs and capture genuine features, thereby nullifying unlearnability. To address this, we propose BAIT (Binding Artificial perturbations to Incorrect Targets), a novel bi-level optimization formulation. Specifically, the inner level aims at associating the perturbed samples with real labels to simulate standard data-label alignment, while the outer…
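As a rough illustration of what "binding a perturbation to an incorrect target" can mean, the sketch below runs signed gradient descent on a bounded perturbation so that a fixed linear softmax classifier assigns perturbed inputs to a designated wrong label. This is a deliberately simplified, single-level stand-in and not the authors' BAIT procedure: the abstract's description of the outer level is cut off here, so no attempt is made to reproduce it. The function names (`bind_to_wrong_label`, `cross_entropy`), the linear model `W`, and the parameters `eps`, `step`, and `iters` are all assumptions made for this example.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(W, x, y):
    # Mean cross-entropy of a linear softmax model with logits x @ W.
    p = softmax(x @ W)
    return -np.log(p[np.arange(len(y)), y] + 1e-12).mean()

def bind_to_wrong_label(W, x, y_wrong, eps=0.5, step=0.1, iters=20):
    """Hypothetical helper: find delta with ||delta||_inf <= eps that lowers
    the loss of (x + delta) with respect to the incorrect labels y_wrong.
    The model W is held fixed (assumption: no inner training loop)."""
    delta = np.zeros_like(x)
    onehot = np.eye(W.shape[1])[y_wrong]
    for _ in range(iters):
        p = softmax((x + delta) @ W)
        # Gradient of the cross-entropy with respect to the input.
        grad = (p - onehot) @ W.T
        # Signed descent step, then projection onto the L-infinity ball.
        delta = np.clip(delta - step * np.sign(grad), -eps, eps)
    return delta
```

Calling `bind_to_wrong_label(W, x, y_wrong)` returns a perturbation of the same shape as `x`. Comparing `cross_entropy(W, x + delta, y_wrong)` with `cross_entropy(W, x, y_wrong)` indicates how strongly the perturbed inputs have been pulled toward the incorrect targets under the chosen budget.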

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

The proposed strategy of generating perturbations with changed targets effectively attacks the learning process of a pretrained model. The analysis of parameter updates provides insight into the difference between models trained from scratch and pretrained models. The ablations across datasets and model backbones adequately demonstrate generalizability. Comparisons are made with both randomly initialized and pretrained surrogate models, and both show good improvement over previous methods.

Weaknesses

No major weaknesses in this paper. See the questions part for minors.

Reviewer 02 · Rating 4 · Confidence 5

Strengths

Based on Figure 1 (a) it appears that the proposed class-wise perturbations reduce the test accuracy of both pretrained and trained-from-scratch models, whereas other unlearnable example methods only reduce test accuracy for train-from-scratch models. I also agree that for unlearnable examples the case of utilizing pretrained weights is much more likely (and a more realistic scenario) than train-from-scratch, so unlearnable examples must be able to hold up to this approach (please see weaknesses

Weaknesses

1. The "core innovation" (L180-181) is an error-minimizing noise, which is explored in [2]. Technically, the original error-minimizing noise of [2] also "binds perturbations to designated incorrect labels that are semantically different from the ground truth, deliberately steering learning away from genuine semantics." (L157-158) because [2] uses a pretrained surrogate model. By using a pretrained surrogate, the perturbations are like features of the pretrained model.

2. I think Eq. 1 does not

Reviewer 03 · Rating 4 · Confidence 4

Strengths

1. Despite some discussion in previous research, few prior solutions address the failure of UEs on pre-trained models.

2. The method description, experiment design, and results seem convincing.

3. Despite some formatting issues, the writing and presentation are generally good.

Weaknesses

1. The finding highlighted at the beginning of the paper, that current UEs fail on pre-trained models, does not seem that surprising, as this issue has been raised in many previous studies.

2. From a practical perspective, the key to solving this problem, in my opinion, is to fine-tune a larger model using the additional data collected. Although this experiment will be more difficult, the authors should try to discuss the practical significance of this issue.

3. The paper doesn't seem to p

Taxonomy

Topics: Machine Learning and Data Classification · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning