What Can We Learn from Unlearnable Datasets?

Pedro Sandoval-Segura; Vasu Singla; Jonas Geiping; Micah Goldblum; Tom Goldstein

arXiv:2305.19254 · cs.LG · November 9, 2023 · 1 citation

TL;DR

This paper critically examines unlearnable datasets proposed for data privacy. It shows that neural networks trained on them can still learn useful features, and that linear separability of the added perturbations is not a necessary condition for unlearnability, which calls their protective value into question.

Contribution

The paper provides empirical evidence that neural networks can learn useful features from unlearnable datasets, and it introduces an orthogonal projection attack that learns from protected datasets.

Findings

01 Neural networks can learn useful features from unlearnable datasets.

02 Linear separability of perturbations is not necessary for unlearnability.

03 An orthogonal projection attack effectively learns from protected datasets.
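
As a rough illustration of finding 03, the sketch below shows one way an orthogonal projection step could look. It rests on assumptions that are not stated on this page: that the protective perturbations are largely captured by the weights of a linear classifier fit to the protected data, and that projecting each example onto the orthogonal complement of those weights removes them. The helper names (`fit_linear_classifier`, `project_out`), the synthetic data, and the hyperparameters are all hypothetical; the paper and its repository describe the authors' actual procedure.

```python
import numpy as np

def fit_linear_classifier(X, y, num_classes, lr=0.05, epochs=200):
    """Softmax regression trained with full-batch gradient descent.

    Returns a weight matrix W of shape (d, num_classes).
    """
    n, d = X.shape
    W = np.zeros((d, num_classes))
    Y = np.eye(num_classes)[y]  # one-hot labels, shape (n, num_classes)
    for _ in range(epochs):
        logits = X @ W
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        W -= lr * X.T @ (P - Y) / n  # gradient of mean cross-entropy
    return W

def project_out(X, W):
    """Remove from each row of X its component along the columns of W.

    Q is an orthonormal basis whose span contains the column span of W,
    so the projected rows are orthogonal to every column of W.
    """
    Q, _ = np.linalg.qr(W)  # reduced QR, Q has shape (d, num_classes)
    return X - (X @ Q) @ Q.T

# Hypothetical stand-in for a protected dataset: noise features plus
# one fixed perturbation per class (an assumption for illustration only).
rng = np.random.default_rng(0)
num_classes, d, n = 3, 20, 300
y = rng.integers(0, num_classes, size=n)
deltas = rng.normal(size=(num_classes, d))
X = rng.normal(size=(n, d)) + deltas[y]

W = fit_linear_classifier(X, y, num_classes)
X_proj = project_out(X, W)
```

With real images, `X` would hold flattened pixels, and a network would then be trained on the projected data rather than on the original protected images.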

Abstract

In an era of widespread web scraping, unlearnable dataset methods have the potential to protect data privacy by preventing deep neural networks from generalizing. But in addition to a number of practical limitations that make their use unlikely, we make a number of findings that call into question their ability to safeguard data. First, it is widely believed that neural networks trained on unlearnable datasets only learn shortcuts, simpler rules that are not useful for generalization. In contrast, we find that networks actually can learn useful features that can be reweighed for high test performance, suggesting that image protection is not assured. Unlearnable datasets are also believed to induce learning shortcuts through linear separability of added perturbations. We provide a counterexample, demonstrating that linear separability of perturbations is not a necessary condition. To…
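
To make the notion of linearly separable perturbations concrete, here is a minimal sketch of one way to test for it: run a multiclass perceptron on the perturbations alone and see whether it reaches zero training errors. The function name `perceptron_separable`, the epoch budget, and the toy data are hypothetical and not taken from the paper. A `False` result only means the perceptron did not converge within the budget, except where non-separability is known for other reasons, as in the XOR-style example.

```python
import numpy as np

def perceptron_separable(D, y, num_classes, max_epochs=500):
    """Return True if a multiclass perceptron fits (D, y) with zero errors.

    D holds one perturbation per row; y holds the class of the image
    that each perturbation was added to.
    """
    n, d = D.shape
    A = np.hstack([D, np.ones((n, 1))])  # append a bias feature
    W = np.zeros((d + 1, num_classes))
    for _ in range(max_epochs):
        errors = 0
        for x, c in zip(A, y):
            pred = int(np.argmax(x @ W))
            if pred != c:
                W[:, c] += x      # pull the true class toward x
                W[:, pred] -= x   # push the wrong class away from x
                errors += 1
        if errors == 0:
            return True
    return False

# Class-wise perturbations: one fixed pattern per class plus tiny noise.
rng = np.random.default_rng(0)
num_classes, d, n = 3, 10, 150
y = rng.integers(0, num_classes, size=n)
patterns = np.eye(num_classes, d)
D = patterns[y] + 0.01 * rng.normal(size=(n, d))
classwise_ok = perceptron_separable(D, y, num_classes)

# XOR-style labels: no linear rule separates these four points.
D_xor = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y_xor = np.array([0, 0, 1, 1])
xor_ok = perceptron_separable(D_xor, y_xor, 2)
```

In this toy setup the class-wise patterns are separable and the XOR-style labels are not, which is the kind of contrast the abstract's counterexample claim concerns.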

Code & Models

Repositories

psandovalsegura/learn-from-unlearnable (PyTorch, official)

Videos

What Can We Learn from Unlearnable Datasets? · SlidesLive

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data · Internet Traffic Analysis and Secure E-voting