The poison of dimensionality

Lê-Nguyên Hoang

arXiv:2409.17328·cs.LG·September 27, 2024

Open Access

TL;DR

This paper shows that increasing the number of parameters of a machine learning model can heighten its vulnerability to data poisoning, even under state-of-the-art robust aggregation defenses. It exposes a fundamental tradeoff between model expressivity and the poisoners' attack surface, supported by a theoretical result and by experiments.

Contribution

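It provides a theoretical analysis showing that linear and logistic regressions with sufficiently many parameters are subject to arbitrary manipulation by poisoners, and it empirically demonstrates the resulting tradeoff on synthetic data and on MNIST and FashionMNIST, using linear classifiers with random features.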

Findings

01

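Linear and logistic regressions with $D \geq 169 H^{2} / P^{2}$ parameters are essentially shown to be subject to arbitrary model manipulation, where $H$ and $P$ are the numbers of honestly labeled and poisoned training points. The result assumes isotropic random honest feature vectors and the geometric median (or clipped mean) as the gradient aggregator.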

02

Augmenting model expressivity comes at the cost of increasing the poisoners' attack surface.

03

Experiments on synthetic data, MNIST, and FashionMNIST (linear classifiers with random features) expose this tradeoff as model size grows.
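To make the bound $D \geq 169 H^{2} / P^{2}$ concrete, here is a minimal arithmetic sketch. The function names and the example values of $H$ and $P$ are hypothetical and not taken from the paper; the code only evaluates the stated inequality and says nothing about whether the paper's assumptions (isotropic random honest features, geometric median or clipped mean aggregation) hold for a given model.

```python
def poisoning_dimension_threshold(num_honest: int, num_poisoned: int) -> int:
    """Smallest integer D satisfying D >= 169 * H^2 / P^2.

    num_honest (H) and num_poisoned (P) are the numbers of honestly
    labeled and poisoned training points. Hypothetical helper name.
    """
    if num_honest < 0 or num_poisoned <= 0:
        raise ValueError("need H >= 0 and P > 0")
    # Integer ceiling division avoids floating-point rounding.
    return -(-169 * num_honest**2 // num_poisoned**2)


def exceeds_threshold(num_params: int, num_honest: int, num_poisoned: int) -> bool:
    """Check D >= 169 * H^2 / P^2 without dividing."""
    return num_params * num_poisoned**2 >= 169 * num_honest**2


# Illustrative values only: 1000 honest points, 100 poisoned points.
threshold = poisoning_dimension_threshold(1000, 100)
print(threshold)
```

The bound depends only on the ratio $H / P$: whenever honest points outnumber poisoned points ten to one, the threshold is $169 \times 10^{2} = 16{,}900$ parameters.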

Abstract

This paper advances the understanding of how the size of a machine learning model affects its vulnerability to poisoning, despite state-of-the-art defenses. Given isotropic random honest feature vectors and the geometric median (or clipped mean) as the robust gradient aggregator rule, we essentially prove that, perhaps surprisingly, linear and logistic regressions with $D \geq 169 H^{2} / P^{2}$ parameters are subject to arbitrary model manipulation by poisoners, where $H$ and $P$ are the numbers of honestly labeled and poisoned data points used for training. Our experiments go on exposing a fundamental tradeoff between augmenting model expressivity and increasing the poisoners' attack surface, on both synthetic data, and on MNIST & FashionMNIST data for linear classifiers with random features. We also discuss potential implications for source-based learning and neural nets.
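The abstract names the geometric median as the robust gradient aggregation rule. As a rough illustration of what that aggregator computes, the sketch below approximates it with Weiszfeld's iteration in NumPy. This is a generic textbook algorithm, not the paper's code; the function name, the iteration parameters, and the toy values of $H$, $P$, and $D$ are all assumptions made for this example, and the snippet does not reproduce the paper's attack or experiments.

```python
import numpy as np


def geometric_median(points, n_iter=200, tol=1e-9, eps=1e-12):
    """Approximate the geometric median of the rows of `points`
    (the point minimizing the sum of Euclidean distances to all rows)
    using Weiszfeld's iteration. Hypothetical helper, not from the paper.
    """
    points = np.asarray(points, dtype=float)
    y = points.mean(axis=0)  # start from the ordinary mean
    for _ in range(n_iter):
        dists = np.linalg.norm(points - y, axis=1)
        # Clamp distances so a point coinciding with y does not divide by zero.
        weights = 1.0 / np.maximum(dists, eps)
        y_next = (weights[:, None] * points).sum(axis=0) / weights.sum()
        if np.linalg.norm(y_next - y) < tol:
            return y_next
        y = y_next
    return y


# Toy setup (assumed values): H isotropic random "honest" gradients
# and P identical "poisoned" gradients in dimension D.
rng = np.random.default_rng(0)
H, P, D = 20, 5, 3
honest = rng.standard_normal((H, D))
poisoned = np.tile(10.0 * np.ones(D), (P, 1))
aggregate = geometric_median(np.vstack([honest, poisoned]))
print(aggregate.shape)
```

Unlike the plain mean, the geometric median cannot be dragged arbitrarily far by a minority of inputs, which is why it serves as a standard defense. The paper's result concerns how this protection degrades as $D$ grows relative to $H^{2} / P^{2}$.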

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques · Explainable Artificial Intelligence (XAI)