The poison of dimensionality
Lê-Nguyên Hoang

TL;DR
This paper investigates how increasing model size in machine learning can heighten vulnerability to poisoning attacks, revealing a fundamental tradeoff between model expressivity and security, supported by theoretical proofs and experiments.
Contribution
It provides a theoretical analysis showing that models with more parameters are more susceptible to poisoning, and empirically demonstrates the resulting tradeoff on synthetic data, MNIST, and FashionMNIST.
Findings
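Linear and logistic regressions with at least 169H^2/P^2 parameters, where H and P are the numbers of honestly labeled and poisoned data points, are subject to arbitrary model manipulation by poisoners.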
A tradeoff exists between model complexity and attack surface.
Experimental results confirm increased vulnerability with larger models.
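To make the threshold in the findings concrete, the arithmetic can be sketched as follows. This is only an illustration of the formula 169H^2/P^2, reading H as the number of honestly labeled points and P as the number of poisoned points; the function names are hypothetical and do not come from the paper.

```python
def poisoning_threshold(num_honest: int, num_poisoned: int) -> float:
    """Return 169 * H^2 / P^2, the parameter count named in the findings.

    num_honest (H) and num_poisoned (P) are illustrative argument names.
    """
    if num_honest <= 0 or num_poisoned <= 0:
        raise ValueError("both counts must be positive")
    return 169.0 * num_honest**2 / num_poisoned**2


def is_above_threshold(num_params: int, num_honest: int, num_poisoned: int) -> bool:
    """Check whether a model size reaches the 169 * H^2 / P^2 threshold."""
    return num_params >= poisoning_threshold(num_honest, num_poisoned)


if __name__ == "__main__":
    # 1000 honest points, 100 poisoned points: threshold is 169 * 10^2.
    print(poisoning_threshold(1000, 100))
    print(is_above_threshold(20_000, 1000, 100))
```

Since 169 = 13^2, the threshold equals (13H/P)^2, so it depends only on the ratio of honest to poisoned points: multiplying the number of poisoned points by ten lowers the threshold by a factor of one hundred.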
Abstract
This paper advances the understanding of how the size of a machine learning model affects its vulnerability to poisoning, despite state-of-the-art defenses. Given isotropic random honest feature vectors and the geometric median (or clipped mean) as the robust gradient aggregation rule, we essentially prove that, perhaps surprisingly, linear and logistic regressions with at least 169H^2/P^2 parameters are subject to arbitrary model manipulation by poisoners, where H and P are the numbers of honestly labeled and poisoned data points used for training. Our experiments then expose a fundamental tradeoff between augmenting model expressivity and increasing the poisoners' attack surface, both on synthetic data and on MNIST and FashionMNIST data for linear classifiers with random features. We also discuss potential implications for source-based learning and neural nets.
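For readers unfamiliar with the two aggregation rules named in the abstract, here is a minimal sketch in pure Python. It is not the paper's implementation: the clipped mean shown is one common form (rescale each vector to a maximum norm, then average), the geometric median is approximated with Weiszfeld's iteration, and all function and parameter names are illustrative assumptions.

```python
import math


def clipped_mean(vectors, max_norm):
    """Average the vectors after rescaling each one to norm at most max_norm."""
    dim = len(vectors[0])
    total = [0.0] * dim
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        scale = 1.0 if norm <= max_norm else max_norm / norm
        for i, x in enumerate(v):
            total[i] += scale * x
    return [t / len(vectors) for t in total]


def geometric_median(vectors, num_iters=200, eps=1e-12):
    """Approximate the point minimizing the sum of Euclidean distances
    to the input vectors, using Weiszfeld's iteration."""
    dim = len(vectors[0])
    # Start from the coordinate-wise mean.
    z = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    for _ in range(num_iters):
        weights = []
        for v in vectors:
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(v, z)))
            # Floor the distance to avoid division by zero.
            weights.append(1.0 / max(dist, eps))
        weight_sum = sum(weights)
        z = [
            sum(w * v[i] for w, v in zip(weights, vectors)) / weight_sum
            for i in range(dim)
        ]
    return z


if __name__ == "__main__":
    # Nine identical "honest" vectors and one large outlier.
    grads = [[1.0, 0.0]] * 9 + [[1000.0, 0.0]]
    print(geometric_median(grads))
    print(clipped_mean(grads, max_norm=1.0))
```

In this low-dimensional toy example both rules stay close to the honest vectors while a plain mean would not. The paper's result concerns the opposite regime, where the number of parameters is large relative to H/P.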
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques · Explainable Artificial Intelligence (XAI)
