Adversarial Training for Defense Against Label Poisoning Attacks

Melis Ilayda Bal; Volkan Cevher; Michael Muehlebach

arXiv:2502.17121·cs.LG·February 25, 2025

TL;DR

This paper introduces FLORAL, an adversarial training method using SVMs within a bilevel optimization framework, to defend against label poisoning attacks that compromise model integrity.

Contribution

The paper presents a novel SVM-based adversarial training strategy with theoretical convergence analysis for defending against label poisoning attacks.

Findings

01 FLORAL outperforms robust baselines in various classification tasks.

02 It achieves higher robust accuracy under increasing attacker budgets.

03 It is effective across diverse model architectures.

Abstract

As machine learning models grow in complexity and increasingly rely on publicly sourced data, such as the human-annotated labels used in training large language models, they become more vulnerable to label poisoning attacks. These attacks, in which adversaries subtly alter the labels within a training dataset, can severely degrade model performance, posing significant risks in critical applications. In this paper, we propose FLORAL, a novel adversarial training defense strategy based on support vector machines (SVMs) to counter these threats. Utilizing a bilevel optimization framework, we cast the training process as a non-zero-sum Stackelberg game between an attacker, who strategically poisons critical training labels, and the model, which seeks to recover from such attacks. Our approach accommodates various model architectures and employs a projected gradient descent algorithm with…
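The alternating structure described in the abstract (an attacker that flips influential training labels, and a learner that responds with a projected gradient step) can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes a bias-free kernel SVM dual so that the projection reduces to clipping onto the box [0, C], and it assumes the attacker simply flips the k points with the largest dual variables. The function names `rbf_kernel`, `flip_top_k`, `pgd_dual_step`, and `adversarial_label_training`, and all default parameters, are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise RBF kernel exp(-gamma * ||a - b||^2).
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def flip_top_k(alpha, y, k):
    # Assumed attacker heuristic: flip the labels of the k points
    # with the largest dual variables.
    idx = np.argsort(-alpha)[:k]
    y_adv = y.copy()
    y_adv[idx] *= -1
    return y_adv, idx

def pgd_dual_step(alpha, y, K, C, lr):
    # One projected gradient step on the bias-free SVM dual:
    #   minimize 0.5 * a^T Q a - 1^T a,  Q = (y y^T) * K,  0 <= a <= C.
    # Without a bias term the projection is a clip onto [0, C].
    Q = (y[:, None] * y[None, :]) * K
    grad = Q @ alpha - 1.0
    return np.clip(alpha - lr * grad, 0.0, C)

def adversarial_label_training(X, y, C=1.0, gamma=1.0, k=2, rounds=20, lr=0.01):
    y = np.asarray(y, dtype=float)
    K = rbf_kernel(X, X, gamma)
    alpha = np.zeros(len(y))
    y_adv = y.copy()
    for _ in range(rounds):
        # Attacker moves first against the current model ...
        y_adv, _ = flip_top_k(alpha, y, k)
        # ... then the learner updates on the poisoned labels.
        alpha = pgd_dual_step(alpha, y_adv, K, C, lr)
    return alpha, y_adv
```

Dropping the bias term is a deliberate simplification: the standard SVM dual also has the equality constraint on the labels and dual variables, which would make the projection step more involved than a clip.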

Peer Reviews

Decision · ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 2

Strengths

- The notation in this paper is generally clear, making the formula easy to follow.
- The plot illustration is clear.
- The theoretical proof and analysis is clear and promising.

Weaknesses

- The experiments are limited. The datasets are relatively small.
- The author claims that FLORAL can be integrated with neural network but I did not find it. The proof here is all based on SVM and I suspect the proof for neural network, especially for the non-convex neural network, will be different. If the framework can only work and guarantee on SVM-based classifier, the contribution will be decreased a lot.
- Since it is a general method, the vision dataset, even MNIST, should be given.
-

Reviewer 02 · Rating 3 · Confidence 3

Strengths

1. The paper is well-presented, with clear and high-quality writing.
2. The figures are visually appealing.
3. The FLORAL achieves SOTA performance compared to the selected baselines.

Weaknesses

1. All baselines examined in this paper were published before 2021. To support the claim of achieving state-of-the-art performance, the authors may consider including comparisons with more recent methods.
2. Have the authors considered placing the figures and tables at the top of the current page?
3. The authors mention using "SVM with an RBF kernel, which serves as a basic benchmark (Hearst et al., 1998)." Have more recent approaches been considered as benchmarks?
4. Could the authors add a c

Reviewer 03 · Rating 6 · Confidence 3

Strengths

Thanks to the authors for submitting this interesting work. The proposal of the paper is for sure original and novel since the problem of defending an ML model against a label flip poisoning attack is cast as an adversarial training problem for the first time. The specific formulation of the problem as a non-zero sum Stackelberg is also original and enables the design of a defense algorithm that outperforms existing defenses. Moreover, the theoretical analysis behind the convergence to the optim

Weaknesses

A key weakness of the paper lies in the lack of clarity about the motivation for using adversarial training to counter poisoning attacks and for formulating the problem as a non-zero-sum Stackelberg game. Adversarial training, particularly in the context of evasion attacks on ML models, typically aims to train the model on possible attacks to make it robust against test time attacks, as explained in Sections 1 and 3.3 in this paper. Instead, defenses against poisoning attacks often focus on remo

Code & Models

Repositories

melisilaydabal/floral
pytorch · Official

Videos

Adversarial Training for Defense Against Label Poisoning Attacks · slideslive
