Keeping up with dynamic attackers: Certifying robustness to adaptive online data poisoning

Avinandan Bose; Laurent Lessard; Maryam Fazel; Krishnamurthy (Dj) Dvijotham

arXiv:2502.16737 · cs.LG · February 25, 2025

TL;DR

This paper introduces a framework for certifying the robustness of learning algorithms against adaptive, online data poisoning attacks. It addresses a gap left by existing certificates, which assume static adversaries, and gives initial applications to mean estimation and binary classification.

Contribution

The paper proposes a method for computing certified bounds on the impact of dynamic poisoning attacks and uses these certificates to design robust learning algorithms.

Findings

01. Framework for certifying robustness against adaptive attacks
02. Application to mean estimation and binary classification
03. Open directions for extending the approach

Abstract

The rise of foundation models fine-tuned on human feedback from potentially untrusted users has increased the risk of adversarial data poisoning, necessitating the study of robustness of learning algorithms against such attacks. Existing research on provable certified robustness against data poisoning attacks primarily focuses on certifying robustness for static adversaries who modify a fraction of the dataset used to train the model before the training algorithm is applied. In practice, particularly when learning from human feedback in an online sense, adversaries can observe and react to the learning process and inject poisoned samples that optimize adversarial objectives better than when they are restricted to poisoning a static dataset once, before the learning algorithm is applied. Indeed, it has been shown in prior work that online dynamic adversaries can be significantly more…
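
The abstract's contrast between static and adaptive adversaries can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's method or its certification procedure: the online mean estimator, the distance-based rejection filter, the step size, and the helpers `run_online_mean`, `static_attack`, and `adaptive_attack` are all hypothetical choices made for illustration. Clean samples are all 0.0, and the adversary controls every third sample. The static adversary commits to one value before training. The adaptive adversary observes the current estimate and places its point just inside the filter boundary.

```python
from typing import Callable


def run_online_mean(
    poison: Callable[[float], float],
    steps: int = 300,
    period: int = 3,
    eta: float = 0.1,
    tau: float = 1.0,
) -> float:
    """Hypothetical toy: online mean estimate with a rejection filter.

    Clean samples are 0.0; every `period`-th sample (t % period == 0)
    comes from `poison(theta)`, which may look at the current estimate.
    Samples farther than `tau` from the estimate are discarded.
    """
    theta = 0.0
    for t in range(steps):
        x = poison(theta) if t % period == 0 else 0.0
        if abs(x - theta) <= tau:  # illustrative defense: drop outliers
            theta += eta * (x - theta)  # constant-step online mean update
    return theta


TAU = 1.0


def static_attack(theta: float) -> float:
    # Fixed before training and ignores theta: the filter boundary
    # as seen from the initial estimate of 0.0.
    return TAU


def adaptive_attack(theta: float) -> float:
    # Reacts to the learner: sits just inside the current filter boundary.
    return theta + TAU - 1e-6


static_final = run_online_mean(static_attack, tau=TAU)
adaptive_final = run_online_mean(adaptive_attack, tau=TAU)
print(f"static:   {static_final:.3f}")
print(f"adaptive: {adaptive_final:.3f}")
```

With these parameters the static attack settles near 0.299 and the adaptive one near 0.426 (measured at the end of each three-step cycle). The gap comes from the filter: a value fixed in advance must pass the filter from wherever the estimate happens to be, while the adaptive adversary can always land just inside the boundary around the current estimate, so every poisoned sample is accepted and pushes the estimate by nearly `eta * tau`.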

Code & Models

Repositories

avinandan22/certified-robustness (PyTorch, official)

Taxonomy

Topics: Information and Cyber Security · Advanced Malware Detection Techniques · Network Security and Intrusion Detection