Systematic Testing of the Data-Poisoning Robustness of KNN

Yannan Li, Jingbo Wang, and Chao Wang

arXiv:2307.08288·cs.SE·July 18, 2023

Open Access

TL;DR

This paper introduces a systematic testing approach for assessing the data-poisoning robustness of KNN. It can both certify robust cases and falsify non-robust ones, and it is faster and more accurate than the baseline enumeration method.

Contribution

It combines a novel over-approximate analysis in the abstract domain, which quickly narrows the search space, with systematic testing, making the verification of KNN's data-poisoning robustness both faster and more accurate.
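To illustrate what an over-approximate check can look like, here is a minimal sketch under stated assumptions. It is not the paper's analysis. It assumes that poisoning is modeled as the removal of up to `n` training points, that `k` is fixed, and that distance is squared Euclidean; the function name `quick_certify` is hypothetical. The idea is that after removing at most `n` points, the new `k` nearest neighbors all come from the original `k + n` nearest, so comparing a pessimistic vote count for the current winner against an optimistic count for every rival gives a sound but incomplete certificate.

```python
from collections import Counter

def quick_certify(train, x, k, n):
    """Sound but incomplete check: True means certified robust, False means unknown.

    train: list of (feature_tuple, label) pairs; x: feature tuple.
    Assumption: the attacker's effect is modeled as removing up to n points.
    """
    if k + n > len(train):
        return False  # not enough points to reason about; stay inconclusive
    order = sorted(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
    top_k = Counter(label for _, label in order[:k])       # current neighborhood
    wide = Counter(label for _, label in order[:k + n])    # every possible neighbor
    winner, winner_votes = top_k.most_common(1)[0]
    # Worst case: the winner loses n votes while a rival gains every vote
    # available in the widened neighborhood.
    rival_votes = max((c for label, c in wide.items() if label != winner), default=0)
    return winner_votes - n > rival_votes

train = [((0.0,), "a"), ((1.0,), "a"), ((2.0,), "a"), ((3.0,), "a"), ((9.0,), "b")]
print(quick_certify(train, (0.5,), k=3, n=1))  # True: certified
print(quick_certify(train, (0.5,), k=3, n=2))  # False: inconclusive
```

The second call returns `False` even though no removal of two points actually flips the prediction in this toy data. That is the price of over-approximation: a failed check says nothing on its own, which is why a concrete search is still needed to falsify.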

Findings

01 Outperforms baseline enumeration in speed and accuracy
02 Can certify robustness for most test inputs
03 Effectively falsifies non-robust cases

Abstract

Data poisoning aims to compromise a machine-learning-based software component by contaminating its training set to change its prediction results for test inputs. Existing methods for deciding data-poisoning robustness have either poor accuracy or long running times and, more importantly, they can only certify some of the truly robust cases, remaining inconclusive when certification fails. In other words, they cannot falsify the truly non-robust cases. To overcome this limitation, we propose a systematic-testing-based method, which can falsify as well as certify data-poisoning robustness for a widely used supervised-learning technique named k-nearest neighbors (KNN). Our method is faster and more accurate than the baseline enumeration method, due to a novel over-approximate analysis in the abstract domain, to quickly narrow down the search space, and systematic testing in the concrete…
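For intuition about the baseline the abstract compares against, the following is a minimal sketch of brute-force enumeration, not the authors' implementation. It assumes poisoning is modeled as the removal of up to `n` training points, that `k` is fixed rather than tuned, and that ties are broken by the smallest label; the names `knn_predict` and `check_robust_by_enumeration` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def knn_predict(train, x, k):
    """Majority label among the k nearest points (squared Euclidean distance)."""
    nearest = sorted(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[:k]
    votes = Counter(label for _, label in nearest)
    top = max(votes.values())
    # Assumption: break ties deterministically by choosing the smallest label.
    return min(label for label, count in votes.items() if count == top)

def check_robust_by_enumeration(train, x, k, n):
    """Try every removal of 1..n training points.

    Returns (True, None) if no removal changes the prediction for x,
    otherwise (False, removed_indices) as a concrete counterexample.
    """
    baseline = knn_predict(train, x, k)
    for r in range(1, n + 1):
        for removed in combinations(range(len(train)), r):
            kept = [p for i, p in enumerate(train) if i not in removed]
            if knn_predict(kept, x, k) != baseline:
                return False, removed
    return True, None

train = [((0.0,), "a"), ((1.0,), "a"), ((2.0,), "a"), ((5.0,), "b"), ((6.0,), "b")]
print(check_robust_by_enumeration(train, (0.5,), k=3, n=1))  # (True, None)
print(check_robust_by_enumeration(train, (0.5,), k=3, n=2))  # (False, (0, 1))
```

The number of candidate subsets grows combinatorially with the training-set size and `n`, which is why plain enumeration becomes slow and why narrowing the search space first is attractive.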

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Machine Learning and Data Classification · Imbalanced Data Classification Techniques