RPP: A Certified Poisoned-Sample Detection Framework for Backdoor Attacks under Dataset Imbalance
Miao Lin, Feng Yu, Rui Ning, Lusi Li, Jiawei Chen, Qian Lou, Mengxin Zheng, Chunsheng Xin, Hongyi Wu

TL;DR
This paper introduces RPP, a black-box detection framework that effectively identifies backdoor samples in imbalanced datasets, outperforming existing defenses and providing theoretical guarantees.
Contribution
The paper is the first to analyze how dataset imbalance amplifies backdoor vulnerabilities, and it proposes RPP, a certified detection method with provable guarantees that uses only model output probabilities.
Findings
RPP outperforms 11 state-of-the-art defenses in detection accuracy across five benchmarks and ten attack types.
Imbalanced datasets significantly increase backdoor vulnerability.
RPP provides provable guarantees and operates in a black-box setting.
Abstract
Deep neural networks are highly susceptible to backdoor attacks, yet most defense methods to date rely on balanced data, overlooking the pervasive class imbalance in real-world scenarios that can amplify backdoor threats. This paper presents the first in-depth investigation of how the dataset imbalance amplifies backdoor vulnerability, showing that (i) the imbalance induces a majority-class bias that increases susceptibility and (ii) conventional defenses degrade significantly as the imbalance grows. To address this, we propose Randomized Probability Perturbation (RPP), a certified poisoned-sample detection framework that operates in a black-box setting using only model output probabilities. For any inspected sample, RPP determines whether the input has been backdoor-manipulated, while offering provable within-domain detectability guarantees and a probabilistic upper bound on the false…
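The abstract describes a per-sample check that needs only the model's output probabilities. The following is a minimal sketch of that general idea, not the authors' algorithm: it adds Gaussian noise to an input, queries a black-box model, and measures how much probability stays on the originally predicted class. The names `stability_score`, `calibrate_threshold`, and `flag`, the parameters `sigma`, `n_draws`, and `target_fpr`, and the assumption that poisoned samples are the unusually stable ones are all illustrative choices that do not come from the paper.

```python
import numpy as np


def stability_score(predict_proba, x, sigma=0.1, n_draws=64, rng=None):
    """Mean probability kept on the original top class under Gaussian noise.

    predict_proba is a hypothetical black-box callable: it takes a batch of
    inputs and returns an array of class probabilities, one row per input.
    """
    rng = np.random.default_rng() if rng is None else rng
    base = predict_proba(x[None, ...])[0]        # probabilities for the clean query
    top = int(np.argmax(base))                   # originally predicted class
    noise = rng.normal(0.0, sigma, size=(n_draws,) + x.shape)
    probs = predict_proba(x[None, ...] + noise)  # probabilities for noisy copies
    return float(probs[:, top].mean())


def calibrate_threshold(clean_scores, target_fpr=0.05):
    """Empirical (1 - target_fpr) quantile of scores from known-clean samples."""
    return float(np.quantile(clean_scores, 1.0 - target_fpr))


def flag(score, threshold):
    """Assumption: a score above the clean-data quantile marks a suspect sample."""
    return score > threshold
```

Calibrating the threshold on held-out clean scores only bounds the false positive rate empirically on that calibration set; the paper's probabilistic bound is a separate theoretical result that this sketch does not reproduce.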
Peer Reviews
Decision: Submitted to ICLR 2026
1. Originality: The paper presents the first work to systematically investigate and formalize the critical problem of how dataset imbalance amplifies backdoor vulnerability and cripples existing defenses. The paper proposes Randomized Probability Perturbation (RPP), which adapts the concept of randomized smoothing to the distinct task of certified poisoned sample detection, proposing a paradigm shift from distribution-level signals used by prior defenses to a sample-level robustness metr
1. The paper's sample-level approach is commendable, but its global calibration of RPP thresholds remains vulnerable to distributional skew from class imbalance. A critical analysis of per-class detection performance and adaptive thresholding strategies is needed. 2. The practicality of requiring a pre-trained preliminary model on potentially poisoned data needs clearer justification regarding security and computational overhead. The exact deployment scenario and cost-benefit analysis compared t
1. The paper uncovers an important and previously underexplored phenomenon: class imbalance can substantially amplify a model’s susceptibility to backdoor attacks. This observation provides a valuable perspective on how data distribution affects model security. 2. The proposed detection approach abandons conventional clustering- or distribution-based strategies, instead introducing a per-sample robustness criterion based on output stability under random perturbations. This idea is conceptua
1. Lack of validation on real-world long-tailed datasets: All imbalance settings in the experiments are synthetically generated. The absence of evaluation on naturally imbalanced or long-tailed datasets limits the practical generalizability of the results. Adding experiments on real datasets would significantly strengthen the paper’s claims. 2. Inconsistent notation: Some symbols and notations are inconsistently used across sections, which slightly reduces readability. A careful revision to ensu
The paper provides a thorough analysis of how dataset imbalance amplifies backdoor vulnerabilities and weakens existing defenses, which is a practical and underexplored aspect of backdoor research. The work offers formal guarantees, including conditions for detectability, upper and lower trigger bounds, and provable control over false positive rate. Evaluations on five benchmarks and ten attack types demonstrate strong performance, outperforming 11 state-of-the-art defenses in both detection a
The theoretical analysis assumes Gaussian perturbations and independent noise across samples, which may not hold for complex data distributions. Although adaptive scenarios are mentioned, the empirical defense-attack interplay is underexplored. More thorough experiments on adaptive trigger shaping or low-magnitude attacks would add credibility to claims of robustness.
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Network Security and Intrusion Detection
