Post-Training Overfitting Mitigation in DNN Classifiers
Hang Wang, David J. Miller, George Kesidis

TL;DR
This paper introduces a post-training activation thresholding method that effectively reduces overfitting in deep neural networks caused by class imbalance, overtraining, and backdoor data-poisoning, improving generalization and security.
Contribution
It proposes a novel activation thresholding technique based on maximum margins that enhances post-training mitigation of both malicious backdoors and non-malicious overfitting in DNNs.
Findings
Thresholding activations to limit maximum margins improves backdoor mitigation over the earlier activation-bounding approach.
The method also reduces overfitting caused by class imbalance.
Strong performance is demonstrated on CIFAR datasets.
Abstract
Well-known (non-malicious) sources of overfitting in deep neural net (DNN) classifiers include: i) large class imbalances; ii) insufficient training-set diversity; and iii) over-training. In recent work, it was shown that backdoor data-poisoning also induces overfitting, with unusually large classification margins to the attacker's target class, mediated particularly by (unbounded) ReLU activations that allow large signals to propagate in the DNN. Thus, an effective post-training (with no knowledge of the training set or training process) mitigation approach against backdoors was proposed, leveraging a small clean dataset, based on bounding neural activations. Improving upon that work, we threshold activations specifically to limit maximum margins (MMs), which yields performance gains in backdoor mitigation. We also provide some analytical support for this mitigation approach. Most…
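To make the idea of bounding activations concrete, here is a minimal sketch in Python with NumPy. It is a toy illustration under stated assumptions, not the authors' procedure: the two-neuron network, the helper names (`forward`, `margin`), and the rule for choosing bounds (the per-neuron maximum activation observed on a small clean set) are all hypothetical stand-ins. The paper chooses thresholds specifically to limit maximum margins, which this sketch does not attempt to reproduce.

```python
import numpy as np

def forward(x, W1, b1, W2, b2, bounds=None):
    """One hidden ReLU layer; optionally clip each hidden unit at an upper bound."""
    h = np.maximum(x @ W1 + b1, 0.0)      # standard (unbounded) ReLU
    if bounds is not None:
        h = np.minimum(h, bounds)         # clipped ReLU: activations capped per neuron
    return h @ W2 + b2

def margin(logits, c):
    """Classification margin of class c: its logit minus the largest other logit."""
    others = np.delete(logits, c, axis=-1)
    return logits[..., c] - others.max(axis=-1)

# Toy network (assumption): identity weights, so logits equal hidden activations.
W1, b1 = np.eye(2), np.zeros(2)
W2, b2 = np.eye(2), np.zeros(2)

# Small clean dataset (assumption): inputs stay in a modest range.
clean = np.array([[0.2, 0.9], [1.0, 0.3], [0.6, 0.6]])

# Stand-in heuristic: bound each neuron by its largest activation on clean data.
bounds = np.maximum(clean @ W1 + b1, 0.0).max(axis=0)

# An input carrying an unusually large signal toward class 0 (a crude "trigger").
x_trigger = np.array([50.0, 0.5])

m_unbounded = margin(forward(x_trigger, W1, b1, W2, b2), 0)
m_bounded = margin(forward(x_trigger, W1, b1, W2, b2, bounds), 0)

# Clean inputs are unaffected because no clean activation exceeds its bound.
clean_same = np.allclose(
    forward(clean, W1, b1, W2, b2),
    forward(clean, W1, b1, W2, b2, bounds),
)
```

In this toy setting the clipped network leaves clean predictions unchanged while the margin toward class 0 on the large-signal input drops sharply, which is the qualitative effect the abstract attributes to bounding ReLU activations.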
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications
