FireBERT: Hardening BERT-based classifiers against adversarial attack

Gunnar Mein; Kevin Hartman; Andrew Morris

arXiv:2008.04203 · cs.CL · August 11, 2020

TL;DR

FireBERT introduces three methods to enhance BERT classifiers' robustness against adversarial word-perturbation attacks, maintaining high accuracy on both regular and adversarial samples through co-tuning and evaluation-time perturbations.

Contribution

The paper proposes techniques for hardening BERT-based classifiers against word-perturbation adversarial attacks, chiefly co-tuning with synthetic adversarial data and evaluation-time perturbation combined with voting. It shows that these techniques improve robustness while largely preserving accuracy on regular samples.
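To make "co-tuning with synthetic data" concrete, the sketch below builds a training set that interleaves each original labeled sample with synthetic variants that keep its label. It is a minimal illustration under stated assumptions: the generator `synonym_swap`, the toy `SYNONYMS` table, and the `ratio` parameter are hypothetical stand-ins, not FireBERT's actual synthetic data generator, and the BERT fine-tuning step that would consume the mixed data is omitted.

```python
import random
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, int]  # (text, label)

# Hypothetical toy synonym table standing in for a real perturbation source.
SYNONYMS: Dict[str, List[str]] = {
    "good": ["fine", "decent"],
    "bad": ["poor", "awful"],
    "movie": ["film", "picture"],
}

def synonym_swap(text: str, rng: random.Random) -> str:
    """Hypothetical generator: replace every word found in SYNONYMS with a random synonym."""
    return " ".join(rng.choice(SYNONYMS[w]) if w in SYNONYMS else w for w in text.split())

def co_tuning_set(
    data: List[Sample],
    generator: Callable[[str, random.Random], str],
    ratio: int = 1,
    seed: int = 0,
) -> List[Sample]:
    """Return originals plus `ratio` label-preserving synthetic variants per sample, shuffled."""
    rng = random.Random(seed)
    mixed: List[Sample] = []
    for text, label in data:
        mixed.append((text, label))  # keep the original sample
        for _ in range(ratio):
            mixed.append((generator(text, rng), label))  # synthetic variant, same label
    rng.shuffle(mixed)  # interleave originals and variants for training
    return mixed

if __name__ == "__main__":
    train = [("a good movie", 1), ("a bad movie", 0)]
    for text, label in co_tuning_set(train, synonym_swap, ratio=2):
        print(label, text)
```

The label-preserving assumption is what lets the classifier learn that surface-level word swaps should not change its prediction; a real generator would also need to check that substitutions do not flip the meaning.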

Findings

1. Co-tuning with synthetic data protects against 95% of adversarial samples.
2. Evaluation-time perturbation restores up to 75% of original accuracy under attack.
3. The methods maintain high accuracy on regular benchmark samples.
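Evaluation-time perturbation can be sketched as: create several word-substituted copies of the input, classify each, and take a majority vote. This is a toy illustration, not FireBERT's implementation: `toy_classify`, the `SUBSTITUTES` table, and the `n_variants` and `p` parameters are hypothetical, and the scenario assumes an attacker has swapped "terrible" for "dire", a word the toy classifier does not recognize.

```python
import random
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical substitution table; a real system might draw candidates
# from embedding-space neighbours or a thesaurus.
SUBSTITUTES: Dict[str, List[str]] = {
    "great": ["excellent", "superb"],
    "dire": ["terrible", "awful"],
    "flick": ["movie", "film"],
}
POSITIVE = {"great", "excellent", "superb"}
NEGATIVE = {"terrible", "awful", "dreadful"}

def toy_classify(text: str) -> int:
    """Hypothetical stand-in classifier: 1 = positive, 0 = negative."""
    words = text.split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return 1 if score >= 0 else 0

def perturb(text: str, rng: random.Random, p: float) -> str:
    """Replace each substitutable word with probability p."""
    out = []
    for w in text.split():
        if w in SUBSTITUTES and rng.random() < p:
            out.append(rng.choice(SUBSTITUTES[w]))
        else:
            out.append(w)
    return " ".join(out)

def vote_predict(
    text: str,
    classify: Callable[[str], int],
    n_variants: int = 10,
    p: float = 0.5,
    seed: int = 0,
) -> int:
    """Classify the original plus n_variants perturbed copies; return the majority label."""
    rng = random.Random(seed)
    variants = [text] + [perturb(text, rng, p) for _ in range(n_variants)]
    votes = Counter(classify(v) for v in variants)
    return votes.most_common(1)[0][0]
```

With `p=1.0`, `toy_classify("a dire flick")` is fooled into returning 1, while `vote_predict("a dire flick", toy_classify, p=1.0)` returns 0 because every substituted copy contains a known negative word and outvotes the single unperturbed copy.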

Abstract

We present FireBERT, a set of three proof-of-concept NLP classifiers hardened against TextFooler-style word-perturbation by producing diverse alternatives to original samples. In one approach, we co-tune BERT against the training data and synthetic adversarial samples. In a second approach, we generate the synthetic samples at evaluation time through substitution of words and perturbation of embedding vectors. The diversified evaluation results are then combined by voting. A third approach replaces evaluation-time word substitution with perturbation of embedding vectors. We evaluate FireBERT on the MNLI and IMDB Movie Review datasets, both on the original samples and on adversarial examples generated by TextFooler. We also test whether TextFooler is less successful in creating new adversarial samples when manipulating FireBERT, compared to working on unhardened classifiers. We show that it is possible…
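The third approach perturbs embedding vectors instead of words before voting. The sketch below is a minimal standard-library illustration, assuming a hypothetical linear classifier over mean-pooled token embeddings and isotropic Gaussian noise with scale `sigma`; FireBERT operates on BERT's own embedding vectors, and its actual noise scheme and classifier head may differ.

```python
import random
from collections import Counter
from typing import List, Sequence

Vector = List[float]

def mean_pool(embeddings: Sequence[Vector]) -> Vector:
    """Average per-token embeddings into a single sentence vector."""
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]

def linear_classify(vec: Vector, weights: Vector, bias: float) -> int:
    """Hypothetical binary head: returns 1 if w.x + b > 0, else 0."""
    return int(sum(w * x for w, x in zip(weights, vec)) + bias > 0)

def add_noise(embeddings: Sequence[Vector], sigma: float, rng: random.Random) -> List[Vector]:
    """Add independent Gaussian noise N(0, sigma^2) to every embedding coordinate."""
    return [[x + rng.gauss(0.0, sigma) for x in e] for e in embeddings]

def vote_predict_embeddings(
    embeddings: Sequence[Vector],
    weights: Vector,
    bias: float,
    n_variants: int = 14,
    sigma: float = 0.1,
    seed: int = 0,
) -> int:
    """Classify the clean embeddings plus n_variants noisy copies; return the majority label."""
    rng = random.Random(seed)
    votes: Counter = Counter()
    votes[linear_classify(mean_pool(embeddings), weights, bias)] += 1  # unperturbed vote
    for _ in range(n_variants):
        noisy = add_noise(embeddings, sigma, rng)
        votes[linear_classify(mean_pool(noisy), weights, bias)] += 1
    return votes.most_common(1)[0][0]
```

An odd total number of votes (here 1 clean + 14 noisy) avoids ties for a binary label. Compared with word substitution, embedding noise needs no synonym source, but `sigma` must be tuned so that perturbations are large enough to move an adversarial input back across the decision boundary without flipping clean inputs.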

Code & Models

Repositories

FireBERT-author/FireBERT
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Network Security and Intrusion Detection

Methods: Linear Layer · WordPiece · Dense Connections · Linear Warmup With Linear Decay · Layer Normalization · Attention Is All You Need · Multi-Head Attention · Dropout · Adam