Are Transformers More Robust? Towards Exact Robustness Verification for Transformers
Brian Hsuan-Cheng Liao, Chih-Hong Cheng, Hasan Esen, Alois Knoll

TL;DR
This paper investigates the robustness of Sparsemax-based Transformers by framing exact verification as an MIQCP problem, then compares their robustness with that of MLPs in a safety-critical application. The result: Transformers are not inherently more robust.
Contribution
It introduces a novel MIQCP-based method for exact robustness verification of Transformers and proposes heuristics to improve computational efficiency.
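For orientation, exact robustness verification is commonly posed as a minimal-perturbation problem. The formulation below is a generic sketch, not the paper's exact encoding: the symbols $f$, $x_0$, $c$, $\varepsilon^{*}$ and the choice of norm $p$ are assumptions introduced here for illustration.

```latex
% Generic sketch (assumed notation): f is the network, x_0 the nominal input,
% c = \arg\max_k f_k(x_0) the originally predicted class.
\varepsilon^{*} \;=\; \min_{x}\ \lVert x - x_0 \rVert_{p}
\quad \text{s.t.} \quad \exists\, k \neq c :\; f_k(x) \,\ge\, f_c(x)
```

Under this reading, any perturbation with norm below $\varepsilon^{*}$ cannot change the prediction. Attention layers multiply input-dependent terms with each other, which is one plausible reason such an encoding needs quadratic rather than purely linear constraints.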
Findings
Transformers are not necessarily more robust than MLPs.
The MIQCP approach enables exact robustness verification.
Heuristics significantly speed up the verification process.
Abstract
As an emerging type of Neural Networks (NNs), Transformers are used in many domains ranging from Natural Language Processing to Autonomous Driving. In this paper, we study the robustness problem of Transformers, a key characteristic as low robustness may cause safety concerns. Specifically, we focus on Sparsemax-based Transformers and reduce the finding of their maximum robustness to a Mixed Integer Quadratically Constrained Programming (MIQCP) problem. We also design two pre-processing heuristics that can be embedded in the MIQCP encoding and substantially accelerate its solving. We then conduct experiments using the application of Lane Departure Warning to compare the robustness of Sparsemax-based Transformers against that of the more conventional Multi-Layer-Perceptron (MLP) NNs. To our surprise, Transformers are not necessarily more robust, leading to profound considerations in…
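Since the abstract centers on Sparsemax, a minimal sketch of the standard Sparsemax function may help. It is written in plain Python and follows the usual sort-and-threshold definition (Euclidean projection of the scores onto the probability simplex). The function name `sparsemax` and the example inputs are illustrative assumptions, not code from the paper.

```python
def sparsemax(z):
    """Sparsemax of a list of scores: Euclidean projection onto the simplex.

    Illustrative sketch only; not taken from the paper's implementation.
    """
    z_sorted = sorted(z, reverse=True)
    cumulative = 0.0
    tau = 0.0
    for j, value in enumerate(z_sorted, start=1):
        cumulative += value
        # Index j belongs to the support while 1 + j * z_(j) > sum of the top-j scores.
        if 1.0 + j * value > cumulative:
            tau = (cumulative - 1.0) / j  # threshold for the current support size
    # Scores at or below the threshold are mapped to exactly zero.
    return [max(v - tau, 0.0) for v in z]


if __name__ == "__main__":
    print(sparsemax([2.0, 1.0, 0.1]))
    print(sparsemax([1.0, 0.8, 0.1]))
```

Unlike Softmax, the output can contain exact zeros, and the function is piecewise linear in its input, which is what makes it more amenable to exact mixed-integer encodings than the exponential in Softmax.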
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Advanced Neural Network Applications
Methods: Softmax · Layer Normalization · Sparsemax
