Robustness Verification for Transformers
Zhouxing Shi, Huan Zhang, Kai-Wei Chang, Minlie Huang, Cho-Jui Hsieh

TL;DR
This paper introduces the first robustness verification algorithm for Transformers. It handles the cross-nonlinearity and cross-position dependency of self-attention layers and yields certified robustness bounds that are significantly tighter than those from naive Interval Bound Propagation. The bounds also help interpret the model's predictions.
Contribution
Develops the first robustness verification algorithm for Transformers, overcoming challenges posed by self-attention layers and cross-position dependencies.
Findings
Certified robustness bounds are significantly tighter than those computed by naive Interval Bound Propagation.
The bounds help interpret Transformers by reflecting the importance of different words in sentiment analysis.
Method demonstrates effectiveness on complex Transformer architectures.
Abstract
Robustness verification that aims to formally certify the prediction behavior of neural networks has become an important tool for understanding model behavior and obtaining safety guarantees. However, previous methods can usually only handle neural networks with relatively simple architectures. In this paper, we consider the robustness verification problem for Transformers. Transformers have complex self-attention layers that pose many challenges for verification, including cross-nonlinearity and cross-position dependency, which have not been discussed in previous works. We resolve these challenges and develop the first robustness verification algorithm for Transformers. The certified robustness bounds computed by our method are significantly tighter than those by naive Interval Bound Propagation. These bounds also shed light on interpreting Transformers as they consistently reflect the…
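For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of naive Interval Bound Propagation in plain Python. It is not the paper's algorithm. The function names (`linear_bounds`, `relu_bounds`, `product_bounds`) and the toy weights are illustrative assumptions. The last function bounds a product of two perturbed values, the kind of term that arises when queries and keys are multiplied in self-attention; treating the two factors as independent intervals is one reason such bounds tend to be loose.

```python
from typing import List, Tuple


def linear_bounds(lower: List[float], upper: List[float],
                  weight: List[List[float]], bias: List[float]
                  ) -> Tuple[List[float], List[float]]:
    """Propagate elementwise input bounds through y = W x + b."""
    out_lo, out_hi = [], []
    for row, b in zip(weight, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            if w >= 0:
                # A non-negative weight keeps the ordering of the bounds.
                lo += w * l
                hi += w * u
            else:
                # A negative weight swaps which input bound is extreme.
                lo += w * u
                hi += w * l
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi


def relu_bounds(lower: List[float], upper: List[float]
                ) -> Tuple[List[float], List[float]]:
    """ReLU is monotone, so it can be applied to each bound directly."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


def product_bounds(a_lo: float, a_hi: float,
                   b_lo: float, b_hi: float) -> Tuple[float, float]:
    """Bound a * b when both factors lie in intervals."""
    corners = [a_lo * b_lo, a_lo * b_hi, a_hi * b_lo, a_hi * b_hi]
    return min(corners), max(corners)


# Toy usage (hypothetical values): perturb x by eps in the L-infinity norm.
x, eps = [1.0, -0.5], 0.1
lo = [v - eps for v in x]
hi = [v + eps for v in x]
h_lo, h_hi = relu_bounds(*linear_bounds(lo, hi, [[1.0, -1.0]], [0.0]))
```

A prediction is certified under this scheme when the lower bound of the true-class logit margin stays positive for the whole perturbation set. Because each layer discards correlations between neurons, the intervals widen with depth.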
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Explainable Artificial Intelligence (XAI)
