A One-Layer Decoder-Only Transformer is a Two-Layer RNN: With an Application to Certified Robustness
Yuhao Zhang, Aws Albarghouthi, Loris D'Antoni

TL;DR
This paper shows that a one-layer decoder-only Transformer is equivalent to a two-layer RNN and uses this to build ARC-Tran, a scalable method for verifying the robustness of such models against arbitrary perturbation spaces. Models trained with ARC-Tran are more robust to these perturbations than those produced by existing techniques.
Contribution
It establishes that a one-layer decoder-only Transformer is equivalent to a two-layer RNN and, building on this, proposes ARC-Tran, a scalable method for verifying robustness against arbitrary perturbation spaces.
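The equivalence rests on a property of causal (decoder-only) attention: the output at position t depends only on tokens 0..t, and it can be computed by a left-to-right scan that carries a fixed-size state. The sketch below illustrates that recurrent view. It assumes a single attention head with no layer norm, residual connection, feed-forward block, or position encoding, and the function names are hypothetical. The query of position t is fixed before the scan, so each output position gets its own pass. It shows the recurrent reading of causal attention only; the paper's two-layer RNN construction, and how it handles position encodings, may be organized differently.

```python
import math
import random
from typing import List

Vec = List[float]
Mat = List[List[float]]


def dot(u: Vec, v: Vec) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(W: Mat, x: Vec) -> Vec:
    return [dot(row, x) for row in W]


def attention_direct(xs: List[Vec], Wq: Mat, Wk: Mat, Wv: Mat, t: int) -> Vec:
    """Reference: softmax attention of position t over positions 0..t (causal mask)."""
    q = matvec(Wq, xs[t])
    scores = [dot(q, matvec(Wk, xs[j])) for j in range(t + 1)]
    m = max(scores)                                  # subtract max for stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    values = [matvec(Wv, xs[j]) for j in range(t + 1)]
    return [sum(w * v[d] for w, v in zip(weights, values)) / z
            for d in range(len(Wv))]


def attention_recurrent(xs: List[Vec], Wq: Mat, Wk: Mat, Wv: Mat, t: int) -> Vec:
    """Same output, computed by scanning tokens 0..t with a fixed-size state.

    State: m = running max score, z = running softmax denominator,
    acc = running weighted sum of values (all relative to m).
    """
    q = matvec(Wq, xs[t])
    m, z, acc = -math.inf, 0.0, [0.0] * len(Wv)
    for j in range(t + 1):
        s = dot(q, matvec(Wk, xs[j]))
        v = matvec(Wv, xs[j])
        m_new = max(m, s)
        scale = math.exp(m - m_new)  # rescales old state; math.exp(-inf) == 0.0 on step 0
        w = math.exp(s - m_new)
        z = z * scale + w
        acc = [a * scale + w * vd for a, vd in zip(acc, v)]
        m = m_new
    return [a / z for a in acc]


if __name__ == "__main__":
    random.seed(0)
    d, n = 4, 6

    def rand_mat(rows: int, cols: int) -> Mat:
        return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    Wq, Wk, Wv = rand_mat(d, d), rand_mat(d, d), rand_mat(d, d)
    xs = rand_mat(n, d)  # n token embeddings of width d
    for t in range(n):
        ref = attention_direct(xs, Wq, Wk, Wv, t)
        rec = attention_recurrent(xs, Wq, Wk, Wv, t)
        assert all(abs(a - b) < 1e-9 for a, b in zip(ref, rec))
    print("recurrent scan matches causal attention at every position")
```

The online-softmax update (rescaling the running sums whenever a larger score appears) is what lets the scan keep only three quantities instead of every past score.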
Findings
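Models trained with ARC-Tran are more robust to arbitrary perturbation spaces than those produced by existing techniques.
Models trained with ARC-Tran achieve high certification accuracy.
Unlike prior verifiers, which handle only specific, length-preserving perturbations such as word substitutions or only recursive models such as LSTMs, ARC-Tran handles arbitrary perturbation spaces for decoder-only Transformers.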
Abstract
This paper reveals a key insight: a one-layer decoder-only Transformer is equivalent to a two-layer Recurrent Neural Network (RNN). Building on this insight, we propose ARC-Tran, a novel approach for verifying the robustness of decoder-only Transformers against arbitrary perturbation spaces. Current robustness verification techniques are limited either to specific, length-preserving perturbations such as word substitutions or to recursive models such as LSTMs. ARC-Tran addresses these limitations by carefully managing position encoding to prevent mismatches and by using our key insight to achieve precise and scalable verification. Our evaluation shows that ARC-Tran (1) trains models more robust to arbitrary perturbation spaces than those produced by existing techniques and (2) yields models with high certification accuracy.
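The abstract contrasts length-preserving perturbations (word substitutions) with arbitrary perturbation spaces. The sketch below makes that distinction concrete. It assumes a hypothetical set of three transformations (stop-word deletion, synonym substitution, adjacent-token swap) that is not the paper's rule set. It enumerates every input reachable with at most k transformations and checks that a classifier's label never changes. Deletion shortens the input, so a token can sit at a different position in the perturbed input than in the original, which is one reason position encodings need careful handling. Exhaustive enumeration like this grows exponentially with k. It is the naive baseline a verifier such as ARC-Tran aims to avoid, not a description of how ARC-Tran works.

```python
from typing import Callable, Set, Tuple

Seq = Tuple[str, ...]

# Hypothetical transformation vocabulary, chosen only for illustration.
STOP_WORDS = {"a", "an", "the"}
SYNONYMS = {"good": ("great", "fine"), "movie": ("film",)}


def one_step(seq: Seq) -> Set[Seq]:
    """Every sequence reachable from `seq` by exactly one transformation."""
    out: Set[Seq] = set()
    for i, tok in enumerate(seq):
        if tok in STOP_WORDS:                       # deletion: changes the length
            out.add(seq[:i] + seq[i + 1:])
        for syn in SYNONYMS.get(tok, ()):           # substitution: length-preserving
            out.add(seq[:i] + (syn,) + seq[i + 1:])
    for i in range(len(seq) - 1):                   # adjacent swap: length-preserving
        if seq[i] != seq[i + 1]:
            out.add(seq[:i] + (seq[i + 1], seq[i]) + seq[i + 2:])
    return out


def perturbation_space(seq: Seq, k: int) -> Set[Seq]:
    """All sequences reachable with at most k transformations (including `seq`)."""
    space = {seq}
    frontier = {seq}
    for _ in range(k):
        frontier = {p for s in frontier for p in one_step(s)} - space
        space |= frontier
    return space


def certify_by_enumeration(classify: Callable[[Seq], int], seq: Seq, k: int) -> bool:
    """Brute-force robustness check: every perturbation must keep the original label."""
    label = classify(seq)
    return all(classify(p) == label for p in perturbation_space(seq, k))


if __name__ == "__main__":
    x = ("the", "movie", "is", "good")
    positive = lambda s: int(any(w in s for w in ("good", "great", "fine")))
    brittle = lambda s: int("good" in s)
    print(sorted(len(p) for p in perturbation_space(x, 1)))
    print(certify_by_enumeration(positive, x, 2), certify_by_enumeration(brittle, x, 1))
```

Here the toy `brittle` classifier fails certification because substituting "great" for "good" flips its label, while `positive` is robust to every perturbation within two steps.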
