Adversarial Testing as a Tool for Interpretability: Length-based Overfitting of Elementary Functions in Transformers
Patrik Zavoral, Dušan Variš, Ondřej Bojar

TL;DR
This paper investigates how Transformer models overfit to sequence length. The models often generalize to shorter sequences but struggle with longer ones, and when structural cues conflict with the underlying algorithm they tend to follow the structure.
Contribution
The paper uses elementary string edit functions together with a defined set of error indicators to interpret how a sequence-to-sequence Transformer overfits to sequence length and other structural properties of the data.
Findings
Transformers overfit to the sequence lengths seen in training; generalization to shorter sequences is often possible, while longer sequences are highly problematic.
Models often prefer structural cues over algorithmic correctness.
Partially correct answers are often obtained even on longer sequences where the model fails overall.
Abstract
The Transformer model has a tendency to overfit various aspects of the training data, such as the overall sequence length. We study elementary string edit functions using a defined set of error indicators to interpret the behaviour of the sequence-to-sequence Transformer. We show that generalization to shorter sequences is often possible, but confirm that longer sequences are highly problematic, although partially correct answers are often obtained. Additionally, we find that other structural characteristics of the sequences, such as subsegment length, may be equally important. We hypothesize that the models learn algorithmic aspects of the tasks simultaneously with structural aspects, but adhering to the structural aspects is unfortunately often preferred by the Transformer when the two come into conflict.
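To make the setup in the abstract concrete, the following is a minimal sketch of a length-based train/test split and a few error indicators. Everything named here is an assumption for illustration: the task (`reverse`), the length ranges, the indicator names, and the `truncating_model` stand-in are hypothetical and are not taken from the paper, which does not list its functions or indicators in this summary.

```python
import random


def reverse(tokens):
    """Hypothetical elementary string edit function: reverse the sequence."""
    return tokens[::-1]


def make_examples(lengths, n_per_length, vocab="abcdefgh", seed=0):
    """Build (source, target) pairs for every requested sequence length."""
    rng = random.Random(seed)
    examples = []
    for length in lengths:
        for _ in range(n_per_length):
            src = [rng.choice(vocab) for _ in range(length)]
            examples.append((src, reverse(src)))
    return examples


def error_indicators(prediction, reference):
    """Illustrative indicators (assumed, not the paper's actual set)."""
    correct_prefix_len = 0
    for p, r in zip(prediction, reference):
        if p != r:
            break
        correct_prefix_len += 1
    return {
        "exact_match": prediction == reference,
        "length_match": len(prediction) == len(reference),
        "correct_prefix_len": correct_prefix_len,
    }


def truncating_model(src, max_len=15):
    """Stand-in for a model that overfits to length: it solves the task
    but never emits more tokens than the longest training target."""
    return reverse(src)[:max_len]


# Train on a middle band of lengths; test on strictly shorter and longer ones.
train = make_examples(range(11, 16), n_per_length=100)
test_shorter = make_examples(range(1, 11), n_per_length=10, seed=1)
test_longer = make_examples(range(16, 21), n_per_length=10, seed=2)

src, ref = test_longer[0]
report = error_indicators(truncating_model(src), ref)
```

With this stand-in, `report` for a longer test sequence shows a failed exact match and a wrong output length but a long correct prefix, which is the kind of partially correct answer the abstract describes. A real experiment would replace `truncating_model` with a trained sequence-to-sequence Transformer.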
Taxonomy
Topics: Adversarial Robustness in Machine Learning
Methods: Dropout · Layer Normalization · Adam · Attention Is All You Need · Dense Connections · Residual Connection · Position-Wise Feed-Forward Layer · Linear Layer · Byte Pair Encoding · Absolute Position Encodings
