Adversarial Testing as a Tool for Interpretability: Length-based Overfitting of Elementary Functions in Transformers

Patrik Zavoral; Dušan Variš; Ondřej Bojar

arXiv:2410.13802·cs.LG·October 18, 2024

TL;DR

This paper investigates how Transformer models overfit to the sequence lengths seen in training. They often generalize to shorter sequences but struggle with longer ones, and they tend to follow structural cues rather than the underlying algorithm when the two conflict.

Contribution

The paper uses elementary string edit functions and a defined set of error indicators to interpret how the sequence-to-sequence Transformer overfits to sequence length and other structural characteristics, such as subsegment length.

Findings

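01 Transformers overfit to the sequence lengths seen in training; generalization to shorter sequences is often possible, but longer sequences are highly problematic.

02 When structural and algorithmic aspects of a task conflict, models often follow the structural cues.

03 Answers on longer sequences are often partially correct.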

Abstract

The Transformer model has a tendency to overfit various aspects of the training data, such as the overall sequence length. We study elementary string edit functions using a defined set of error indicators to interpret the behaviour of the sequence-to-sequence Transformer. We show that generalization to shorter sequences is often possible, but confirm that longer sequences are highly problematic, although partially correct answers are often obtained. Additionally, we find that other structural characteristics of the sequences, such as subsegment length, may be equally important. We hypothesize that the models learn algorithmic aspects of the tasks simultaneously with structural aspects, but adhering to the structural aspects is unfortunately often preferred by the Transformer when they come into conflict.
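The setup described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not the paper's actual configuration: the task (`reverse_tokens`), the vocabulary, the length ranges, and the three indicators in `error_indicators` are hypothetical stand-ins for "elementary string edit functions" and "a defined set of error indicators".

```python
import random


def reverse_tokens(tokens):
    """Hypothetical elementary string edit function: reverse the sequence."""
    return tokens[::-1]


def make_example(length, rng, vocab="abcdefgh"):
    """Build one (source, target) pair of the requested source length."""
    src = [rng.choice(vocab) for _ in range(length)]
    return src, reverse_tokens(src)


def length_split(train_lengths, test_lengths, n_per_length, seed=0):
    """Create train/test sets whose sequence lengths do not overlap,
    so the test set probes lengths never seen during training."""
    rng = random.Random(seed)
    train = [make_example(n, rng) for n in train_lengths for _ in range(n_per_length)]
    test = [make_example(n, rng) for n in test_lengths for _ in range(n_per_length)]
    return train, test


def error_indicators(prediction, reference):
    """Assumed indicators: exact match, output length difference, and
    the number of leading tokens that are correct (partial correctness)."""
    correct_prefix = 0
    for p, r in zip(prediction, reference):
        if p != r:
            break
        correct_prefix += 1
    return {
        "exact_match": prediction == reference,
        "length_diff": len(prediction) - len(reference),
        "correct_prefix": correct_prefix,
    }


# Train on lengths 5..10, test on the longer lengths 11..15.
train_set, test_set = length_split(range(5, 11), range(11, 16), n_per_length=2)

# A prediction that stops one token early: wrong overall, but partially correct.
report = error_indicators(list("abc"), list("abcd"))
```

In this sketch, a model that has overfit to the training lengths would tend to show a negative `length_diff` on the longer test set while still getting a non-trivial `correct_prefix`, which is the kind of partially correct answer the abstract mentions.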

Taxonomy

Topics: Adversarial Robustness in Machine Learning

Methods: Dropout · Layer Normalization · Adam · Attention Is All You Need · Dense Connections · Residual Connection · Position-Wise Feed-Forward Layer · Linear Layer · Byte Pair Encoding · Absolute Position Encodings