What is my math transformer doing? -- Three results on interpretability and generalization
François Charton

TL;DR
This paper explores the interpretability and generalization of math transformers trained on matrix problems. It finds that their incorrect predictions often retain mathematical properties of the solution, and that they can generalize beyond the training distribution when the training dataset is chosen carefully.
Contribution
It demonstrates that math transformers preserve mathematical properties of the solution even when their predictions are incorrect, and that dataset design influences both training speed and out-of-distribution generalization.
Findings
Incorrect predictions often retain mathematical properties
Failures can be predicted from problem properties
Careful dataset choice improves training and generalization
Abstract
This paper investigates the failure cases and out-of-distribution behavior of transformers trained on matrix inversion and eigenvalue decomposition. I show that incorrect model predictions still retain deep mathematical properties of the solution (e.g. correct eigenvalues, unit norm of eigenvectors), and that almost all model failures can be attributed to, and predicted from, properties of the problem or solution. This demonstrates that, when in doubt, math transformers do not hallucinate absurd solutions (as was sometimes proposed) but remain "roughly right". I also show that the careful choice of a training dataset can accelerate training, while allowing the model to generalize out of its training distribution, invalidating the idea that transformers "merely interpolate" from memorized examples.
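To make the "roughly right" claim concrete, here is a minimal NumPy sketch of how one might test which properties a predicted eigendecomposition retains. It is not the paper's evaluation code. The function name `check_properties`, the 5% tolerance, the symmetric input matrix, and the hand-built "failed" prediction are all assumptions chosen for illustration.

```python
import numpy as np

def check_properties(M, pred_vals, pred_vecs, tol=0.05):
    """Report which properties a predicted eigendecomposition of a
    symmetric matrix M retains. The tolerance is an illustrative choice."""
    true_vals = np.linalg.eigvalsh(M)  # true eigenvalues, ascending order
    vals_err = (np.linalg.norm(np.sort(pred_vals) - true_vals)
                / np.linalg.norm(true_vals))
    col_norms = np.linalg.norm(pred_vecs, axis=0)  # norm of each eigenvector
    recon = pred_vecs @ np.diag(pred_vals) @ pred_vecs.T
    recon_err = np.linalg.norm(recon - M) / np.linalg.norm(M)
    return {
        "eigenvalues_ok": bool(vals_err < tol),
        "unit_norm_ok": bool(np.all(np.abs(col_norms - 1.0) < tol)),
        "reconstruction_ok": bool(recon_err < tol),
    }

# Build a symmetric matrix with known eigenvalues 1, 2, 5.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # random orthogonal basis
vals = np.array([1.0, 2.0, 5.0])
M = Q @ np.diag(vals) @ Q.T

# Hypothetical "failed" prediction: the first two eigenvectors are rotated
# by 30 degrees within their plane, so the decomposition is wrong even
# though the eigenvalues and the column norms are still correct.
theta = np.pi / 6
R = np.eye(3)
R[:2, :2] = [[np.cos(theta), -np.sin(theta)],
             [np.sin(theta),  np.cos(theta)]]
report = check_properties(M, vals, Q @ R)
print(report)
```

In this constructed case the prediction fails the reconstruction check while passing the eigenvalue and unit-norm checks, which is the kind of structured failure the abstract describes.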
Taxonomy
Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Machine Learning and Data Classification
