Transformers learn through gradual rank increase

Enric Boix-Adsera; Etai Littwin; Emmanuel Abbe; Samy Bengio; Joshua Susskind

arXiv:2306.07042 · cs.LG · December 12, 2023 · 5 cites

Open Access

TL;DR

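This paper shows that transformers learn incrementally: the difference between trained and initial weights progressively increases in rank during training. It proves this under simplifying assumptions (diagonal weight matrices and small initialization) and gives experimental evidence that the phenomenon can also occur in practice without them.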

Contribution

It introduces a rigorous analysis of incremental learning dynamics in transformers, identifying the gradual rank increase of the difference between trained and initial weights as a key aspect of their training process.

Findings

01 The rank of the difference between trained and initial weights increases during training

02 This is proven theoretically under diagonal weight matrices and small initialization

03 Experiments show the phenomenon can also occur in practice without these assumptions

Abstract

We identify incremental learning dynamics in transformers, where the difference between trained and initial weights progressively increases in rank. We rigorously prove this occurs under the simplifying assumptions of diagonal weight matrices and small initialization. Our experiments support the theory and also show that the phenomenon can occur in practice without the simplifying assumptions.
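
To make the quantity in the abstract concrete, here is a minimal sketch of how one might measure the rank of the difference between a trained weight matrix and its initialization. It is not the authors' code: the function names, the relative singular-value threshold `rel_tol`, and the synthetic "checkpoints" are all assumptions made for illustration.

```python
import numpy as np


def effective_rank(delta, rel_tol=1e-2):
    """Count singular values of `delta` above rel_tol times the largest one.

    The relative threshold is an illustrative choice, not taken from the paper.
    """
    s = np.linalg.svd(delta, compute_uv=False)
    if s.size == 0 or s[0] == 0.0:
        return 0
    return int(np.sum(s > rel_tol * s[0]))


def rank_of_update(w_init, w_trained, rel_tol=1e-2):
    """Numerical rank of the difference between trained and initial weights."""
    return effective_rank(w_trained - w_init, rel_tol)


# Synthetic stand-in for training checkpoints (hypothetical data):
# checkpoint k equals the small initialization plus exactly k rank-one terms.
rng = np.random.default_rng(0)
d = 32
w0 = 1e-3 * rng.standard_normal((d, d))           # small initialization
u = np.linalg.qr(rng.standard_normal((d, d)))[0]  # orthonormal left directions
v = np.linalg.qr(rng.standard_normal((d, d)))[0]  # orthonormal right directions


def synthetic_checkpoint(k):
    return w0 + u[:, :k] @ v[:, :k].T


ranks = [rank_of_update(w0, synthetic_checkpoint(k)) for k in range(4)]
print(ranks)  # [0, 1, 2, 3]
```

With real checkpoints, one would replace `synthetic_checkpoint` with weights saved during training and plot the resulting ranks against the training step. The choice of threshold affects the reported rank, so it is worth checking several values.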

Taxonomy

Topics: Neural Networks and Applications · Blind Source Separation Techniques · Machine Learning and ELM