Reservoir Transformers
Sheng Shen, Alexei Baevski, Ari S. Morcos, Kurt Keutzer, Michael Auli, Douwe Kiela

TL;DR
This paper shows that transformers perform well even when some layers are randomly initialized and never updated, which reduces wall-clock compute time until convergence and improves performance on machine translation and (masked) language modelling tasks.
Contribution
It introduces reservoir layers for transformers: non-linear layers that are randomly initialized, never updated, and interspersed with regular trainable transformer layers.
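To make the idea of mixing fixed and trainable layers concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the layers are toy residual tanh layers rather than transformer layers, and the names `Layer`, `build_stack`, `train_step`, and the choice of which positions are frozen are all assumptions made for illustration. It only shows the core mechanism: every layer is randomly initialized, gradients flow through all layers, and the weight update is skipped for the frozen (reservoir) ones.

```python
import math
import random


def rand_matrix(d, rng):
    """Random d x d matrix with entries scaled by 1/sqrt(d)."""
    scale = 1.0 / math.sqrt(d)
    return [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(d)]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class Layer:
    """Toy residual layer: out = x + tanh(W x). Hypothetical stand-in for a transformer layer."""

    def __init__(self, d, rng, frozen):
        self.W = rand_matrix(d, rng)
        self.frozen = frozen  # True marks a reservoir layer: random and never updated

    def forward(self, x):
        self.x = x
        self.a = [math.tanh(z) for z in matvec(self.W, x)]
        return [xi + ai for xi, ai in zip(x, self.a)]

    def backward(self, grad_out, lr):
        d = len(self.x)
        # Gradient through tanh: d tanh(z)/dz = 1 - tanh(z)^2.
        gz = [g * (1.0 - a * a) for g, a in zip(grad_out, self.a)]
        # Gradient w.r.t. the input: residual path plus W^T gz.
        # Computed before any weight update, and also for frozen layers,
        # so earlier trainable layers still receive a learning signal.
        grad_in = [
            grad_out[j] + sum(self.W[i][j] * gz[i] for i in range(d))
            for j in range(d)
        ]
        if not self.frozen:
            # Plain SGD step; skipped entirely for reservoir layers.
            for i in range(d):
                for j in range(d):
                    self.W[i][j] -= lr * gz[i] * self.x[j]
        return grad_in


def build_stack(d, n_layers, reservoir_idx, rng):
    """Create n_layers layers; indices in reservoir_idx are frozen (assumed placement)."""
    return [Layer(d, rng, frozen=(i in reservoir_idx)) for i in range(n_layers)]


def train_step(layers, x, target, lr):
    """One SGD step on a squared-error loss; returns the loss before the update."""
    h = x
    for layer in layers:
        h = layer.forward(h)
    loss = 0.5 * sum((hi - ti) ** 2 for hi, ti in zip(h, target))
    grad = [hi - ti for hi, ti in zip(h, target)]
    for layer in reversed(layers):
        grad = layer.backward(grad, lr)
    return loss


if __name__ == "__main__":
    rng = random.Random(0)
    stack = build_stack(d=4, n_layers=3, reservoir_idx={1}, rng=rng)
    for step in range(20):
        loss = train_step(stack, [0.5, -0.2, 0.1, 0.3], [0.0, 1.0, 0.0, -1.0], lr=0.1)
    print("final loss:", loss)
```

In this sketch the middle layer keeps its initial random weights for the whole run while the layers around it adapt to it. Skipping the update for some layers removes part of the per-step work, which is one plausible source of the wall-clock savings the paper reports.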
Findings
Reservoir layers reduce wall-clock compute time until convergence.
Overall performance improves on machine translation and (masked) language modelling tasks.
Transformers remain effective even though the reservoir layers are randomly initialized and never updated.
Abstract
We demonstrate that transformers obtain impressive performance even when some of the layers are randomly initialized and never updated. Inspired by old and well-established ideas in machine learning, we explore a variety of non-linear "reservoir" layers interspersed with regular transformer layers, and show improvements in wall-clock compute time until convergence, as well as overall performance, on various machine translation and (masked) language modelling tasks.
Taxonomy
Topics: Neural Networks and Reservoir Computing · Neural Networks and Applications · Machine Learning and ELM
