Reservoir Transformers

Sheng Shen; Alexei Baevski; Ari S. Morcos; Kurt Keutzer; Michael Auli; Douwe Kiela

arXiv:2012.15045 · cs.CL · June 3, 2021

TL;DR

This paper shows that transformers still perform well when some of their layers are randomly initialized and never updated, reducing wall-clock compute time until convergence and improving overall performance on machine translation and (masked) language modelling tasks.

Contribution

It introduces reservoir layers for transformers: non-linear layers that are randomly initialized, kept fixed, and interspersed with regular trainable transformer layers.

Findings

01

Reservoir layers reduce wall-clock compute time until convergence.

02

Overall performance improves on machine translation and (masked) language modelling tasks.

03

Transformers remain effective even though the reservoir layers are randomly initialized and never updated.

Abstract

We demonstrate that transformers obtain impressive performance even when some of the layers are randomly initialized and never updated. Inspired by old and well-established ideas in machine learning, we explore a variety of non-linear "reservoir" layers interspersed with regular transformer layers, and show improvements in wall-clock compute time until convergence, as well as overall performance, on various machine translation and (masked) language modelling tasks.
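The core idea in the abstract, fixed random layers interspersed with trainable ones, can be illustrated with a toy sketch. This is not the authors' implementation: it assumes plain tanh feed-forward layers in place of transformer layers, a squared-error loss, and hand-written SGD, and every name in it (make_layer, build_stack, sgd_step, reservoir_idx) is hypothetical. It only shows the mechanism: reservoir layers take part in the forward and backward pass, but their weights are never updated.

```python
import math
import random


def make_layer(dim, rng, trainable):
    # Square weight matrix with small random entries. The `trainable` flag
    # records whether the update step is allowed to modify it.
    scale = 1.0 / math.sqrt(dim)
    w = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]
    return {"w": w, "trainable": trainable}


def build_stack(dim, num_layers, reservoir_idx, seed=0):
    # Layers whose index is in `reservoir_idx` are reservoir layers:
    # randomly initialized here and frozen for the rest of training.
    rng = random.Random(seed)
    return [make_layer(dim, rng, i not in reservoir_idx) for i in range(num_layers)]


def forward(stack, x):
    # Returns the input followed by every layer's activation (needed for backprop).
    acts = [x]
    for layer in stack:
        h = acts[-1]
        z = [sum(wij * hj for wij, hj in zip(row, h)) for row in layer["w"]]
        acts.append([math.tanh(zi) for zi in z])
    return acts


def sgd_step(stack, x, target, lr=0.1):
    acts = forward(stack, x)
    out = acts[-1]
    loss = 0.5 * sum((o - t) ** 2 for o, t in zip(out, target))
    grad = [o - t for o, t in zip(out, target)]  # dLoss/dOutput
    for k in range(len(stack) - 1, -1, -1):
        layer, h, a = stack[k], acts[k], acts[k + 1]
        # Backprop through tanh: d tanh(z)/dz = 1 - tanh(z)^2.
        dz = [g * (1.0 - ai * ai) for g, ai in zip(grad, a)]
        # The gradient still flows through a frozen layer to the layers below it.
        grad = [sum(layer["w"][i][j] * dz[i] for i in range(len(dz)))
                for j in range(len(h))]
        if layer["trainable"]:
            # Only regular layers are updated; reservoir layers are skipped.
            for i, row in enumerate(layer["w"]):
                for j in range(len(row)):
                    row[j] -= lr * dz[i] * h[j]
    return loss


# Four layers; the two middle ones act as reservoir layers (an arbitrary choice).
stack = build_stack(dim=4, num_layers=4, reservoir_idx={1, 2})
frozen_before = [[row[:] for row in stack[i]["w"]] for i in (1, 2)]
last_before = [row[:] for row in stack[3]["w"]]

x, target = [0.5, -0.3, 0.8, 0.1], [0.2, 0.2, -0.4, 0.6]
losses = [sgd_step(stack, x, target) for _ in range(200)]
```

After the loop, the weights of layers 1 and 2 are identical to their initial values, while layers 0 and 3 have been trained. Skipping the weight update for the frozen layers is where the compute saving in this toy version comes from; which layers to freeze, and how many, is a design choice that this sketch fixes arbitrarily.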

Taxonomy

Topics: Neural Networks and Reservoir Computing · Neural Networks and Applications · Machine Learning and ELM