Homeostasis and Sparsity in Transformer
Leonid Kotyuzanskiy, Artem Klimov

TL;DR
This paper introduces homeostasis-inspired sparsity mechanisms into the transformer architecture. The mechanisms dynamically adjust the distribution of activations, which improves the model's performance on machine translation.
Contribution
The paper proposes novel homeostasis and sparsity techniques, such as RFB-kWTA and "Smart" Inhibition. It integrates them into the transformer to improve the model's efficiency and accuracy.
Findings
Achieved a BLEU score of 0.3062 on the Multi30K dataset.
Outperformed the classical transformer and a dropout-only model.
Demonstrated the effectiveness of homeostasis mechanisms in transformers.
Abstract
The transformer architecture has become an integral part of modern neural networks. It plays a crucial role in a variety of tasks, such as text generation, machine translation, and image and audio processing. Jeff Hawkins proposed an alternative approach to building intelligent systems, inspired by the processes that occur in the neocortex. In this article, we combine some of these ideas. We propose using homeostasis mechanisms, such as RFB-kWTA and "Smart" Inhibition, in the attention mechanism of the transformer and at the output of the transformer block. We also conduct an experiment that introduces sparse distributed representations at various points in the transformer. RFB-kWTA utilizes statistics of layer activations across time to adjust the entire layer, enhancing the values of rare activations while…
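The abstract is truncated before it fully specifies RFB-kWTA, so the exact rule is not given here. The sketch below shows the general idea it describes: a k-winners-take-all (kWTA) layer that uses activation statistics across time to boost rarely active units. It is a minimal sketch, assuming the exponential duty-cycle boosting rule used in Numenta's sparse networks ("How Can We Be So Dense?"). It is not the authors' RFB-kWTA.

Every name in it is hypothetical and chosen for illustration:
- `BoostedKWTA`
- `boost_strength`
- `momentum`
- `duty_cycle`
- `update`

Initializing the duty cycle at the target density is also an assumption.

```python
import math
from typing import List


class BoostedKWTA:
    """Hypothetical k-winners-take-all layer with duty-cycle boosting.

    Assumption: follows Numenta-style boosting, b_i = exp(beta * (target - duty_i)),
    not necessarily the paper's RFB-kWTA rule.
    """

    def __init__(self, n_units: int, k: int, boost_strength: float = 1.0,
                 momentum: float = 0.99) -> None:
        if not 0 < k <= n_units:
            raise ValueError("k must be in (0, n_units]")
        self.n_units = n_units
        self.k = k
        self.boost_strength = boost_strength
        self.momentum = momentum
        self.target_density = k / n_units
        # Running average of how often each unit is a winner ("duty cycle").
        # Assumption: start every unit at the target density.
        self.duty_cycle = [self.target_density] * n_units

    def boost_factors(self) -> List[float]:
        # Units that win less often than the target get factors > 1,
        # units that win more often get factors < 1.
        return [math.exp(self.boost_strength * (self.target_density - d))
                for d in self.duty_cycle]

    def __call__(self, x: List[float], update: bool = True) -> List[float]:
        if len(x) != self.n_units:
            raise ValueError("input size does not match n_units")
        boosted = [v * b for v, b in zip(x, self.boost_factors())]
        # Winners are selected on boosted values (ties go to the lower index).
        order = sorted(range(self.n_units), key=lambda i: boosted[i], reverse=True)
        winners = set(order[: self.k])
        # Only winners pass through, carrying their original (unboosted) values.
        out = [x[i] if i in winners else 0.0 for i in range(self.n_units)]
        if update:
            # Exponential moving average of the win indicator per unit.
            for i in range(self.n_units):
                won = 1.0 if i in winners else 0.0
                self.duty_cycle[i] = (self.momentum * self.duty_cycle[i]
                                      + (1.0 - self.momentum) * won)
        return out
```

For example, `BoostedKWTA(4, 2)([0.1, 0.9, 0.8, 0.2])` keeps only the two largest values and returns `[0.0, 0.9, 0.8, 0.0]`. If the same input is fed repeatedly with a large `boost_strength`, units 0 and 3 keep losing, so their boost factors grow. Eventually one of these rare units enters the winning set. This is the "enhance rare activations" behaviour the abstract describes, and it keeps the output sparse at exactly `k` active units per step.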
Taxonomy
Topics: Neural Networks and Applications
Methods: Softmax · Attention Is All You Need · Dropout
